Data Science #1

Topic- Data Science v/s Big Data v/s Data Analytics

Let's begin with a question: what is data?

Data means raw facts that are processed to generate useful information.
Data is everywhere. In fact, the amount of digital data that exists is growing at a rapid rate, doubling every two years, and changing the way we live. According to IBM, 2.5 billion gigabytes (GB) of data was generated every day in 2012.

😮 Spread across a world population of roughly 7 billion, that works out to about 130 GB per person per year (2.5 billion GB × 365 days ÷ 7 billion people). The sadder part is that our new generation will have to learn more prefixes to represent quantities of bytes.

  • 1,024 Bytes = 1 Kilobyte
  • 1,024 Kilobytes = 1 Megabyte
  • 1,024 Megabytes = 1 Gigabyte
  • 1,024 Gigabytes = 1 Terabyte
  • 1,024 Terabytes = 1 Petabyte
  • 1,024 Petabytes = 1 Exabyte (In 2000, 3 exabytes of information was created.)
  • 1,024 Exabytes = 1 Zettabyte
  • 1,024 Zettabytes = 1 Yottabyte
  • 1,024 Yottabytes = 1 Brontobyte (An informal, non-standard name; that is roughly a 1 followed by 27 zeroes.)
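The ladder of units above is just repeated division by 1,024, so it can be checked with a few lines of Python. This is a minimal sketch: the helper name `humanize` and the decision to stop at zettabytes are my own choices for illustration.

```python
# Unit names in ascending order; each step up is a factor of 1,024.
UNITS = ["Bytes", "Kilobytes", "Megabytes", "Gigabytes",
         "Terabytes", "Petabytes", "Exabytes", "Zettabytes"]


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1,024."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop when the value fits, or when we run out of larger units.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


# IBM's figure quoted above: 2.5 billion gigabytes per day, expressed in bytes.
daily_bytes = 2.5e9 * 1024 ** 3
print(humanize(daily_bytes))  # 2.33 Exabytes
```

So the daily figure quoted from IBM is a little over two exabytes when counted in steps of 1,024.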

In this article, we will differentiate between Data Science, Big Data, and Data Analytics based on what each one is, where it is used, the skills you need to become a professional in the field, and the salary prospects in each field.


What they are

Data Science: Dealing with both unstructured and structured data, Data Science is a field that comprises everything related to data cleansing, preparation, and analysis.

Data Science is the combination of statistics, mathematics, programming, problem solving, capturing data in ingenious ways, the ability to look at things differently, and the activity of cleansing, preparing, and aligning the data.

In simple terms, it is the umbrella of techniques used when trying to extract insights and information from data.

Big Data: Big Data refers to humongous volumes of data that cannot be processed effectively with the traditional applications that exist. The processing of Big Data begins with the raw data that isn’t aggregated and is most often impossible to store in the memory of a single computer.

A buzzword used to describe immense volumes of data, both unstructured and structured, Big Data inundates a business on a day-to-day basis. Big Data can be analyzed for insights that lead to better decisions and strategic business moves.

The definition of Big Data given by Gartner is: “Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation”.

Data Analytics: Data Analytics is the science of examining raw data with the purpose of drawing conclusions about that information.
Data Analytics involves applying an algorithmic or mechanical process to derive insights, for example, running through a number of data sets to look for meaningful correlations between them.
It is used in a number of industries to allow organizations and companies to make better decisions as well as to verify or disprove existing theories or models. The focus of Data Analytics lies in inference, which is the process of deriving conclusions based solely on what the researcher already knows.
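To make "looking for meaningful correlations" concrete, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The function name `pearson` and the two small data sets are hypothetical, made up purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator) and of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data sets: monthly ad spend and monthly sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 25, 31, 48, 55]
print(round(pearson(ad_spend, sales), 3))
```

A value close to +1 or -1 suggests a strong linear relationship, and a value near 0 suggests none. Either way, a correlation is only a starting point for the analyst and does not show that one thing causes the other.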


The applications of each field

Applications of Data Science:

  • Internet search: Search engines make use of data science algorithms to deliver the best results for search queries in a fraction of a second.
  • Digital Advertisements: The entire digital marketing spectrum, from display banners to digital billboards, uses data science algorithms. This is the main reason digital ads get a higher CTR than traditional advertisements.
  • Recommender systems: Recommender systems not only make it easy to find relevant products among the billions available but also add a lot to the user experience. Many companies use such systems to promote their products and suggestions in accordance with the user’s demands and the relevance of the information. The recommendations are based on the user’s previous search results.
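As a rough illustration of recommendations based on previous searches, the sketch below scores each product by how often its tags appear in a user's search history. Everything here (the `recommend` function, the catalog and the queries) is hypothetical, and real recommender systems are far more sophisticated.

```python
from collections import Counter


def recommend(search_history, catalog, top_n=2):
    """Rank products by how often their tags occur in past search queries."""
    # Count every word the user has searched for.
    term_counts = Counter(
        word.lower() for query in search_history for word in query.split()
    )
    # A product's score is the total count of its tags in the search history.
    scores = {
        name: sum(term_counts[tag] for tag in tags)
        for name, tags in catalog.items()
    }
    # Keep only products with some overlap; break ties alphabetically.
    ranked = sorted(
        (name for name in scores if scores[name] > 0),
        key=lambda name: (-scores[name], name),
    )
    return ranked[:top_n]


# Hypothetical data for illustration only.
history = ["running shoes", "trail running socks"]
catalog = {
    "trail shoes": {"trail", "shoes", "running"},
    "yoga mat": {"yoga", "mat"},
    "sports socks": {"socks", "running"},
}
print(recommend(history, catalog))  # ['trail shoes', 'sports socks']
```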

Applications of Big Data:

  • Big Data for financial services: Credit card companies, retail banks, private wealth management advisories, insurance firms, venture funds, and institutional investment banks use big data for their financial services. The common problem among them all is the massive amount of multi-structured data living in multiple disparate systems, which big data can solve. Thus big data is used in a number of ways, such as:
      • Customer analytics
      • Compliance analytics
      • Fraud analytics
      • Operational analytics
  • Big Data in communications: Gaining new subscribers, retaining customers, and expanding within current subscriber bases are top priorities for telecommunication service providers. The solutions to these challenges lie in the ability to combine and analyze the masses of customer generated data and machine generated data that is being created every day.
  • Big Data for Retail: Whether you are a brick-and-mortar store or an online e-tailer, the answer to staying in the game and being competitive is understanding the customer better in order to serve them. This requires the ability to analyze all the disparate data sources that companies deal with every day, including weblogs, customer transaction data, social media, store-branded credit card data, and loyalty program data.

Applications of Data Analytics:

  • Healthcare: The main challenge for hospitals, as cost pressures tighten, is to treat as many patients as they can efficiently while also improving the quality of care. Instrument and machine data is increasingly being used to track and optimize patient flow, treatment, and equipment use in hospitals. It is estimated that a 1% efficiency gain could yield more than $63 billion in global health care savings.
  • Travel: Data analytics can optimize the buying experience through analysis of mobile and web logs and social media data. Travel sites can gain insights into the customer’s desires and preferences. Products can be up-sold by correlating current sales with subsequent browsing, increasing browse-to-buy conversions via customized packages and offers. Personalized travel recommendations can also be delivered by data analytics based on social media data.
  • Gaming: Data Analytics helps in collecting data to optimize spend within as well as across games. Game companies gain insight into the likes, dislikes, and relationships of their users.
  • Energy Management: Most firms are using data analytics for energy management, including smart-grid management, energy optimization, energy distribution, and building automation in utility companies. The application here is centered on controlling and monitoring network devices, dispatching crews, and managing service outages. Utilities gain the ability to integrate millions of data points on network performance, which lets engineers use analytics to monitor the network.

The skills you require

To become a Data Scientist:

  • Education: 88% have a Master’s Degree and 46% have PhDs
  • In-depth knowledge of SAS and/or R: For Data Science, R is generally preferred.
  • Python coding: Python is the most common coding language used in data science, along with Java, Perl, and C/C++.
  • Hadoop platform: Although not always a requirement, knowing the Hadoop platform is still preferred for the field. Having a bit of experience in Hive or Pig is also a huge selling point.
  • SQL database/coding: Though NoSQL and Hadoop have become a major part of the Data Science background, it is still preferred if you can write and execute complex queries in SQL.
  • Working with unstructured data: It is most important that a Data Scientist is able to work with unstructured data, be it from social media, video feeds, or audio.

To become a Big Data professional:

  • Analytical skills: The ability to make sense of the piles of data that you get. With analytical abilities, you will be able to determine which data is relevant to your solution, much like problem solving.
  • Creativity: You need to have the ability to create new methods to gather, interpret, and analyze data. This is an extremely valuable skill to possess.
  • Mathematics and statistical skills: Good, old fashioned “number crunching”. This is extremely necessary, be it in data science, data analytics, or big data.
  • Computer science: Computers are the workhorses behind every data strategy. Programmers will have a constant need to come up with algorithms to process data into insights.
  • Business skills: Big Data professionals will need to have an understanding of the business objectives that are in place, as well as the underlying processes that drive the growth and profit of the business.

To become a Data Analyst:

  • Programming skills: Knowing programming languages such as R and Python is extremely important for any data analyst.
  • Statistical skills and mathematics: Descriptive and inferential statistics and experimental design are a must for data analysts.
  • Machine learning skills
  • Data wrangling skills: The ability to map raw data and convert it into another format that allows for a more convenient consumption of the data.
  • Communication and Data Visualization skills
  • Data Intuition: It is extremely important for a professional to be able to think like a data analyst.
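The data wrangling skill listed above, mapping raw data and converting it into a more convenient format, can be sketched as follows. The raw CSV text, the column names and the `wrangle` helper are all hypothetical; the point is only to show messy text turning into clean, typed records.

```python
import csv
import io
import json

# Hypothetical raw export: stray spaces around a name and a missing age.
RAW = "name,age,city\n Asha ,29,Kolkata\nRavi,,Burdwan\n"


def wrangle(raw_csv):
    """Convert raw CSV text into a list of cleaned, typed records."""
    rows = []
    for record in csv.DictReader(io.StringIO(raw_csv)):
        rows.append({
            "name": record["name"].strip(),
            # An empty age becomes None instead of crashing int().
            "age": int(record["age"]) if record["age"].strip() else None,
            "city": record["city"].strip(),
        })
    return rows


# JSON is one example of a format that is more convenient to consume.
print(json.dumps(wrangle(RAW), indent=2))
```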



MozActivate Tour 1 – University Institute of Technology, Burdwan


Hi everyone,

Sorry for the delay, guys. 🙂 On 26th Nov, there was a MozVR camp at UIT, Burdwan. It was the very first MozVR camp in all of West Bengal. Thanks to Biraj Karmakar, who came as a guest speaker all the way from Barasat.


We started our event at around 10:30 am with 60+ attendees. Biraj Karmakar opened the session with “What is Mozilla?” and “Why should one contribute to Mozilla and other Open Source projects?”. He then continued with A-Frame, WebGL, WebVR and MozVR. After that, I got a chance to show the attendees how to code in CodePen and how they can develop a scene locally on their own machines. Once I had finished, Biraj da asked us to start the hackathon and said that the best three would each get a VR set. Hearing this gave the students a great boost of energy. Only one participant was able to finish on the spot. Meanwhile, during the hackathon, the participants were trying out the VR headset.

It was such an awesome event, and the response from the attendees was beyond what we expected.


pikachu - IRC chat bot

Currently I’m in the 3rd year of my BE. I always wanted to make a chat bot, and this year I got an opportunity to build one.

I already knew Python, but I had to study some new things: how to connect to and ping a server, and how to use the socket module. As soon as I had finished that, I started coding.

These are the two modules I have used in my bot.

import socket          
import re                  #For Regular Expression

I have added a main function, and inside it I have done the socket programming.

# Defines the socket and registers the bot with the IRC server
def main():
    global irc
    botnick = "pikachu"
    channel = "#dgplug"
    port = 6667
    server = ""  # IRC server hostname (left blank here)
    irc = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Connect to the server
    irc.connect((server, port))
    # User registration; sockets send bytes in Python 3, hence encode()
    irc.send(("USER " + botnick + " " + botnick + " " + botnick
              + " :Hello! I am a test bot!\r\n").encode())
    irc.send(("NICK " + botnick + "\r\n").encode())
    # Join the channel (note the space after JOIN)
    irc.send(("JOIN " + channel + "\r\n").encode())
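The snippet above only connects and joins. An IRC server also sends PING messages from time to time and expects a matching PONG, otherwise it drops the connection. The sketch below shows one way to build that reply with the re module. The helper name `pong_reply` is hypothetical and is not taken from the bot's actual source.

```python
import re

# A server PING looks like "PING :token"; the reply must echo the token back.
PING_RE = re.compile(r"^PING\s+:?(.*)$")


def pong_reply(line):
    """Return the PONG line answering a server PING, or None for other lines."""
    match = PING_RE.match(line.strip())
    if match is None:
        return None
    return "PONG :" + match.group(1) + "\r\n"


# Sketch of use inside a receive loop, assuming `irc` is the connected socket:
#     for line in irc.recv(2048).decode(errors="replace").split("\r\n"):
#         reply = pong_reply(line)
#         if reply:
#             irc.send(reply.encode())
```

Keeping the reply logic in a small pure function means it can be tested without a network connection.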

Check out the source code to know more about the bot. Feel free to ask anything or to file an issue.

Happy Coding.

bn_IN l10n Meetup #MozKol

Hi everyone,


On the 21st, I travelled to Barasat, which is approx. 150 km away. I believe that “Nothing great was ever achieved without enthusiasm”. We met at the meetup space (the venue of our event). Biraj Karmakar welcomed us warmly; he gave us the printed plan for the next two days and a great welcome dinner.

On the 22nd, we started at 8 o’clock. Biraj Karmakar gave a brief session on the l10n style guide and explained the difference between an ordinary translator and a localizer. He introduced us to Pontoon, Pootle and Mozilla Transvision. Then we were divided into two groups, one for Pootle and the other for Pontoon. It was great to see such enthusiasm among the members. Ayan Choudhury was there to help everyone wherever they got stuck and to review our strings. We translated 60-70% of the strings of various high-priority projects. We ended our first day with a delicious dinner.

On the 23rd, we started again at 8 o’clock. After we had completed the translation of every project, Biraj Karmakar introduced us to Bugzilla and MozTrap. He told us how we can file a bug, take a bug and start working on it. That’s how we ended our third day.

Subhasis Chatterjee, Pranoy Ray, Saptarshi Mitra and Subhasis, thanks for making the event productive, and thanks to our Reps Biraj Karmakar and Ayan Choudhury for helping us.


Summer Training #DgpLug

I came to know about the Summer Training from someone’s tweet. Right then, I opened YouTube and watched Kushal Das’s PyCon’14 video, and after that I registered for the online Summer Training. The training started on 19 July, and we began with the mailing list and IRC. Personally, I got to know so many new things. So far we have covered the vim editor, FHS, rST and Sphinx. As my semester exams are going on, I have not been able to follow it as intensely as I would like, but I have attended almost every session. I would like to thank Kushal and all the mentors out there for this summer training.