A Beginner’s Guide to Data Science

It is no surprise that data science is seen as one of the most important fields in technology today, and that it is creating a large number of jobs worldwide. Tech giants such as Facebook, Google and IBM invest heavily in research and development in areas of data science like machine learning and artificial intelligence. Data scientist is also one of the most sought-after jobs on job search websites like LinkedIn, Glassdoor and Monster. If you are wondering what skills a data scientist needs, you’re in the right place.

To begin with, what is data science?

As the name suggests, data science deals with ‘data’, usually large amounts of it. This data is grouped, classified and structured, and useful insights are then drawn from it to support business development. Reading data may sound simple in theory, but it is not. That is where the ‘science’ comes into the picture: many tools and algorithms are needed to visualize and structure the data, and then to read it and derive insights.

These days, data science is used as a broad, generic term. When people say data science, they usually don’t mean the textbook definition but all the different fields that fall under it, such as data analytics, business analytics, machine learning and artificial intelligence.

Each field is unique in its own way and has its own tasks and functions.

Data science flow-chart

[Figure: Data science flow chart]

This chart shows the flow of a data science project, from obtaining the data to deriving insights, along with the skills and tools required at each stage:

  1. Data collection
  2. Data wrangling
  3. Data exploration
  4. Data modelling
  5. Report

Step 1:

Obtaining the Data

Collecting the data is the first step. First, identify what kind of data you want to analyse, then export it to an Excel or CSV file. Next, make the data easy to read: it should be labelled and structured correctly so that it is easy to analyse.

Skills and tools required

  • Database management: SQL
  • Understanding the database and what it represents
  • Retrieving raw, unstructured data such as text, documents, photos and videos
  • Distributed storage and processing: Apache Hadoop, Apache Spark
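As a minimal sketch of this step, the example below uses Python’s standard-library `sqlite3` module as a stand-in for a real database: it builds a small in-memory table (the `sales` table, its columns and its rows are made-up illustration data), retrieves rows with a SQL query and exports them to a CSV file whose header row labels each column. With a real database you would connect to it instead of creating a table in memory.

```python
import csv
import os
import sqlite3
import tempfile


def export_query_to_csv(conn, query, path):
    """Run a SQL query and write the result, with column names as a header row, to a CSV file."""
    cursor = conn.execute(query)
    # cursor.description has one tuple per result column; item 0 is the column name.
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return header, rows


# Hypothetical example data: an in-memory database standing in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("North", "2023-01", 1200.0), ("South", "2023-01", 950.0), ("North", "2023-02", 1340.0)],
)

# Write the export to the system temp directory so the example leaves no stray files.
csv_path = os.path.join(tempfile.gettempdir(), "north_sales.csv")
header, rows = export_query_to_csv(
    conn,
    "SELECT region, month, revenue FROM sales WHERE region = 'North' ORDER BY month",
    csv_path,
)
conn.close()
print(header)  # ['region', 'month', 'revenue']
```

Selecting only the columns and rows you need in SQL keeps the exported file small and already structured for the next step.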

Step 2:

Scrubbing or cleaning the data

This is an important step because, before you can analyse the data, it must be in a readable state: free of mistakes, with missing or wrong values handled, and consistent throughout. The data is the foundation of everything that follows, so its quality determines the quality of your results.

Skills and tools required

  • Scripting languages – Python, R, SAS
  • Data wrangling tools – Python (pandas), R
  • Distributed processing – Hadoop MapReduce, Apache Spark
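A minimal, standard-library-only sketch of cleaning, assuming the records arrive as dictionaries of strings (for example from `csv.DictReader`). The field names and the cleaning rules (trim whitespace, make capitalisation consistent, convert revenue to a number, drop rows with missing or invalid values, remove duplicates) are illustrative assumptions; real rules depend on your data, and pandas offers equivalents such as `dropna` and `drop_duplicates`.

```python
def clean_records(records):
    """Return cleaned copies of the records, dropping rows that cannot be repaired."""
    cleaned = []
    seen = set()
    for record in records:
        # Consistency: trim stray whitespace and capitalise region names the same way.
        region = (record.get("region") or "").strip().title()
        raw_revenue = (record.get("revenue") or "").strip()
        # Missing values: skip rows with no region or no revenue.
        if not region or not raw_revenue:
            continue
        # Wrong values: skip revenues that are not numbers or are negative.
        try:
            revenue = float(raw_revenue)
        except ValueError:
            continue
        if revenue < 0:
            continue
        # Duplicates: keep only the first of any identical cleaned rows.
        key = (region, revenue)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"region": region, "revenue": revenue})
    return cleaned


# Hypothetical raw data showing the kinds of problems described above.
raw = [
    {"region": " north ", "revenue": "1200"},
    {"region": "NORTH", "revenue": "1200"},  # duplicate once cleaned
    {"region": "South", "revenue": ""},      # missing value
    {"region": "East", "revenue": "abc"},    # not a number
    {"region": "West", "revenue": "-5"},     # impossible value
    {"region": "south", "revenue": "950.5"},
]
print(clean_records(raw))
# [{'region': 'North', 'revenue': 1200.0}, {'region': 'South', 'revenue': 950.5}]
```

Whether to drop a bad row or repair it (for example, filling a missing value with a default) is a judgement call; dropping is shown here only because it is the simplest rule to follow.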

Step 3:

Exploratory Data Analysis

Now that your data is clean and readable, it’s time for the real work: analysing the data. This is done by visualizing the data in various ways to identify patterns and spot anything out of the ordinary. To analyse data well, you need an eye for detail and the ability to think outside the box to notice anything out of place. Based on this analysis, you then come up with solutions. In short, this is what a data analyst does.

Skills and tools required

  • Python libraries – NumPy, Matplotlib, pandas, SciPy
  • R libraries – ggplot2, dplyr
  • Inferential statistics
  • Data visualization
  • Experimental design
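As a small illustration of spotting anything out of the ordinary, the sketch below summarises a column of numbers and flags outliers with the widely used 1.5 × IQR (interquartile range) rule. The source does not name a method, so this rule is a swapped-in, common heuristic, and the daily order counts are made-up data. It uses Python’s `statistics` module (`statistics.quantiles` needs Python 3.8 or later); with pandas, `Series.describe()` gives a similar summary.

```python
import statistics


def summarise(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values more than k * IQR below the first quartile or above the third."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


daily_orders = [52, 48, 50, 55, 47, 51, 49, 53, 150, 50]  # hypothetical data
print(summarise(daily_orders))
print(iqr_outliers(daily_orders))  # [150]
```

Note how the single unusual day pulls the mean (60.5) well above the median (50.5); comparing the two is a quick first check before plotting the data.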

Step 4:

Modeling or Machine Learning

Machine learning is a branch of artificial intelligence in which algorithms learn patterns from data and use them to make predictions, without being explicitly programmed with a rule for every case.

The engineer or scientist chooses a suitable algorithm and prepares the data it will learn from. The algorithm then learns from this data and produces predictions for new, unseen inputs.

After cleaning the data and identifying its essential features during the data exploration phase, using a statistical model as a predictive tool can improve your overall decision making.

Skills and tools required

  • Machine learning – supervised, unsupervised and reinforcement learning
  • Evaluation methods
  • Machine learning libraries – Python (scikit-learn) / R (caret)
  • Linear algebra and multivariate calculus
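To make “learning from data” concrete, the sketch below fits a one-variable linear regression, a simple supervised learning model, using the ordinary least-squares formulas, and then evaluates it on held-out data with mean absolute error. The advertising-spend and sales figures are made up, and the plain-Python implementation is for illustration only; in practice you would typically use scikit-learn’s `LinearRegression` together with `train_test_split` and `mean_absolute_error`.

```python
def fit_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to one input."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(actual, predicted):
    """Average absolute difference between true and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: advertising spend (x) and resulting sales (y).
spend = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8]

# Learn from the first six points; hold back the last two to test on unseen data.
train_x, test_x = spend[:6], spend[6:]
train_y, test_y = sales[:6], sales[6:]

model = fit_linear_regression(train_x, train_y)
predictions = [predict(model, x) for x in test_x]
mae = mean_absolute_error(test_y, predictions)
print(model)
print(mae)
```

Evaluating on points the model never saw during fitting is what tells you whether it has learned a general pattern rather than just memorising the training data.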

Step 5:

Interpreting or ‘data storytelling’

This is the final step, in which you present your findings to your manager or company. The most important skill here is your ability to explain your results.

You must be able to explain them to anyone, including people without a technical background; hence the term ‘storytelling’.

To understand how the data affects the business, or how your solution leads to better business outcomes, you also need an understanding of the business domain.

Skills and tools required

  • Knowledge of your business domain
  • Data visualization tools – Tableau, ggplot2, Seaborn, etc.
  • Communication – presentation skills, both verbal and written

This marks the end of the data science flow-chart. Now that you know which skills and tools a data scientist needs, you can start learning them and enter this vast field yourself.

You can start your learning journey and get more familiar with the field through Greatlearning for Life, a free platform with courses tailored for people with no background or prior knowledge in data science.

