The placement team was an important part of the program – Diptanil Bhowmik, Consultant Analyst at Fractal Analytics

Reading Time: 2 minutes

In the present job market, it is difficult for freshers to get a job even when they know what career they want to pursue. In such a situation, it is important to find the right path to your dream job and to acquire the skills it requires. Diptanil Bhowmik chose to upskill with Great Learning and landed a job as a Consultant Analyst at Fractal Analytics. Read on to find out what he has to say about the Data Science program he pursued.

What did you like the most about Great Learning?

To start with, Great Learning has been a great experience because of the faculty appointed for the program. They are highly skilled and have deep knowledge of the topics they teach. Secondly, the management was very considerate.

The initial classes were easy, but once the major topics came in, they needed extra effort, and the management was very helpful with extra classes and other guidance as required.

The best part of the program was the capstone project I worked on. I got an IPL dataset, which was both interesting and challenging. Through this project, I learned how data science works.

The placement team was an important part of the program, as many top-notch companies were brought in and a placement boot camp was set up for each company.

I got an offer from Fractal Analytics as a Consultant Analyst, which gave a great boost to my career as a fresher. The company is great to work for.

How was your experience of the interview process at Fractal?

Fractal had one of the most rigorous placement processes. It started with an aptitude test which required us to answer 70 questions in 75 minutes. A safe score would be around 40-45.

After that, there was an SQL and Python test on HackerRank, which was quite challenging.

I was called for the personal interview round after clearing the previous rounds. From this point on, every round was an elimination round: a technical round that drilled into my resume, followed by a business problem round involving case studies and guesstimate questions, and lastly an HR round.

How did GL help you with the interview process?

GL helped with all these rounds. Different companies have different requirements and GL made sure to prepare us for each interview process separately. 

My experience with GL has been excellent, and I would recommend the DSE course because of the quality of its content and the depth with which it is taught. If you are a foodie, they also provide tasty food for lunch.

Upskill with Great Learning’s PG Program in Data Science and Engineering and unlock your dream career.

The placement team helped us to stay motivated throughout the process – Aashish Anil Mishra, Consultant at Fractal Analytics

Reading Time: 3 minutes

Many professionals get disheartened by repeated rejections during their job search. But the few who stay motivated, seek feedback, and look for ways to improve until they finally crack their dream jobs have experiences worth sharing. Read how Aashish Anil Mishra learned from feedback, upskilled, and cracked a job with Fractal Analytics.

When did you decide to upskill?

Right after completing my MBA, I started looking for a job. I interviewed for a few roles and was selected for some of them, but none of them matched my career preferences.

Soon, I got a call from Ugam for an Analyst role. This was the kind of role I wanted, but I could not get through the interview process. I then applied for similar openings at other analytics companies through different channels but did not get an interview call. Instead, I received an email stating that the positions had been filled by candidates with comparatively better skills and experience.

That is when I realized I needed to build my skill set around the most common requirements of the analytics market.

Why did you choose Great Learning?

I realized I had to learn tools I had never heard of before. I started exploring online and went through courses on platforms like Udemy and Analytics Vidhya, but I was not fully convinced by the format.

I needed a classroom training program that would help me develop skills and make a career transition. I searched for options and found Great Learning’s PGP-DSE to be the best one, based on reviews and its career transition ratio. After completing the pre-joining requirements, i.e., the online exam and interview, I enrolled in the PGP-DSE Jan’19 batch in Bangalore.

What was the role of gurus and teams in making sure you had a great experience?

Initially, I found it difficult because I had very little prior exposure to programming. But as the sessions progressed, it became easier because of classroom training, rigorous practice, take-home lab exercises, and support from peers. As the batch progressed, the faculty introduced new tools in a way that helped even first-timers.

The gurus arranged different sessions to build domain expertise, so it was easy to get even the silliest doubts cleared. The program was very well structured, which helped us learn things properly. Periodic exams helped us stay in touch with the topics already covered. Teaching assistants were available to help at any time, which was the best part, as clearing doubts never took long.

The program support team, the soft skill development team, and the placement team were all helpful throughout the program. The bootcamp arranged before placements helped in many ways: it covered everything in a short period and served as a revision.

How did the placement team help with scoring a job?

We were all ready for placements (the reason almost everyone joined the program). Many companies started conducting drives; some of us got through, while others had to wait for further interviews. The placement team helped us stay motivated throughout the process. I interviewed with 3 companies before getting placed at Fractal Analytics as a Consultant.

Overall, the journey was good, and I would like to thank Great Learning for helping me develop the right set of skills and make the career transition. I would also like to thank the program support team, who were always there to take up queries and help in the best possible way. Finally, you get connected to Great Learning’s huge alumni network, which will be a great help in the future too.


Launch pad for my Transition into Data Science: Bhavya, Data Scientist

Reading Time: 3 minutes

As our alumna Bhavya Labishetty rightly says, “One of the most difficult things to do in life is to restart.” A career transition can be a daunting experience, but not at Great Learning, where we try to make it as meaningful, fun, and rewarding as possible. Reflecting on her experience at Great Learning, she says, “Great Learning has been a launchpad for my transition into Data Science. The DSE program has been a learning experience.” Here is an overview of her journey at Great Learning.

Tell us about your professional background. Why did you choose Great Learning? 

I am a Computer Science and Engineering graduate (2016) and worked at Hewlett Packard Enterprise (now DXC Technology) for 2 years. My experience has been in Quality Analysis and Business Analysis. I was very intrigued by Analytics and wanted to transition into the field. I did a few online courses and tried transitioning into Analytics, but that was extremely difficult. After a lot of research, I decided to pursue a full-time course with placement assistance, and Great Learning ticked that box.

How has your experience been at Great Learning? 

The most daunting task is to quit a job and try to transition into a field that requires a different level of skills altogether. When I joined GL, I had no idea what to expect, but as the program went on, I began to understand that each topic covered in class acts as a foundation and that a lot of self-study and preparation is required to understand the bigger picture.

The faculty provided a basic understanding of each topic, and the sessions were very insightful. I come from a programming background, so understanding the logical flow wasn’t difficult, but since I had not done any coding for 2 years, keeping up with fresh graduates and people with programming experience was quite a challenge. One of the best things about the program was the kind of peers I got to interact with. Everyone had a new approach to understanding the problem. I am grateful for their support; they are one of the reasons I got better.

Were the projects and case studies helpful?

We were assigned mini-projects and case studies that helped us understand problems and their practical implementation. For the Capstone project, we had ample time not just for implementation but also to do groundwork on the assigned domain. This came in handy during interviews, as I was able to explain the real reasons why the data behaved the way it did over the given period.

How was the career support from the team at Great Learning?

The academic and career support given by GL was satisfactory. The team was always available; a special mention to Akhila, who was always the first person to guide and help us with preparation. She gave us feedback time and again, which was of great help.

With the help of the career support team, I interviewed with many companies, such as Nielsen, Bridgei2i Analytics, and Oppo. Building on the foundations from class and working on those concepts was necessary to crack the interviews.

How was your experience of the interview with Oppo?

The entire process had 4 rounds. First, there was a telephonic HR screening, where they tried to understand my work experience and the projects mentioned in my resume. The next round included questions on those projects and on the Machine Learning models used in them. My Capstone project was in the financial risk domain, and the interviewer asked several questions about it. The third round was a case study round, where the interviewer gave me a problem statement and asked me to provide a solution within 30 minutes. The case study was again related to the financial risk domain, and my research for the Capstone project helped me provide the solution. The final round was with Oppo’s CRO, and after the interview, I received a confirmation.


What is Data Science?

Reading Time: 7 minutes

Data Science continues to be a hot topic among skilled professionals and organizations that focus on collecting data and drawing meaningful insights from it to aid business growth. A lot of data is an asset to any organization, but only if it is processed efficiently. The need for storage grew manifold when we entered the age of big data. Until 2010, the major focus was on building state-of-the-art infrastructure to store this valuable data, which would then be accessed and processed to draw business insights. With frameworks like Hadoop having taken care of storage, the focus has now shifted towards processing this data. Let us see what data science is and how it fits into the current state of big data and businesses.

Broadly, Data Science can be defined as the study of data: where it comes from, what it represents, and the ways in which it can be transformed into valuable inputs and resources for creating business and IT strategies.

Graphical representation of what data science is (Source: datascience@berkeley)

 

Why do businesses need Data Science?

We have come a long way from working with small sets of structured data to large mines of unstructured and semi-structured data coming in from various sources. Traditional BI tools fall short when it comes to processing this massive pool of unstructured data. Data Science therefore brings more advanced tools to work on large volumes of data from different types of sources, such as financial logs, multimedia files, marketing forms, sensors and instruments, and text files.

Below are some relevant use cases, which also explain why Data Science has become popular among organizations:

– Data Science has myriad applications in predictive analytics. In weather forecasting, for example, data is collected from satellites, radars, ships, and aircraft to build models that forecast the weather and predict impending natural calamities with increasing precision. This helps in taking appropriate measures at the right time and avoiding as much damage as possible.

– Product recommendations were never this precise when traditional models drew insights only from browsing history, purchase history, and basic demographic factors. With data science, large volumes and varieties of data can train models more effectively to show more precise recommendations.

– Data Science also aids effective decision making. Self-driving or intelligent cars are a classic example. An intelligent vehicle collects data in real time from its surroundings through sensors such as radars, cameras, and lasers to create a map of its surroundings. Based on this data and advanced machine learning algorithms, it makes crucial driving decisions such as turning, stopping, and speeding up.
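To make the recommendation use case a little more concrete, here is a minimal sketch of one of the simplest techniques, item co-occurrence ("customers who bought X also bought Y"). The baskets and item names are hypothetical, and the function names are made up for illustration; production recommenders use far richer data and models.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items appears in the same basket."""
    counts = {}
    for basket in baskets:
        # set() so an item repeated within one basket is only counted once
        for a, b in permutations(set(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts, item, k=2):
    """Return up to k items most often bought together with `item`.

    Ties are broken alphabetically so the result is deterministic.
    """
    if item not in counts:
        return []
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical purchase history: each list is one customer's basket.
baskets = [
    ["laptop", "mouse", "laptop bag"],
    ["laptop", "mouse"],
    ["phone", "phone case"],
    ["laptop", "laptop bag"],
    ["mouse", "keyboard"],
]
counts = build_cooccurrence(baskets)
print(recommend(counts, "laptop"))  # ['laptop bag', 'mouse']
```

Counting pairs this way scales poorly to millions of items, which is one reason larger systems move to matrix factorization or learned embeddings.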

 


 

Why should you build a career in Data Science?

Now that we have seen why businesses need data science, let’s see why data science is a lucrative career option through this video:

 

Who is a Data Scientist?

A data scientist identifies important questions, collects relevant data from various sources, stores and organizes the data, deciphers useful information, translates it into business solutions, and communicates the findings to affect the business positively.

Apart from building complex quantitative algorithms and synthesizing large volumes of information, data scientists also bring communication and leadership skills, which are necessary to deliver measurable, tangible results to various business stakeholders.

 

What are the prerequisite skill sets for Data Science?

Data Science is a field of study at the confluence of mathematical expertise, strong business acumen, and technology skills. These form the foundation of Data Science, and each domain requires an in-depth understanding of its concepts. The three requisite skills are elaborated below:

Mathematical Expertise: There is a misconception that Data Analysis is all about statistics. Both classical and Bayesian statistics are undoubtedly crucial to Data Science, but other concepts matter too, such as quantitative techniques and especially linear algebra, which underpins many inferential techniques and machine learning algorithms.

Strong Business Acumen: Data Scientists derive useful information that is critical to the business and are responsible for sharing this knowledge with the relevant teams and individuals so it can be applied in business solutions. They are well positioned to contribute to business strategy because they have exposure to data like no one else. Hence, data scientists need strong business acumen to fulfil their responsibilities.

Technology Skills: Data Scientists are required to work with complex algorithms and sophisticated tools. They are also expected to code and prototype quick solutions in one or more languages such as SQL, Python, R, and SAS, and sometimes Java, Scala, Julia, and others. Data Scientists should also be able to work through technical challenges as they arise and avoid bottlenecks or roadblocks caused by a lack of technical soundness.
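As a small illustration of the point that linear algebra underpins inferential techniques, here is a sketch of fitting a straight line by ordinary least squares, solving the normal equations directly. The data points and the function name `fit_line` are made up for this example; in practice you would call a library such as NumPy rather than write the solver by hand.

```python
def fit_line(xs, ys):
    """Fit y = b0 + b1*x by ordinary least squares via the normal equations.

    With design matrix X whose rows are [1, x_i], the normal equations
    (X^T X) b = X^T y reduce to a 2x2 linear system, solved with Cramer's rule.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be equal")
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```

The same matrix formulation extends to many predictors, which is why regression, PCA, and neural networks are all routinely described in linear-algebra terms.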

 

Other roles in the field of data science:

So far, we have covered what data science is, why businesses need it, who a data scientist is, and which critical skill sets are required to enter the field. Now, let us look at some other data science job roles apart from that of a data scientist:

– Data Analyst: This role serves as a bridge between business analysts and data scientists. Data analysts work on specific questions and find answers by organizing and analyzing the given data. They translate technical analysis into action items and communicate the results to the relevant stakeholders. Along with programming and mathematical skills, they also need data wrangling and data visualization skills.

– Data Engineer: A data engineer manages large amounts of rapidly changing data. Data engineers build and maintain the data pipelines and infrastructure that transform data and transfer it to the data scientists who work on it. They mainly work with Java, Scala, MongoDB, Cassandra, and Apache Hadoop.
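The data engineer’s job of transforming data and handing it on is often described as extract, transform, load (ETL). Below is a toy, single-machine sketch of that idea using Python generators. The CSV layout, the field names, and the plain list standing in for a warehouse table are all hypothetical; real pipelines would read from and write to actual systems such as the ones listed above.

```python
import csv
import io

# Hypothetical raw export: one row per sale, with some messy values.
RAW = """city,amount
Bangalore, 120
mumbai,80
Bangalore,not_a_number
Mumbai,40
"""


def extract(text):
    """Yield raw rows as dicts from CSV text."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows):
    """Normalise city names and drop rows whose amount is not numeric."""
    for row in rows:
        try:
            amount = float(row["amount"])  # float() tolerates surrounding spaces
        except ValueError:
            continue  # skip malformed records
        yield {"city": row["city"].strip().title(), "amount": amount}


def load(rows, sink):
    """Append cleaned rows to a sink (a list standing in for a warehouse table)."""
    for row in rows:
        sink.append(row)
    return sink


warehouse = load(transform(extract(RAW)), [])
print(warehouse)
```

Because each stage is a generator, rows stream through one at a time instead of being loaded into memory all at once, which is the same idea larger pipeline frameworks apply at scale.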

 

Data Science Salary trends across job roles:


(Source: Analytics India Magazine – Salary Study 2019)

 

Who can become a data scientist/analyst/engineer?

Data Science is a multidisciplinary subject, and it is a big misconception that one needs a PhD in science or mathematics to become a data science professional. Although a good academic background is a plus in the data science profession, it is certainly not an eligibility criterion. Anyone with a basic educational background and intellectual curiosity about the subject can become a data scientist.

 

Critical tools in the Data Science domain:

SAS – SAS is closed-source proprietary software designed specifically for statistical operations and used mainly by large organizations to analyze data. It uses the Base SAS programming language, which is generally used for statistical modelling. It also offers various statistical libraries and tools that data scientists use for data modelling and organising.

Apache Spark – Spark is an improved alternative to Hadoop MapReduce and can run some workloads up to 100 times faster than MapReduce. It is designed to handle both batch processing and stream processing. Its Machine Learning APIs help data scientists make accurate and powerful predictions from data. Because it can process real-time data, it has an edge over analytical tools that can only process batches of historical data.

BigML – BigML provides standardized, cloud-based software and a fully interactive GUI environment that can be used to run ML algorithms across various departments of an organization. It is easy to use and allows interactive data visualizations. It also supports exporting visual charts to mobile or IoT devices. BigML also comes with various automation methods that help tune model hyperparameters and automate workflows with reusable scripts.

D3.js – D3.js is a JavaScript library that lets users create interactive visualizations and data analysis in the web browser through its APIs. It makes documents dynamic by allowing client-side updates, using changes in the data to update the visualization in the browser.

MATLAB – MATLAB is a numerical computing environment that can process complex mathematical operations. It has a powerful graphics library for creating visualizations that aid image and signal processing applications. It is popular among data scientists because it can help with problems ranging from data cleaning and analysis to much more advanced deep learning problems. It can be easily integrated with enterprise applications and embedded systems.

Tableau – Tableau is data visualization software that helps create interactive visualizations with powerful graphics. It is best suited to industries working on business intelligence projects. Tableau can easily interface with spreadsheets, databases, and OLAP (Online Analytical Processing) cubes. It is especially useful for visualizing geographical data.

Matplotlib – Matplotlib is a plotting and visualization library for Python, used to generate graphs from analyzed data. It can plot complex graphs with just a few simple lines of code. The most widely used of its modules is Pyplot, which offers a MATLAB-like interface; as open-source software, Matplotlib is a good alternative to MATLAB’s graphics modules. NASA’s data visualizations of the Phoenix spacecraft’s landing were illustrated using Matplotlib.

NLTK – The Natural Language Toolkit (NLTK) is a collection of Python libraries for natural language processing. It helps in building statistical models that, along with several algorithms, help machines understand human language.

Scikit-learn – Scikit-learn is a Python library that makes complex ML algorithms simpler to use. It supports a variety of Machine Learning tasks, such as data pre-processing, regression, classification, and clustering.

TensorFlow – TensorFlow is also used for Machine Learning, but for more advanced algorithms such as deep learning. Thanks to its high processing capability, it finds applications in image classification, speech recognition, drug discovery, and more.
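The Apache Spark entry compares Spark with MapReduce. As a rough illustration of the MapReduce programming model itself (not of Spark’s or Hadoop’s actual APIs), here is a single-machine word count split into map, shuffle, and reduce phases. The function names and documents are made up for the sketch; on a real cluster, each phase would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


docs = ["big data needs processing", "Spark processes big data"]
print(word_count(docs))
```

Classic MapReduce writes intermediate results to disk between stages, whereas Spark can keep them in memory, which is a large part of where its speed advantage comes from.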

If you are interested in pursuing a career in Data Science, check out Great Learning’s postgraduate program in Data Science and Business Analytics. The Analytics and Data Science course from Great Learning has been ranked No. 1 consistently since 2014 by Analytics India Magazine. The program provides international recognition and a dual certificate from the University of Texas at Austin, McCombs School of Business, and Great Lakes, India. You will learn from top-ranked data science faculty, with the flexibility to learn at your own time and place through the online learning and weekend classes format.

Your essential weekly guide to Data Science and Analytics – September Part I

Reading Time: 2 minutes

Data Science and Analytics are being applied across industries, varying in scope and magnitude depending on the purpose of the application. Even as we witness these technologies solving bigger problems, there are still challenges in building these solutions. We explore some of those challenges in this week’s digest.

SEBI Bets on Data Analytics, New Generation Tech to Address Market Challenges

Continuing its efforts to bolster supervision and identify non-compliance, regulator Sebi plans to deploy data analytics and new generation technologies to deal with various challenges in the market. Technology solutions are being built to achieve the objective of identifying non-compliance and assisting in investigations.

Figleaves Deploys AVORA Augmented Analytics for Granular Insights and Reporting

AVORA provides an end-to-end augmented analytics platform that uses Machine Learning with smart alerting to deliver easy-to-use, in-depth data analysis. By eliminating the limitations of existing analytics, reducing data preparation and discovery time by 50-80%, and cutting time to insight from days to a matter of seconds, AVORA creates game-changing organizational intelligence.

New Tools of Data Science Used to Capture Single Molecules in Action

Single-molecule fluorescence techniques have revolutionized our understanding of the dynamics of many critical molecular processes, but signals are inherently noisy and experiments require long acquisition times. This work leverages new tools from data science in order to make every photon detected count and refine our picture of molecular motion.

Challenges in Analytics Sector: The Industry Perspective

The analytics industry has grown significantly over the years but still faces many challenges in terms of talent, reaching the right consumers, and accumulating data points, among others. Three key challenges the analytics industry still faces today are:

– Translating data to business impact
– Multiple sources of data
– Data quality

To read more about Data Science, Analytics, and their career prospects, check this space. Upskill in Data Science domain with Great Learning’s PG program in Data Science and Engineering.

Critical skill-sets to make or break a data scientist 

Reading Time: 4 minutes

Ever since data took over the corporate world, data scientists have been in demand. The shortage of skilled experts makes the job even more attractive. Companies are willing to pour their revenue into the pockets of data scientists who have the right skills to put an organization’s data to work.

However, that does not mean it is easy for candidates to grab a job at renowned organizations. If you’ve been wanting to establish a career in data science, know that it takes the right set of skills to be considered worthy of the position.

What exactly then do you need to become an in-demand data scientist?

Here are a few valuable skills required for a data scientist, which you should develop before hitting the marketplace in search of your ideal job.

Programming or Software Development Skills

Data scientists need to work with several programming languages and software packages, using multiple tools to extract, clean, analyze, and visualize data. Therefore, an aspiring data scientist needs to be well-versed in:

– Python – Python was not designed specifically for data science. But now that data analytics and processing libraries have been developed for it, giants such as Facebook and Bank of America use the language to further their data science journeys. This high-level programming language is powerful, friendly, open source, easy to learn, and fast to develop in.

– R – R was once used largely for academic purposes, but a number of financial institutions, social networking services, and media outlets now use it for statistical analysis, predictive modelling, and data visualization. This is why aspiring data scientists should get their hands on R.

– SQL – Structured Query Language (SQL) is a special-purpose language for managing data in relational database systems. It lets you insert, query, update, delete, and otherwise modify data held in database systems.

– Hadoop – This is an open-source framework that allows distributed processing of large sets of data across computer clusters using simple programming models. Hadoop offers fault tolerance, computing power, flexibility, and scalability in processing data.
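To make the SQL bullet concrete, here is a small sketch that runs the operations it lists (insert, update, delete, query) against a throwaway in-memory SQLite database using Python’s built-in `sqlite3` module. The `employees` table, its columns, and the names are made up for illustration; other database systems differ in dialect details.

```python
import sqlite3


def run_demo():
    """Create a table, insert, update, and delete rows, then query what is left."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

    # INSERT: parameterised queries (the ? placeholders) avoid SQL injection
    cur.executemany(
        "INSERT INTO employees (name, salary) VALUES (?, ?)",
        [("Asha", 50000), ("Ravi", 42000), ("Meera", 61000)],
    )
    # UPDATE: give Ravi a raise of 5000
    cur.execute("UPDATE employees SET salary = salary + 5000 WHERE name = ?", ("Ravi",))
    # DELETE: remove one record
    cur.execute("DELETE FROM employees WHERE name = ?", ("Meera",))
    conn.commit()

    # QUERY: read back the remaining rows, highest salary first
    rows = cur.execute("SELECT name, salary FROM employees ORDER BY salary DESC").fetchall()
    conn.close()
    return rows


print(run_demo())  # [('Asha', 50000.0), ('Ravi', 47000.0)]
```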

Problem Solving and Risk Analysis Skills

Data scientists need exceptional problem-solving skills. Organizations hire data scientists to work on real challenges and solve them with data and analytics. This requires an appetite for solving real-world problems and coping with complex situations.

Additionally, aspiring data scientists need to master the art of calculating the risks associated with specific business models. Since you will be responsible for designing and implementing new business models, you will also be in charge of assessing the risks they entail.

Summary of critical skills required for data scientists

Process Improvement Skills

Most data science jobs in this era of digital transformation involve improving legacy processes. As organizations move closer to transformation, they need data scientists to help them replace traditional processes with modern ones.

As a data scientist, it falls upon you to find the best solution to a business problem and to improve or optimize the relevant processes.

It makes a lot of sense for data scientists to develop a personalized approach to improving processes. If you can show your potential employer that you can enhance their current business processes, you will significantly increase your chances of landing the job.

Mathematical Skills

Unlike many high-paying jobs in computer science, data science jobs need both practical and theoretical understanding of complex mathematical subjects. Here are a few skills you need to master under this set:

– Statistics – No points for guessing this one: statistics is, and will remain, one of the top data science skills to master. This branch of mathematics deals with the collection, analysis, organization, and interpretation of data. Among the vast range of topics you might have to deal with, you’ll need a strong grasp of probability distributions, statistical features, over- and undersampling, Bayesian statistics, and dimensionality reduction.

– Multivariable calculus and linear algebra – Without these, it is hard to build modern-day business solutions. Linear algebra is the language of many machine learning algorithms, while multivariable calculus plays the same role for optimization problems. As a data scientist, you will be tasked with optimizing models over large-scale data and expressing the solutions in programming languages. Therefore, it is essential to have a firm hold on these concepts.
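As a small illustration of multivariable calculus applied to optimization, here is a sketch of gradient descent minimising a two-variable function whose partial derivatives are written out by hand. The function, starting point, learning rate, and step count are arbitrary choices for this example; real models have far more parameters and usually rely on automatic differentiation.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient: p <- p - learning_rate * grad(p)."""
    point = list(start)
    for _ in range(steps):
        g = grad(point)
        point = [p - learning_rate * gi for p, gi in zip(point, g)]
    return point


# f(x, y) = (x - 3)^2 + (y + 1)^2 has its minimum at (3, -1).
def grad_f(point):
    x, y = point
    # Partial derivatives: df/dx = 2(x - 3), df/dy = 2(y + 1)
    return [2 * (x - 3), 2 * (y + 1)]


x_min, y_min = gradient_descent(grad_f, start=(0.0, 0.0))
print(round(x_min, 4), round(y_min, 4))  # 3.0 -1.0
```

With this learning rate, each step shrinks the distance to the minimum by a factor of 0.8; a rate that is too large (here, above 1.0) would make the iterates diverge instead.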

Deep Learning, Machine Learning, Artificial Intelligence Skills

Did you know that, as per PayScale, data scientists with knowledge of AI/ML get paid up to INR 20,00,000, with an average of INR 7,00,000? Modern-day businesses need their data scientists to have at least a basic understanding, if not expertise, of these technologies. Since these areas have a lot to do with data, it makes sense for you to have a foundational understanding of them.

Learning the ins and outs of these concepts will greatly strengthen your data science skills and help you stand out from other candidates.

Collaborative Skills

It is highly unlikely that a data scientist will work in solitude. Most companies today house a team of data science experts who work together on specific classes of problems. Even if you are not in a team of data scientists, you will definitely need to collaborate with business leaders and executives, software developers, and sales strategists, among others.

Therefore, when putting all the necessary skills in perspective, do not forget to develop teamwork and collaboration skills. Learn the right ways to bring issues to people’s attention and explain your point of view without being domineering.

It might also help to be able to explain data science concepts and terminology to non-experts in simple language.

In 2019, there were 97,000 open analytics and data science positions, more than 45% higher than the previous year. Trends like this act as a magnet, drawing fresh graduates towards a career in Data Science. As a data scientist, you need to wear multiple hats and ace them all. Since the field is still expanding and evolving, it is hard to predict everything a data scientist will need to know. However, start by working on these preliminary skills required for data scientists and then work your way up.

If you are interested in moving ahead with a career in Data Science, start developing the skills mentioned above to improve your employability. Upskilling with Great Learning’s PG program in Data Science and Engineering will take care of most of it for you!

Public Data: A Data Scientist’s dream

Reading Time: 3 minutes

We’ve all heard how data science will transform (if it hasn’t already) the business landscape, touching everything from our supermarkets to our hospitals and our airlines to our credit cards. Most companies working in data science use proprietary information from millions of private transactions to gain insight into our behavior, which in turn allows them to turn a profit. However, if you are an amateur data scientist, a hobbyist, a student, or a data-minded citizen, this information is typically off-limits. And a simulation just isn’t enough, because it doesn’t meaningfully replicate the complexity and multi-dimensionality of this data.

Public Data Sets

How about all the publicly available information though? Now, here’s an underused treasure trove for data scientists. Concerns about the quality of data aside, open data provides unparalleled opportunities. There are typically no usage restrictions for data in the public domain, and stitching together disparate sources of data (census, crime records, traffic, air pollution, public maps, etc.) gives you the opportunity to test interactions between various data sets. Possibly the most complete list of public datasets is available at this GitHub page.
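To make the idea of stitching sources together concrete, here is a minimal sketch in plain Python (standard library only). It assumes two hypothetical tables, a census extract and a crime-records extract, that share a district name as a join key; the district names and every number are made up for illustration. Joining them yields a metric, reported crimes per 100,000 residents, that neither source provides on its own.

```python
# Hypothetical toy tables: the district names and all numbers below are
# made up for illustration, not taken from any real census or crime data.
census = [
    {"district": "North", "population": 250_000},
    {"district": "South", "population": 400_000},
    {"district": "East", "population": 150_000},   # no crime-records match
]
crime = [
    {"district": "North", "reported_crimes": 500},
    {"district": "South", "reported_crimes": 600},
    {"district": "West", "reported_crimes": 90},   # no census match
]


def join_on_district(left, right):
    """Inner join two lists of dicts on the shared 'district' key."""
    right_by_key = {row["district"]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row["district"])
        if match is not None:  # keep only districts present in both sources
            joined.append({**row, **match})
    return joined


def crime_rate_per_100k(rows):
    """Derive a metric that neither source contains on its own."""
    return {
        row["district"]: round(row["reported_crimes"] * 100_000 / row["population"], 1)
        for row in rows
    }


merged = join_on_district(census, crime)
rates = crime_rate_per_100k(merged)
print(rates)  # {'North': 200.0, 'South': 150.0}
```

Note that the inner join silently drops districts that appear in only one source (East and West here). With real public data, checking which keys failed to match is often the first sign of data quality problems.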

Notice I said ‘concerns about the quality of data’ in the previous paragraph? That can be a massive problem. The biggest impediment to using public data is its unreliability. Often, the data sets are poorly indexed or incomplete. Even more commonly, these public stores of information are kept in formats that don’t lend themselves to data wrangling: scanned documents and hand-written ledgers are hard to analyze. So a large part of any public data project ends up being a transcription effort. Web scraping, dimensionality reduction, imputation, bias removal and normalization are all skills a data scientist needs to develop when working with public, open data.
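Two of the skills listed above, imputation and normalization, can be illustrated in a few lines. The sketch below is a simplified example, not a recommended pipeline: it assumes a hypothetical numeric column transcribed from a public record, with missing or illegible cells recorded as `None` (the values are made up). It fills the gaps with the mean of the observed values and then min-max scales the column to the [0, 1] range. Mean imputation is only one of several options (median or model-based imputation are common alternatives), and it can itself introduce bias, so it should be chosen deliberately.

```python
from statistics import mean

# Hypothetical column transcribed from a public ledger; None marks cells that
# were missing or illegible in the source (values are made up).
readings = [12.0, None, 18.0, 30.0, None, 24.0]


def impute_mean(values):
    """Replace missing entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to rescale
    return [(v - lo) / (hi - lo) for v in values]


filled = impute_mean(readings)
scaled = min_max_normalize(filled)
print(filled)  # [12.0, 21.0, 18.0, 30.0, 21.0, 24.0]
print(scaled)
```

Because every imputed value equals the column mean, all of them land on the same scaled value (0.5 here), a visible reminder that mean imputation shrinks the spread of a column.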

Where is all this public data?

Of course, there are also some extremely powerful sources of public data that are reasonably clean, reliable and ready to use. For government and public sector data, the first port of call is India’s Open Government Data Platform, which includes robust data on commodity prices, national statistics, company registers and even government budgets. Macroeconomic data is best sourced from the World Bank or from Google’s Public Data Explorer. The Public Data Explorer stitches together information from a range of sources (IMF, World Bank, OECD, university libraries and even Google’s own data collection efforts) and offers some slick, interactive visualizations. Other interesting sources include the Reserve Bank of India for banking, forex and CPI data, and Bhuvan, ISRO’s geo-platform, for geographical data.

Recognizing just how time-intensive and complicated data cleaning and collation can be, some interesting companies focus on providing clean data sets. Not surprisingly, they target the most immediately lucrative sector: finance. Quandl provides some intriguing financial data sets for free, including the renowned Big Mac price index, and all of its data is designed to be easily imported and ready for use in minutes. Another company challenging the traditional (paid) data powerhouses is StockTwits. Its API allows you to get real-time data for free, all day, every day. If you want historical data (going back about 3-5 years), numerous users have downloaded data through StockTwits and compiled data sets that you can easily repurpose.

Getting competitive

If you’re the sort who prefers a competitive challenge to tinkering with datasets on your own, there are some wonderful competition platforms that make public datasets available with a well-defined problem statement. The obvious starting point is Kaggle, whose competition problems include Flu Forecasting and Automated Essay Scoring. Kaggle also hosts a set of very interesting data sets for the self-driven data scientist. DrivenData is another such platform, albeit with a smaller selection of competitions.

Once you’re ready to meet and work meaningfully with others interested in data-driven solutions to social problems, you can seek out global movements like DataKind. Their efforts range from weekend marathons to long-term cross-sector engagements. Earlier this year, DataKind’s Bangalore chapter created a tool to help you understand various aspects of the Union Budget for 2016-17. The source code is public and entirely open to being repurposed for use on any other data set. There are also academic paths to learning and collaboration in data science – the most prominent of which is the University of Chicago’s Data Science for Social Good fellowship.

Public datasets offer the best opportunity to learn, experiment and produce valuable analytical insights that benefit society. In a world where data is an increasingly valuable currency, these public data sets are perhaps the last bastion of the precious, complex data needed to draw meaningful conclusions about the way we live.