What is Machine Learning?

Arthur Samuel coined the term Machine Learning in 1959. A pioneer in Artificial Intelligence and computer gaming, he defined Machine Learning as the “field of study that gives computers the capability to learn without being explicitly programmed”.

In this article, we discuss Machine Learning in detail, covering its different aspects, processes, and applications. We start with why Machine Learning matters, then explain the standard terms used in the field and the steps involved in approaching an ML problem. Next, we look at the building blocks of Machine Learning and how it works, and we make the case for Python as the best programming language for Machine Learning. We also list the different types of Machine Learning approaches and their industrial applications. Finally, the article closes with job prospects and career opportunities in the field, including salary trends across top metropolitan cities in India.

Machine Learning is the study of making machines more human-like in their behaviour and decisions by giving them the ability to learn and develop their own programs. This is done with minimal human intervention, i.e., without explicit programming. The learning process is automated and improves based on the experience the machines gather along the way. Good-quality data is fed to the machines, and different algorithms are used to build ML models that are trained on this data. The choice of algorithm depends on the type of data at hand and the type of activity that needs to be automated.

Here’s a video explaining what Machine Learning is from the ground up.

Now you may wonder, how is it different from traditional programming? In traditional programming, we feed input data and a well-written, tested program into a machine to generate output. In machine learning, the input data is fed into the machine along with the expected output during the learning phase, and the machine works out a program for itself. To understand this better, refer to the illustration below:

[Figure: Machine learning model vs traditional model]

Why Machine Learning?

Machine Learning is receiving a great deal of attention today. It can automate many tasks, especially those that only humans could previously perform with their innate intelligence. Replicating this intelligence in machines is possible only with the help of machine learning.

With the help of Machine Learning, businesses can automate routine tasks and quickly create models for data analysis. Various industries depend on vast quantities of data to optimise their operations and make intelligent decisions. Machine Learning helps in creating models that can process and analyse large amounts of complex data to deliver accurate results. These models are precise and scalable and function with less turnaround time. By building such precise Machine Learning models, businesses can leverage profitable opportunities and avoid unknown risks.

Image recognition, text generation, and many other use cases are finding applications in the real world. This is widening the scope for machine learning experts to shine as sought-after professionals.

How to get started with Machine Learning? 

To get started with Machine Learning, let’s take a look at some of the important terms used in the field:

– Model: Also known as “hypothesis”, a machine learning model is the mathematical representation of a real-world process. A machine learning algorithm along with the training data builds a machine learning model. 

– Feature: A feature is a measurable property or parameter of the data-set. 

– Feature Vector: It is a set of multiple numeric features. We use it as an input to the machine learning model for training and prediction purposes.

– Training: An algorithm takes a set of data known as “training data” as input. The learning algorithm finds patterns in the input data and trains the model for expected results (target). The output of the training process is the machine learning model.

– Prediction: Once the machine learning model is ready, it can be fed with input data to provide a predicted output. 

– Target (Label): The value that the machine learning model has to predict is called the target or label.  

– Overfitting: When a machine learning model is fitted too closely to its training data, it tends to learn the noise and inaccurate data entries along with the real pattern. The model then fails to characterise new data correctly.

– Underfitting: This is the scenario in which the model fails to capture the underlying trend in the input data, which hurts the accuracy of the machine learning model. In simple terms, the model or the algorithm does not fit the data well enough.
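To make these terms concrete, here is a minimal sketch in plain Python. The data values and the function names `train` and `predict` are made up for illustration, and the "model" is just a 1-nearest-neighbour lookup, chosen for brevity rather than because it is what a real library would produce.

```python
# Hypothetical toy data: each feature vector is (colour score, alcohol %).
training_features = [(0.2, 4.5), (0.3, 5.0), (0.8, 12.5), (0.9, 13.0)]
training_targets = ["beer", "beer", "wine", "wine"]  # targets (labels)


def train(features, targets):
    """Training: this toy 'model' simply memorises the labelled examples."""
    return list(zip(features, targets))


def predict(model, feature_vector):
    """Prediction: return the label of the closest memorised example."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(model, key=lambda pair: squared_distance(pair[0], feature_vector))
    return label


model = train(training_features, training_targets)
print(predict(model, (0.25, 4.8)))   # beer
print(predict(model, (0.85, 12.0)))  # wine
```

Each tuple is a feature vector, each string is a target, `train` turns the training data into a model, and `predict` uses that model on unseen input.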

Here’s a video that gives a step-by-step guide to approaching a Machine Learning problem, using a beer and wine example:

As discussed in the above video, the seven steps of Machine Learning are:

  1. Gathering Data
  2. Preparing that data
  3. Choosing a model
  4. Training
  5. Evaluation
  6. Hyperparameter Tuning
  7. Prediction

[Figure: The 7 steps of machine learning]
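The seven steps can be sketched end to end in a few lines of plain Python. Everything here is an assumption made for illustration: the measurements are invented, the divisor used for scaling is arbitrary, and k-nearest neighbours is picked only because it is short to write.

```python
from collections import Counter

# 1. Gathering data: hypothetical ((colour score, alcohol %), label) measurements.
drinks = [
    ((0.20, 4.2), "beer"), ((0.25, 5.0), "beer"), ((0.30, 4.8), "beer"),
    ((0.35, 5.5), "beer"), ((0.15, 4.0), "beer"), ((0.28, 6.0), "beer"),
    ((0.80, 12.0), "wine"), ((0.85, 13.5), "wine"), ((0.90, 12.5), "wine"),
    ((0.75, 11.5), "wine"), ((0.95, 14.0), "wine"), ((0.70, 11.0), "wine"),
]


# 2. Preparing the data: rescale alcohol % to roughly 0..1 (the divisor 15 is an
#    assumption) and split into training, validation, and test sets.
def prepare(features):
    colour, alcohol = features
    return (colour, alcohol / 15.0)


prepared = [(prepare(f), label) for f, label in drinks]
train_set = prepared[0:4] + prepared[6:10]
validation_set = [prepared[4], prepared[10]]
test_set = [prepared[5], prepared[11]]


# 3. Choosing a model: k-nearest neighbours with majority voting.
def predict(examples, k, features):
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    nearest = sorted(examples, key=lambda ex: squared_distance(ex[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# 4. Training: for k-NN, "training" is simply keeping the labelled examples.
model_examples = train_set


# 5. Evaluation: fraction of correct predictions on a labelled set.
def accuracy(examples, k, labelled_set):
    correct = sum(predict(examples, k, f) == label for f, label in labelled_set)
    return correct / len(labelled_set)


# 6. Hyperparameter tuning: choose k using the validation set.
candidate_ks = [1, 3, 5]
best_k = max(candidate_ks, key=lambda k: accuracy(model_examples, k, validation_set))
test_accuracy = accuracy(model_examples, best_k, test_set)

# 7. Prediction: classify a new, unlabelled drink.
prediction = predict(model_examples, best_k, prepare((0.82, 12.8)))
print(best_k, test_accuracy, prediction)
```

The validation set is used only to choose `k`, and the test set is touched once at the end, so the reported accuracy is not influenced by the tuning step.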

You will need to learn a programming language, preferably Python, along with the required analytical and mathematical knowledge. Here are the five mathematical areas that you need to brush up on before jumping into solving Machine Learning problems:

  1. Linear algebra for data analysis: Scalars, Vectors, Matrices, and Tensors
  2. Mathematical Analysis: Derivatives and Gradients
  3. Probability theory and statistics
  4. Multivariate Calculus
  5. Algorithms and Complex Optimizations

How does Machine Learning work?

The three major building blocks of a Machine Learning system are the model, the parameters, and the learner.

– The model is the system that makes predictions

– The parameters are the factors that the model considers when making predictions

– The learner adjusts the parameters and the model to align the predictions with the actual results
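One way to picture these three building blocks is the rough sketch below. It assumes a straight-line model and a gradient-descent learner; the names `model` and `learner_step`, the data, and the learning rate are all illustrative choices, not the only way to build such a system.

```python
def model(x, params):
    """The model: makes a prediction from an input and the current parameters."""
    weight, bias = params
    return weight * x + bias


def learner_step(params, data, learning_rate=0.05):
    """The learner: nudges the parameters to reduce the mean squared error."""
    weight, bias = params
    n = len(data)
    grad_weight = sum(2 * (model(x, params) - y) * x for x, y in data) / n
    grad_bias = sum(2 * (model(x, params) - y) for x, y in data) / n
    return (weight - learning_rate * grad_weight, bias - learning_rate * grad_bias)


# Hypothetical data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

params = (0.0, 0.0)  # the parameters start out uninformed
for _ in range(2000):
    params = learner_step(params, data)

print(params)  # close to (2.0, 1.0)
```

The model never changes shape here; only the parameters move, and the learner is the piece that decides how they move after comparing predictions with actual results.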

Let us build on the beer and wine example from above to understand how machine learning works. A machine learning model here has to predict if a drink is beer or wine. The parameters selected are the colour of the drink and the alcohol percentage. The first step is:

Learning from the training set – This involves taking a sample data set of several drinks for which the colour and alcohol percentage are specified. Now, we have to define the description of each classification, that is wine and beer, in terms of the values of the parameters for each type. The model can use the description to decide if a new drink is wine or beer.

You can represent the values of the parameters, ‘colour’ and ‘alcohol percentage’, as ‘x’ and ‘y’ respectively. Then (x,y) defines the parameters of each drink in the training data. This set of data is called a training set. These values, when plotted on a graph, present a hypothesis in the form of a line, a rectangle, or a polynomial that best fits the desired results.
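As a rough sketch of this first step, the code below builds a "description" of each class by averaging the (x, y) values of its training drinks, then assigns a new drink to the class with the closest description. The numbers, the colour scale, and the nearest-average rule are assumptions made for illustration; other hypotheses, such as a line or rectangle, would work on the same data.

```python
def learn_descriptions(training_set):
    """Average the (x, y) values of each class to get one description per class."""
    totals = {}
    for (x, y), label in training_set:
        sum_x, sum_y, count = totals.get(label, (0.0, 0.0, 0))
        totals[label] = (sum_x + x, sum_y + y, count + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in totals.items()}


def classify(descriptions, drink):
    """Pick the class whose description is closest to the new drink."""
    def squared_distance(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    return min(descriptions, key=lambda label: squared_distance(descriptions[label], drink))


# Hypothetical training set: x = colour score (0 = pale, 1 = dark red), y = alcohol %.
training_set = [
    ((0.20, 4.5), "beer"), ((0.30, 5.0), "beer"), ((0.25, 5.0), "beer"),
    ((0.80, 12.0), "wine"), ((0.90, 13.0), "wine"), ((0.85, 13.0), "wine"),
]

descriptions = learn_descriptions(training_set)
print(classify(descriptions, (0.30, 5.2)))   # beer
print(classify(descriptions, (0.90, 12.0)))  # wine
```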

Measuring error – The second step is to check the model for discrepancies and errors once it has been trained on a defined training set. We use a fresh set of data to accomplish this task. The outcome for each test example will be one of these four:

  1. True Positive: The model predicts the condition, and the condition is present
  2. True Negative: The model does not predict the condition, and the condition is absent
  3. False Positive: The model predicts the condition, but the condition is absent
  4. False Negative: The model does not predict the condition, but the condition is present

The sum of FP and FN is the total error in the model.
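The four outcomes and the total error can be counted with a short helper. This is a minimal sketch: the function name `confusion_counts`, the choice of "wine" as the positive condition, and the two lists of labels are all invented for the example.

```python
def confusion_counts(actual, predicted, positive="wine"):
    """Count true/false positives and negatives for one 'positive' label."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            counts["TP"] += 1
        elif guess != positive and truth != positive:
            counts["TN"] += 1
        elif guess == positive and truth != positive:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


# Hypothetical test results on six fresh drinks.
actual = ["wine", "wine", "beer", "beer", "wine", "beer"]
predicted = ["wine", "beer", "beer", "wine", "wine", "beer"]

counts = confusion_counts(actual, predicted)
total_error = counts["FP"] + counts["FN"]
print(counts)                     # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
print(total_error / len(actual))  # error rate: 2 wrong out of 6
```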

Manage Noise – For the sake of simplicity, we have considered only two parameters here, the colour and the alcohol percentage. In reality, you will have to consider hundreds of parameters and a broad set of learning data to solve a machine learning problem. The resulting hypothesis will have many more errors because of noise. Noise consists of unwanted anomalies that disguise the underlying relationship in the data set and weaken the learning process. Common reasons for noise are:

– Large training data set

– Errors in input data

– Data labelling errors 

– Unobservable attributes that might affect the classification but are not considered in the training set due to lack of data

You can accept a certain degree of training error due to noise to keep the hypothesis as simple as possible. 

Testing and Generalization: While it is possible for an algorithm or hypothesis to fit a training set well, it might fail when applied to data outside the training set. Therefore, it is essential to figure out whether the algorithm is fit for new data. Testing it with a set of new data is the way to judge this. Generalisation refers to how well the model predicts outcomes for a new set of data.

When we fit a hypothesis for maximum possible simplicity, it may be too simple to capture the underlying pattern, so it shows significant error on both the training data and new data. We call this underfitting. On the other hand, if the hypothesis is made too complicated in order to fit the training data as closely as possible, it may show very little error on the training data but fail to generalise well. This is the case of overfitting. In either case, the results are fed back to train the model further.
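The difference between the two failure modes shows up when the same hypotheses are scored on training data and on new data. The sketch below compares three illustrative hypotheses on invented data that follows y = 2x with hand-added noise: a constant (too simple), a least-squares line, and a "memoriser" that repeats the nearest training output (too flexible). The function names are assumptions made for the example.

```python
def mean_squared_error(hypothesis, points):
    return sum((hypothesis(x) - y) ** 2 for x, y in points) / len(points)


def fit_constant(points):
    """Too simple: always predict the average training output."""
    mean_y = sum(y for _, y in points) / len(points)
    return lambda x: mean_y


def fit_line(points):
    """Reasonable: least-squares straight line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memoriser(points):
    """Too flexible: repeat the output of the nearest training point, noise included."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


# Hypothetical data following y = 2x, with +1 or -1 of noise added by hand.
training_points = [(0, 1), (2, 3), (4, 9), (6, 11), (8, 17)]
new_points = [(0.5, 0), (2.5, 6), (4.5, 8), (6.5, 14), (8.5, 16)]

errors = {}
for name, fit in [("constant", fit_constant), ("line", fit_line),
                  ("memoriser", fit_memoriser)]:
    hypothesis = fit(training_points)
    errors[name] = (mean_squared_error(hypothesis, training_points),
                    mean_squared_error(hypothesis, new_points))
    print(name, "training error:", errors[name][0], "new-data error:", errors[name][1])
```

On this data the constant is poor on both sets (underfitting), the memoriser is perfect on the training set but noticeably worse on new data (overfitting), and the line has a small error on both.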

[Figure: The machine learning process]

Which Language is Best for Machine Learning?

Python is hands down the best programming language for Machine Learning applications due to the various benefits mentioned in the section below. Other programming languages that can be used for Machine Learning applications are R, C++, JavaScript, Java, C#, Julia, Shell, TypeScript, and Scala.

Python is famous for its readability and relatively lower complexity as compared to other programming languages. Machine Learning applications involve complex concepts like calculus and linear algebra which take a lot of effort and time to implement. Python helps in reducing this burden with quick implementation for the ML engineer to validate an idea.

Another benefit of using Python in Machine Learning is its pre-built libraries. There are different packages for different types of applications, as mentioned below:

– NumPy, OpenCV, and Scikit are used when working with images

– NLTK, along with NumPy and Scikit again, when working with text

– Librosa for audio applications

– Matplotlib, Seaborn, and Scikit for data representation

– TensorFlow and PyTorch for deep learning applications

– SciPy for scientific computing

– Django for integrating web applications

– Pandas for high-level data structures and analysis

[Figure: Python libraries for Machine Learning]

Python provides flexibility in choosing between object-oriented programming or scripting. There is also no need to recompile the code; developers can implement any changes and instantly see the results. You can use Python along with other languages to achieve the desired functionality and results.

Python is a versatile programming language and can run on any platform, including Windows, macOS, Linux, Unix, and others. When migrating from one platform to another, the code needs only minor adaptations before it is ready to work on the new platform.

Here is a summary of the benefits of using Python for Machine Learning problems:

[Figure: Why Python in Machine Learning]

Types of Machine Learning

In this section, we will learn about the different approaches towards machine learning and the variety of problems they can solve. 

Supervised Learning

The supervised learning model has a set of input variables (x), and an output variable (y). An algorithm identifies the mapping function between the input and output variables. The relationship is y = f(x).

The learning is monitored or supervised in the sense that we already know the output, and the algorithm is corrected each time to optimise its results. The algorithm is trained over the data set and amended until it achieves an acceptable level of performance.

We can group the supervised learning problems as:

  1. Regression problems – Used to predict future values, with the model trained on historical data. E.g., predicting the future price of a product.
  2. Classification problems – The algorithm is trained on labelled examples to assign items to a specific category. E.g., disease or no disease, apple or orange, beer or wine.

Unsupervised Learning

In this approach, the output is unknown, and we have only the input variables at hand. The algorithm learns by itself and discovers interesting structure in the data.

The goal is to decipher the underlying distribution in the data to gain more knowledge about the data. 

We can group the unsupervised learning problems as:

  1. Clustering: This means grouping inputs with the same characteristics together. E.g., grouping users based on search history
  2. Association: Here, we discover the rules that govern meaningful associations within the data set. E.g., people who watch ‘X’ will also watch ‘Y’.
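Clustering can be sketched with a bare-bones version of k-means, one common clustering algorithm (the article does not prescribe a specific one). The user data, the meaning of the two features, and the starting centroids below are all hypothetical.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    assignments = []
    for _ in range(iterations):
        assignments = [
            min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid unchanged
        centroids = new_centroids
    return centroids, assignments


# Hypothetical users described by (sports searches per day, cooking searches per day).
users = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Starting centroids are picked by hand here; real implementations usually randomise them.
centroids, assignments = k_means(users, centroids=[users[0], users[3]])
print(assignments)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

No labels are supplied anywhere: the two groups emerge only from how close the users are to one another.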

 

Semi-supervised Learning

In semi-supervised learning, data scientists train a model with a small amount of labelled data and a large amount of unlabelled data. Usually, the first step is to cluster similar data with the help of an unsupervised machine learning algorithm. The next step is to label the unlabelled data using the characteristics of the limited labelled data available. After labelling the complete data, one can use supervised learning algorithms to solve the problem.
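The three steps described above (cluster, label, then learn) can be sketched as follows. This is a simplified illustration on one invented feature, alcohol percentage; the helper names, the 1-D k-means, and the threshold classifier are assumptions chosen to keep the example short.

```python
def nearest_index(value, centroids):
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def k_means_1d(values, centroids, iterations=10):
    """Very small 1-D k-means used only for the clustering step."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            groups[nearest_index(v, centroids)].append(v)
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


# Hypothetical alcohol percentages: two labelled drinks, eight unlabelled ones.
labelled = [(4.5, "beer"), (13.0, "wine")]
unlabelled = [4.0, 5.0, 5.5, 6.0, 11.5, 12.0, 12.5, 14.0]

# Step 1: cluster all the data without looking at the labels.
values = [v for v, _ in labelled] + unlabelled
centroids = k_means_1d(values, [min(values), max(values)])

# Step 2: name each cluster after the labelled example that falls in it.
# (This sketch assumes every cluster contains at least one labelled example.)
cluster_label = {nearest_index(v, centroids): label for v, label in labelled}
pseudo_labelled = [(v, cluster_label[nearest_index(v, centroids)]) for v in unlabelled]

# Step 3: supervised learning on the fully labelled data; here, a threshold
# halfway between the two class averages.
full_data = labelled + pseudo_labelled


def class_mean(label):
    members = [v for v, member_label in full_data if member_label == label]
    return sum(members) / len(members)


threshold = (class_mean("beer") + class_mean("wine")) / 2


def predict(alcohol):
    return "wine" if alcohol > threshold else "beer"


print(pseudo_labelled)
print(predict(7.0), predict(10.0))  # beer wine
```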

 

Reinforcement Learning

In this approach, machine learning models are trained to make a series of decisions based on the rewards and feedback they receive for their actions. The machine learns to achieve a goal in complex and uncertain situations and is rewarded each time it achieves it during the learning period. 

Reinforcement learning differs from supervised learning in that no correct answer is provided, so the reinforcement agent itself decides which steps to take to perform a task. In the absence of a training data set, the machine learns from its own experience.
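A small sketch of learning from rewards is tabular Q-learning, one well-known reinforcement learning algorithm (the article does not name a specific one). The corridor environment, the reward of 1 at the goal, and the values of `alpha`, `gamma`, and the episode count are all assumptions made for illustration.

```python
import random

N_STATES = 5                 # positions 0..4 in a corridor
GOAL = N_STATES - 1          # reaching position 4 ends the episode
ACTIONS = ["left", "right"]


def step(state, action):
    """Hypothetical environment: move one cell; reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == "left" else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _episode in range(episodes):
        state = 0
        for _step in range(max_steps):
            action = rng.choice(ACTIONS)  # explore at random; Q-learning is off-policy
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate towards "reward now + discounted best future value".
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if state == GOAL:
                break
    return q


q_table = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(policy)
```

The agent is never told that moving right is correct. It only receives a reward on reaching the goal, and the learned values should end up favouring "right" in every position.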

 

 

Machine Learning Applications

Machine Learning algorithms help in building intelligent systems that can learn from their past experiences and historical data to give accurate results. Many industries are thus applying machine learning solutions to their business problems, or to create new and better products and services. Healthcare, defence, financial services, marketing, and security services, among others, use Machine Learning in their applications and processes. 

Facial recognition/Image recognition: One of the most common applications of machine learning is facial recognition, and a familiar example of this application is the iPhone X. There are many use cases for facial recognition, mostly for security purposes such as identifying criminals, searching for missing individuals, and aiding forensic investigations. Intelligent marketing, diagnosing diseases, and tracking attendance in schools are some other uses.

Automatic Speech Recognition: Abbreviated as ASR, automatic speech recognition is used to convert speech into digital text. Its applications lie in authenticating users based on their voice and performing tasks based on the human voice inputs. Speech patterns and vocabulary are fed into the system to train the model. Presently ASR systems find a wide variety of applications in the following domains:

– Medical Assistance

– Industrial Robotics

– Forensic and Law enforcement

– Defence & Aviation

– Telecommunications Industry

– Home Automation and Security Access Control

– I.T. and Consumer Electronics

Financial Services – Machine learning has many use cases in Financial Services. Machine Learning algorithms prove to be excellent at detecting fraud by monitoring the activities of each user and assessing whether an attempted activity is typical of that user.
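The idea of checking whether an activity is typical of a user can be illustrated with a deliberately simple statistical rule. This is a stand-in for a trained fraud model, not a description of how production systems work; the function name `is_unusual`, the threshold of three standard deviations, and the transaction amounts are assumptions.

```python
from statistics import mean, stdev


def is_unusual(history, amount, threshold=3.0):
    """Flag an amount that lies more than `threshold` standard deviations
    from this user's own average (a simple z-score rule)."""
    average = mean(history)
    spread = stdev(history)  # needs at least two past amounts
    if spread == 0:
        return amount != average
    return abs(amount - average) / spread > threshold


# Hypothetical past transaction amounts for one user.
past_amounts = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0]

print(is_unusual(past_amounts, 26.0))   # False: in line with past behaviour
print(is_unusual(past_amounts, 500.0))  # True: far outside this user's usual range
```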

Financial monitoring to detect money laundering activities is also a critical security use case of machine learning.

Machine Learning also helps in making better trading decisions with the help of algorithms that can analyse thousands of data sources simultaneously. Credit scoring and underwriting are some of the other applications.

The most common application in our day-to-day activities is virtual personal assistants such as Siri and Alexa.

Marketing and Sales – Machine Learning is improving lead scoring algorithms by including various parameters such as website visits, emails opened, downloads, and clicks to score each lead.

It also helps businesses to improve their dynamic pricing models by using regression techniques to make predictions. 

Sentiment Analysis is another essential application to gauge consumer response to a specific product or a marketing initiative. 

Machine Learning for Computer Vision helps brands identify their products in images and videos online. These brands also use computer vision to measure mentions that are not accompanied by any relevant text.

Chatbots are also becoming more responsive and intelligent with the help of machine learning.

Healthcare – A vital application of Machine Learning is in the diagnosis of diseases and ailments that are otherwise difficult to diagnose. Radiotherapy is also improving with the help of Machine Learning.

Early-stage drug discovery is another crucial application which involves technologies such as precision medicine and next-generation sequencing. 

Clinical trials cost a lot of time and money to complete and deliver results. Applying Machine Learning based predictive analytics could improve on these factors and give better results. 

Machine Learning technologies are also critical for making outbreak predictions. Scientists around the world are using these technologies to predict epidemic outbreaks.

Recommendation Systems – Many businesses today use recommendation systems to communicate effectively with the users on their site. These systems can recommend relevant products, movies, web series, songs, and much more.

The most prominent use cases of recommendation systems are e-commerce sites like Amazon, Flipkart, and many others, along with Spotify, Netflix, and other web-streaming channels.
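A very small recommender can be built by counting which titles tend to appear together in users' histories. This is a simplified sketch: real recommendation systems are far more elaborate, and the function name `also_watched`, the user names, and the titles are made up for the example.

```python
from collections import Counter


def also_watched(histories, title, top_n=2):
    """Rank other titles by how often they appear alongside `title`."""
    counts = Counter()
    for watched in histories.values():
        if title in watched:
            counts.update(watched - {title})
    # Sort by count (highest first), then alphabetically to break ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [other for other, _ in ranked[:top_n]]


# Hypothetical viewing histories.
histories = {
    "user_1": {"X", "Y", "Z"},
    "user_2": {"X", "Y"},
    "user_3": {"Y", "Z"},
    "user_4": {"X", "W"},
}

print(also_watched(histories, "X"))  # ['Y', 'W']
```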

 

Machine Learning Jobs and Career prospects:

Firstly, let us have a look at the skill sets that are necessary to become a successful machine learning professional. Then we will move on to Machine Learning job roles and career prospects.

The prerequisites to learn Machine Learning are:

– Linear Algebra

– Statistics and Probability

– Calculus

– Graph theory

– Programming Skills – Python, R, MATLAB, C++ or Octave

Essential Machine Learning skills to become a successful ML professional are:

Machine Learning Algorithms and Libraries – There is an absolute need to be acquainted with the implementation of ML algorithms mostly available through APIs, Packages, and Libraries. It is also essential to learn about the pros and cons of different applicable approaches towards ML implementation. 

Data Modelling and Evaluation – This includes the process of continuously evaluating the performance of the given model. One can achieve this by selecting an appropriate accuracy measure and an effective evaluation strategy based on the problem at hand. 

Distributed Computing – Machine Learning jobs require working with very large data sets. A single machine cannot process this massive amount of data, so the work needs to be distributed across a cluster of machines.

Software engineering and system design – A strong base in software engineering and system design is a requisite for a successful machine learning career. Employers prefer the ability to build appropriate interfaces for components. These skills are valuable for improving quality, productivity, collaborations, and maintainability.     

 

Machine Learning Job Roles and salary trends: 

[Figure: Machine Learning job roles]

 

[Figure: Machine Learning salary trends]

(Source: Analytics India Magazine ‘Salary Study – 2018’)

 

The future scope of Machine Learning

To conclude, let us see how the future will turn out for Machine Learning. As per estimates, the Machine Learning market will grow to reach USD 8.81 billion by the year 2022. That means there is going to be substantial demand for Machine Learning skills to drive this growth. The future looks promising for those planning a career in Machine Learning!

If you want to know more about machine learning and are interested in pursuing a career in the field, check out Great Learning’s postgraduate program in Machine Learning.