This blog is designed to help you understand the commonly used types of neural networks, how they work, and where they are applied in industry. It begins with a brief introduction to how neural networks work. We have tried to keep it simple yet effective.
A Quick Introduction to Neural Networks
Neural networks are the foundation of deep learning, a subfield of machine learning and artificial intelligence. Certain application scenarios are too heavy or out of scope for traditional machine learning algorithms to handle. Neural nets, as they are commonly known, step in for such scenarios and fill the gap. Artificial neural networks are inspired by the biological neurons in the human body, which activate under certain circumstances and trigger a related action by the body in response. Artificial neural nets consist of layers of interconnected artificial neurons, each with an activation function that helps switch it ON/OFF. As with traditional machine learning algorithms, there are certain values that neural nets learn in the training phase. Briefly, each neuron multiplies its inputs by weights (typically initialized randomly), sums the results, and adds a bias value, which is also learned during training. This sum is then passed to an appropriate activation function, which decides the final value the neuron outputs. Various activation functions are available, and the choice depends on the nature of the data and the task. Once the final layer produces an output, a loss function [predicted output vs. expected output] is calculated and back propagation is performed, where the weights are adjusted to minimize the loss. Finding the optimal values of the weights is what the whole operation revolves around. Please refer to the following for a better understanding:
Input layer represents dimensions of the input vector.
Hidden layer represents the intermediary nodes that divide the input space into regions with (soft) boundaries. It takes in a set of weighted inputs and produces an output through an activation function.
Output layer represents the output of the neural network.
Weights are numeric values which are multiplied with inputs. In back propagation they are modified to reduce the loss. In simple words, weights are the values a neural network learns. They self-adjust depending on the difference between the predicted outputs and the expected outputs from the training data.
Activation Function is a mathematical formula that turns a neuron's weighted input into its output, which helps the neuron to switch ON/OFF.
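To make the terms above concrete, here is a minimal sketch of a single neuron in plain Python. It assumes a sigmoid activation, a squared-error loss, and a learning rate of 0.5; none of these choices come from the text above, and the function names (`neuron_forward`, `train_step`) are purely illustrative.

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(inputs, weights, bias):
    # Weighted sum of the inputs plus the bias, passed through the activation.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

def loss(inputs, target, weights, bias):
    # Squared-error loss between predicted and expected output (an assumption).
    return 0.5 * (neuron_forward(inputs, weights, bias) - target) ** 2

def train_step(inputs, target, weights, bias, lr=0.5):
    # One gradient-descent update: the back propagation step for one neuron.
    out = neuron_forward(inputs, weights, bias)
    delta = (out - target) * out * (1.0 - out)  # d(loss)/d(weighted sum)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias

inputs, target = [1.0, 2.0], 1.0
weights, bias = [0.1, -0.2], 0.0  # arbitrary starting values

before = loss(inputs, target, weights, bias)
for _ in range(20):
    weights, bias = train_step(inputs, target, weights, bias)
after = loss(inputs, target, weights, bias)

print("loss before:", before)
print("loss after: ", after)
```

After a few updates the loss is smaller than at the start, which is the "weights are adjusted to make the loss minimum" idea in miniature.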
There are many types of neural nets available, and more are under development. They can be classified by their structure, data flow, the neurons used and their density, the layers and their depth, activation filters, etc. With this in mind, we are going to discuss the following neural nets:
7 Types of Neural Networks
- Feed Forward Neural Nets
- Multiple Layered Perceptron Neural Nets
- Convolution Neural Nets
- Radial Basis Function Neural Nets
- Recurrent Neural Nets
- Sequence to Sequence models
- Modular Neural Network
A. Feed Forward Neural Networks
Applications:
a. Simple classification [where traditional ML based classification algorithms have limitations]
b. Face recognition [Simple, straightforward image processing]
c. Computer vision [Where target classes are difficult to classify]
d. Speech recognition
This is the simplest form of neural net: input data travels in one direction only, passing through artificial neural nodes and exiting through output nodes. Input and output layers are always present, while hidden layers may or may not be. Based on this, they can be further classified as single-layered or multi-layered feed forward neural nets. The number of layers depends on the complexity of the function. Propagation is uni-directional: data only moves forward, and in the simple form described here the weights are not adjusted through back propagation. The inputs are multiplied by weights, summed, and fed to the activation function. For this, a classifying activation function, or step activation function, is used. For example: the neuron is activated if the weighted sum is above the threshold (usually 0), and it produces 1 as an output. The neuron is not activated if the sum is below the threshold, in which case the output is -1. These networks are fairly simple to maintain and can cope with data that contains a lot of noise.
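The step activation described above can be sketched in a few lines of plain Python. The weights are hand-picked for illustration, and treating a sum exactly equal to the threshold as "not activated" is our assumption, since the text does not cover that case.

```python
def step(z, threshold=0.0):
    # Fires (+1) above the threshold, otherwise outputs -1.
    # A sum exactly at the threshold is treated as not activated (assumption).
    return 1 if z > threshold else -1

def feed_forward_neuron(inputs, weights, bias=0.0):
    # Weighted sum of the inputs, then the step activation.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(z)

weights = [0.6, -0.4]  # fixed, hand-picked weights for illustration

print(feed_forward_neuron([1.0, 0.5], weights))  # 0.6 - 0.2 = 0.4   -> 1
print(feed_forward_neuron([0.2, 1.0], weights))  # 0.12 - 0.4 = -0.28 -> -1
```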
Advantages:
- Less complex, easy to design & maintain
- Fast [one-way propagation]
- Copes well with noisy data
Disadvantages:
- Cannot be used for deep learning [due to the absence of dense layers and back propagation]
B. Multi-Layer Perceptron
Applications:
a. Speech Recognition
b. Machine Translation
c. Complex Classification
The multi-layer perceptron is an entry point to more complex neural nets, where input data travels through several layers of artificial neurons. Every node is connected to all neurons in the next layer, which makes it a fully connected neural network. Input and output layers are present along with one or more hidden layers, i.e. at least three layers in total. It has bi-directional propagation, i.e. forward propagation and backward propagation. Inputs are multiplied by weights and fed to the activation function, and in back propagation the weights are modified to reduce the loss. In simple words, weights are the values the network learns. They self-adjust depending on the difference between the predicted outputs and the expected outputs. Nonlinear activation functions are used in the hidden layers, typically followed by softmax as the output layer activation function.
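As a rough illustration of the forward pass through a fully connected network, the sketch below uses one hidden layer with ReLU as the nonlinear activation and softmax at the output. The layer sizes and weight values are made up for the example, and the helper names (`dense`, `mlp_forward`) are ours; back propagation is left out to keep it short.

```python
import math

def relu(values):
    # Nonlinear activation: negative values become 0.
    return [max(0.0, v) for v in values]

def softmax(values):
    # Turns raw scores into probabilities that sum to 1.
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    # Fully connected layer: every input feeds every neuron (one weight row each).
    return [sum(x * w for x, w in zip(inputs, row)) + b
            for row, b in zip(weights, biases)]

def mlp_forward(x, params):
    hidden = relu(dense(x, params["W1"], params["b1"]))
    return softmax(dense(hidden, params["W2"], params["b2"]))

# Hypothetical parameters: 2 inputs -> 3 hidden neurons -> 2 output classes.
params = {
    "W1": [[0.5, -0.3], [0.8, 0.2], [-0.6, 0.9]],
    "b1": [0.0, 0.1, -0.1],
    "W2": [[1.0, -1.0, 0.5], [-0.5, 0.7, 0.3]],
    "b2": [0.0, 0.0],
}

probs = mlp_forward([1.0, 2.0], params)
print(probs)  # two class probabilities; the second class scores higher here
```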
Advantages:
- Used for deep learning [due to the presence of dense fully connected layers and back propagation]
Disadvantages:
- Comparatively complex to design and maintain
- Comparatively slow [depends on number of hidden layers]
C. Convolution Neural Network
Applications:
a. Image processing
b. Computer Vision
c. Speech Recognition
d. Machine translation
A convolution neural net contains a three-dimensional arrangement of neurons (width, height, and depth) instead of the standard two-dimensional array. The first layer is called a convolutional layer. Each neuron in the convolutional layer processes information from only a small part of the visual field. Input features are taken in one small patch at a time, like a filter moving over the image. The network understands the image in parts and can repeat these operations multiple times to complete the processing of the full image. Processing may involve converting the image from RGB or HSI scale to gray-scale. Changes in pixel values then help detect edges, and images can be classified into different categories.
Propagation is uni-directional where the CNN contains one or more convolutional layers followed by pooling, and bidirectional where the output of the convolution layer goes to a fully connected neural network for classifying the images, as shown in the above diagram. Filters are used to extract certain parts of the image. In an MLP, the inputs are multiplied by weights and fed to the activation function. The convolution layers use ReLU, and the MLP uses a nonlinear activation function followed by softmax. Convolution neural networks show very effective results in image and video recognition, semantic parsing and paraphrase detection.
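The sketch below shows, in plain Python, how a small filter slides over a gray-scale image to pick out a vertical edge, followed by ReLU and max pooling. The 5x5 image and the 2x2 filter are invented for the example, and the loop computes cross-correlation (no filter flip), which is what deep learning libraries usually call convolution.

```python
def conv2d(image, kernel):
    # Slide the kernel over the image; each output value only "sees" a small patch.
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

def relu2d(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool2d(feature_map, size=2):
    # Keep the largest value in each non-overlapping size x size window.
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            row.append(max(feature_map[i + a][j + b]
                           for a in range(size) for b in range(size)))
        pooled.append(row)
    return pooled

# 5x5 gray-scale image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical filter that responds to a dark-to-bright vertical edge.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = relu2d(conv2d(image, kernel))
pooled = max_pool2d(feature_map)

print(feature_map[0])  # [0.0, 2.0, 0.0, 0.0] -> strong response at the edge
print(pooled)          # [[2.0, 0.0], [2.0, 0.0]]
```

In a full CNN the pooled values would be flattened and passed to a fully connected network for classification, and the filter values would be learned rather than hand-written.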
Advantages:
- Used for deep learning with few parameters
- Fewer parameters to learn as compared to a fully connected layer
Disadvantages:
- Comparatively complex to design and maintain
- Comparatively slow [depends on the number of hidden layers]
D. Radial Basis Function Neural Networks
A Radial Basis Function Network consists of an input vector followed by a layer of RBF neurons and an output layer with one node per category. Classification is performed by measuring the input’s similarity to data points from the training set: each neuron stores a prototype, which is one of the examples from the training set. When a new input vector [the n-dimensional vector that you are trying to classify] needs to be classified, each neuron calculates the Euclidean distance between the input and its prototype. For example, if we have two classes, class A and class B, and the new input is closer to the class A prototypes than to the class B prototypes, it is tagged or classified as class A. Each RBF neuron compares the input vector to its prototype and outputs a value between 0 and 1 that measures their similarity. When the input equals the prototype, the output of that RBF neuron is 1, and as the distance between the input and the prototype grows, the response falls off exponentially towards 0. The curve of the neuron’s response resembles a typical bell curve. The output layer consists of a set of neurons [one per category].
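Here is a minimal sketch of that idea in plain Python. It assumes a Gaussian response `exp(-beta * distance^2)` with `beta = 1.0`, and an output layer that simply sums the activations of each class with equal weights; in a trained network those output weights would be learned. The prototypes are invented for the example.

```python
import math

def rbf_activation(x, prototype, beta=1.0):
    # Squared Euclidean distance between the input and the stored prototype.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, prototype))
    # Gaussian (bell curve) response: 1 at the prototype, decaying towards 0.
    return math.exp(-beta * sq_dist)

def classify(x, prototypes):
    # One score per category; equal output weights are an assumption.
    scores = {}
    for label, proto in prototypes:
        scores[label] = scores.get(label, 0.0) + rbf_activation(x, proto)
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical prototypes taken from a training set with two classes.
prototypes = [
    ("A", [1.0, 1.0]),
    ("A", [1.5, 1.0]),
    ("B", [4.0, 4.0]),
    ("B", [5.0, 4.5]),
]

label, scores = classify([1.2, 0.9], prototypes)
print(label)   # the input sits next to the class A prototypes
print(scores)
```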
Application: Power Restoration
a. Powercut P1 needs to be restored first
b. Powercut P3 needs to be restored next, as it impacts more houses
c. Powercut P2 should be fixed last as it impacts only one house
E. Recurrent Neural Networks
Applications:
a. Text processing like auto suggest, grammar checks, etc.
b. Text to speech processing
c. Image tagger
d. Sentiment Analysis
A Recurrent Neural Network is designed to save the output of a layer and feed it back to the input, which helps in predicting the outcome of the layer. The first layer is typically a feed forward layer, followed by a recurrent layer in which a memory function remembers some of the information from the previous time-step. Forward propagation is implemented in this case, and the network stores the information required for its future use. If the prediction is wrong, back propagation makes small changes to the weights, scaled by the learning rate, so that the network gradually moves towards making the right prediction.
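The "memory" can be illustrated with a single recurrent unit in plain Python. This sketch assumes a tanh activation and scalar, hand-picked weights (`w_x`, `w_h`), and it covers only the forward pass, not training.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    # The new hidden state mixes the current input with the previous state.
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states are still non-zero:
# the network "remembers" the first time-step, with a fading trace.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The fading trace also hints at why very long sequences are hard: the influence of early inputs shrinks at every step.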
Advantages:
- Models sequential data, where each sample can be assumed to depend on the previous ones
- Can be used with convolution layers to extend the effective pixel neighborhood
Disadvantages:
- Vanishing and exploding gradient problems
- Training recurrent neural nets can be difficult
- Difficult to process long sequential data when using ReLU as an activation function
F. Sequence to sequence models
A sequence to sequence model consists of two Recurrent Neural Networks: an encoder that processes the input and a decoder that produces the output. The encoder and decoder work together, using either the same parameters or different ones. Unlike a plain RNN, this model is particularly applicable in cases where the length of the input data does not have to equal the length of the output data. While they share the benefits and limitations of RNNs, these models are mainly applied in chatbots, machine translation, and question answering systems.
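A toy sketch of the encoder and decoder split, in plain Python, is shown below. The scalar weights are arbitrary and untrained, and the decoder runs for a fixed number of steps that we pass in by hand; a real model would typically stop when it emits an end-of-sequence token. The point is only that the input and output lengths are independent.

```python
import math

def encode(sequence, w_x=0.5, w_h=0.8):
    # Encoder RNN: compress the whole input sequence into one context value.
    h = 0.0
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h)
    return h

def decode(context, steps, w_h=0.9, w_y=1.5):
    # Decoder RNN: start from the context and unroll for `steps` outputs.
    h = context
    outputs = []
    for _ in range(steps):
        h = math.tanh(w_h * h)
        outputs.append(w_y * h)
    return outputs

context = encode([1.0, 0.5, -0.2, 0.7])  # four inputs
outputs = decode(context, steps=2)       # two outputs
print(context)
print(outputs)
```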
G. Modular Neural Network
Applications:
- Stock market prediction systems
- Adaptive MNN for character recognition
- Compression of high level input data
A modular neural network has a number of different networks that function independently and perform sub-tasks. The different networks do not really interact with or signal each other during the computation process. They work independently towards achieving the output.
As a result, a large and complex computational process is done significantly faster by breaking it down into independent components. The computation speed increases because the networks are not interacting with or even connected to each other.
Advantages:
- Independent training
Disadvantages:
- Moving target problems