‘When in doubt, go to the library’
Chances are you have already heard of Python. Guido van Rossum’s brainchild, which dates back to the late 1980s, has become a real game changer. It is one of the most popular programming languages today and is used for a wide range of applications. So let us get to the point: 21 open source libraries in Python.
A library is a collection of pre-written code that can be reused to cut development time. It is particularly useful for frequently needed functionality that would otherwise be written from scratch every single time. Much like physical libraries, software libraries are collections of reusable resources, each of which can be traced back to its original source. This is the idea behind the numerous open source libraries available in Python (one more reason to make it a favorite).
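As a small illustration of this idea, the sketch below computes an average twice: once written by hand, and once by reusing Python’s built-in `statistics` module. The sample numbers are made up for this example.

```python
import statistics

scores = [70, 85, 90, 95]  # made-up sample data

# Written from scratch: add up the values and divide by how many there are.
manual_mean = sum(scores) / len(scores)

# Reusing library code: a single call does the same job.
library_mean = statistics.mean(scores)

print(manual_mean, library_mean)
```

Both approaches give the same answer, but the library version is shorter and has already been tested by many other users.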
Let’s Get Started!
1. Scikit-learn : It is a free machine learning library for the Python programming language that can be used for a variety of tasks, including classification, regression, clustering, model selection and preprocessing, with algorithms such as naive Bayes, gradient boosting and k-means. At the time of writing, scikit-learn requires:
- Python (>= 2.7 or >= 3.3),
- NumPy (>= 1.8.2),
- SciPy (>= 0.13.3).
Spotify uses scikit-learn for its music recommendations, and Evernote uses it to build its classifiers. If you already have a working installation of NumPy and SciPy, the easiest way to install scikit-learn is with pip.
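To give a feel for the library, here is a minimal classification sketch. It assumes a reasonably recent scikit-learn (installed, for example, with `pip install scikit-learn`) and uses the iris dataset bundled with the library. The choice of a k-nearest-neighbours classifier and the split settings are arbitrary picks for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the small iris flower dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the samples to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a k-nearest-neighbours classifier (an arbitrary choice for this sketch).
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# score() returns the fraction of test samples classified correctly.
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same `fit` and `score` (or `predict`) pattern, so swapping in a different algorithm usually means changing only the line that creates the model.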
2. NuPIC : The Numenta Platform for Intelligent Computing (NuPIC) is a platform that implements Hierarchical Temporal Memory (HTM) learning algorithms and makes them available as open source. It is intended as a foundation for future machine learning algorithms based on the biology of the neocortex. The code is available on GitHub.
3. Ramp : It is a Python library for rapid prototyping of machine learning models. Ramp provides a simple, declarative syntax for exploring features, algorithms and transformations. It is a lightweight pandas-based machine learning framework that works seamlessly with existing Python machine learning and statistics tools.
4. NumPy : When it comes to scientific computing, NumPy is one of the fundamental packages for Python. It provides support for large multidimensional arrays and matrices, along with a collection of high-level mathematical functions that operate on these arrays efficiently. NumPy relies on BLAS and LAPACK for efficient linear algebra computations. NumPy can also be used as an efficient multi-dimensional container of generic data.
The various NumPy installation packages can be found here.
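The short sketch below shows the kind of array and linear algebra work described above. The matrix and vector values are made up for illustration.

```python
import numpy as np

# A 2x2 matrix and a vector, stored as NumPy arrays (made-up values).
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([5.0, 6.0])

# Arithmetic is applied to every element at once, without explicit loops.
doubled = a * 2

# Sum down each column of the matrix.
col_sums = a.sum(axis=0)

# Solve the linear system a @ x = b for x.
x = np.linalg.solve(a, b)

print(doubled)
print(col_sums)
print(x)
```

The call to `np.linalg.solve` is one of the linear algebra routines that hand the heavy lifting to optimized native code.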
5. Pipenv : The officially recommended Python packaging tool as of 2017, Pipenv is a production-ready tool that aims to bring the best of all packaging worlds to the Python world. Its main purpose is to give users a working environment that is easy to set up. Pipenv, the “Python Development Workflow for Humans”, was created by Kenneth Reitz to manage package dependencies. The instructions to install Pipenv can be found here.
6. TensorFlow : The most popular deep learning framework, TensorFlow is an open source software library for high performance numerical computation. It is a symbolic math library and is also used for machine learning and deep learning algorithms. TensorFlow was developed by researchers on the Google Brain team within Google’s AI organisation, and today it is used by everyone from researchers working on machine learning algorithms to physicists running complex mathematical computations. The following operating systems support TensorFlow: macOS 10.12.6 (Sierra) or later, Ubuntu 16.04 or later, Windows 7 or later, and Raspbian 9.0 or later.
7. Bob : Developed at the Idiap Research Institute in Switzerland, Bob is a free signal processing and machine learning toolbox. The toolbox is written in a mix of Python and C++. From image recognition to image and video processing with machine learning algorithms, a large number of packages are available in Bob to make all of this happen efficiently and in a short time.
“Your library is your paradise.”
8. PyTorch : Introduced by Facebook in 2017, PyTorch is a Python package that offers two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and deep neural networks built on a tape-based autodiff system. PyTorch is a great platform for running deep learning models with increased flexibility and speed, and it is built to be deeply integrated with Python.
9. PyBrain : I won’t be surprised if you guessed it from the name. Yes, PyBrain contains algorithms for neural networks that are simple enough for entry-level students yet can be used for state-of-the-art research. The goal is to offer simple, flexible yet sophisticated and powerful algorithms for machine learning, with many predefined environments to test and compare your algorithms. Researchers, students, developers, lecturers, you and me: we can all use PyBrain.
10. MILK : This machine learning toolkit in Python focuses on supervised classification with a range of classifiers available: SVMs, k-NN, random forests and decision trees. Combining these classifiers in different ways yields different classification systems. For unsupervised learning, one can use k-means clustering and affinity propagation. There is a strong emphasis on speed and low memory usage, so most of the performance-sensitive code is written in C++. More about this here.
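A minimal sketch of the two features mentioned above, assuming PyTorch is installed and importable as `torch`. It runs on the CPU, and the input values are made up. The “tape” refers to PyTorch recording operations as they run so that it can replay them backwards to compute gradients.

```python
import torch

# A tensor works much like a NumPy array; requires_grad=True asks
# PyTorch to record the operations applied to it.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Forward computation: y = x1^2 + x2^2 + x3^2
y = (x ** 2).sum()

# Walk the recorded operations backwards to get dy/dx.
y.backward()

# For this function the gradient is 2 * x.
print(y.item())
print(x.grad)
```

The same mechanism is what training a neural network relies on: the loss plays the role of `y`, and the network weights play the role of `x`.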
11. Keras : It is an open source neural network library written in Python, designed to enable fast experimentation with deep neural networks. With deep learning becoming ubiquitous, Keras is an ideal choice because, according to its creators, it is an API designed for humans and not machines. With over 200,000 individual users as of November 2017, Keras has stronger adoption in both industry and the research community than any other deep learning framework except TensorFlow itself. Before installing Keras, it is advisable to install a backend engine such as TensorFlow.
12. Dash : From exploring data to monitoring your experiments, Dash acts as the frontend to your analytical Python backend. This productive Python framework is ideal for building data visualization apps and is well suited to any Python user. Its ease of use is the result of extensive engineering effort. If you want to learn more about how it is built, you can watch this talk from Plotcon.
13. Pandas : It is an open source, BSD-licensed library that provides easy-to-use data structures and fast data analysis tools for Python. Pandas makes it possible to carry out data analysis and modelling without having to switch to a more domain-specific language like R. The recommended way to install Pandas is through Conda.
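As a quick, hedged illustration of those data structures, the sketch below builds a small table and summarises it. The players, teams and points are invented for this example.

```python
import pandas as pd

# A DataFrame is a table with named columns (all values are made up).
df = pd.DataFrame({
    "player": ["Ann", "Bob", "Cid", "Dee"],
    "team": ["red", "red", "blue", "blue"],
    "points": [10, 8, 12, 14],
})

# Group the rows by team and average the points within each group.
mean_by_team = df.groupby("team")["points"].mean()

# Keep only the rows where the player scored more than 9 points.
top_scorers = df[df["points"] > 9]

print(mean_by_team)
print(top_scorers)
```

Grouping and filtering like this are the sort of everyday analysis steps that would otherwise need hand-written loops.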
‘I find treasure every time I visit a library’.
14. SciPy : This is another open source library used for scientific computing in Python. Apart from scientific computing, SciPy is also associated with data computation, productivity, high performance computing and quality assurance. The various installation packages can be found here. The core packages of the SciPy ecosystem are NumPy, the SciPy library, Matplotlib, IPython, SymPy and Pandas.
15. Matplotlib : All the libraries that we have discussed are capable of a wide range of numeric operations, but when it comes to 2D plotting, Matplotlib steals the show. This open source Python library is widely used to produce publication-quality figures in a variety of hard copy formats and interactive environments across platforms. You can create charts, graphs, pie charts, scatter plots, histograms, error charts and more with just a few lines of code.
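Here is a small sketch of the SciPy library itself, using two of its submodules, `scipy.integrate` and `scipy.optimize`. The functions being integrated and minimised are arbitrary examples chosen because their answers are easy to check by hand.

```python
import math

from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi; the exact answer is 2.
area, abs_error = integrate.quad(math.sin, 0, math.pi)

# Find the x that minimises (x - 3)^2; the exact answer is x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2)

print(area)
print(result.x)
```

`quad` also returns an estimate of the numerical error, which is handy when the exact answer is not known in advance.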
The various installation packages can be found here.
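The “few lines of code” claim can be sketched as follows. The data points are made up, the file name `squares.png` is just a placeholder, and the `Agg` backend is selected so that the script also runs on machines without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # draw to files only, no window needed
import matplotlib.pyplot as plt

# Made-up data: the squares of the first few integers.
x = [0, 1, 2, 3, 4]
y = [v ** 2 for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line chart")
ax.legend()

# Save the figure as a PNG in the system's temporary directory.
output_path = os.path.join(tempfile.gettempdir(), "squares.png")
fig.savefig(output_path)
plt.close(fig)
```

Replacing `ax.plot` with `ax.scatter`, `ax.bar` or `ax.hist` produces the other chart types mentioned above.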
16. Theano : This open source library enables you to define, optimize and evaluate mathematical expressions involving multi-dimensional arrays efficiently. For problems involving large volumes of data, Theano can reach speeds that rival hand-crafted C implementations. Theano can also recognise numerically unstable expressions and compute them with more stable algorithms, which gives it an edge over NumPy. Follow the link to read more about Theano. The closest Python package to Theano is SymPy, so let us talk about it next.
17. SymPy : For all things symbolic mathematics, SymPy is the answer. This Python library aims to be a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python and can be embedded in other applications and extended with custom functions. You can find the source code on GitHub.
18. Caffe2 : The new kid in town, Caffe2 is a lightweight, modular and scalable deep learning framework. It aims to provide an easy and straightforward way for you to experiment with deep learning. Thanks to the Python and C++ APIs in Caffe2, we can create our prototype now and optimize later. Get started with Caffe2 now with this step-by-step installation guide.
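To show what symbolic mathematics looks like in practice, here is a minimal sketch. The expression is an arbitrary example, and every result is an exact symbolic answer and not a floating-point approximation.

```python
import sympy as sp

# Declare x as a symbol instead of a number.
x = sp.symbols("x")
expr = x**2 + 2*x + 1

# Factor the polynomial, differentiate it, and solve expr = 0 exactly.
factored = sp.factor(expr)
derivative = sp.diff(expr, x)
roots = sp.solve(sp.Eq(expr, 0), x)

# Exact definite integral of sin(x) from 0 to pi.
integral = sp.integrate(sp.sin(x), (x, 0, sp.pi))

print(factored, derivative, roots, integral)
```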
‘If you have a garden and a library, you have everything you need!’
19. Seaborn : When it comes to visualising statistical data, for example with heat maps, Seaborn is among the most reliable options. This Python library is built on top of Matplotlib and is closely integrated with Pandas data structures. Visit the installation page to see how this package can be installed.
20. Hebel : This Python library is a tool for deep learning with neural networks using GPU acceleration with CUDA through PyCUDA. Right now, Hebel implements feed-forward neural networks for classification and regression on one or multiple tasks. Other models such as autoencoders, convolutional neural nets and restricted Boltzmann machines are planned for the future. Follow the link to explore Hebel.
21. Chainer : A competitor to Hebel, this Python package aims at increasing the flexibility of deep learning models. The three key focus areas of the makers of Chainer are:
- Transportation systems : The makers of Chainer have consistently shown an interest in self-driving cars and have been in talks with Toyota Motors about this.
- Manufacturing industry : From object recognition to optimization, Chainer has been used effectively for robotics and several machine learning tools.
- Bio-healthcare : To help tackle cancer, the makers of Chainer have invested in research on medical images for the early detection of cancer cells.
The installation, projects and other details can be found here.
“What in the world would we do without our libraries?”
So there you have it: a list of common Python libraries that are worth a look and, if possible, worth getting familiar with. If you feel some library deserves to be on this list, do not forget to mention it in the comments.