How to set up your deep learning workstation: the most comprehensive guide

Set up a deep learning workstation

This article is a step-by-step guide to setting up a deep learning workstation on Ubuntu 20.04. It is essentially documentation of the process I followed on my own computer. I have repeated this process a number of times, and every time I thought I should have documented it, because proper documentation makes the next setup quick and error-free.

I have mentioned the most common mistakes and errors during the process and how to avoid or troubleshoot them. Bookmark this page so you can quickly refer to it whenever you get stuck at any of the steps.

I have gone through this complete setup a few times, on both my old and new laptops with completely different configurations, so I hope the problems I faced are the most common ones. It took me considerable time to fix all those issues, mainly by visiting discussion groups like StackOverflow, the Ubuntu forums and many other threads and blogs.

I have compiled everything in one place here, so that you don't have to visit multiple sites and can complete the whole installation by referring to this post alone. This should save you a lot of valuable time.

Prerequisites to set up deep learning workstation

I assume that you already have Ubuntu on your computer. If not, please install the latest version of Ubuntu, the most popular open-source Linux distribution, available for free download here. Although it is possible to run deep learning Keras models on Windows, it is not recommended.

Why should you use Ubuntu for deep learning? Refer to this article

Another prerequisite for running deep learning models is a good-quality GPU. I advise you to have an NVIDIA GPU in your computer for satisfactory performance. It is not strictly mandatory, but running sequence processing with recurrent neural networks or image processing with convolutional models on a CPU is a difficult proposition.

Such models may take hours to produce results when run on a CPU, whereas a modern NVIDIA GPU may finish the same job in merely 5-10 minutes. If you are not interested in investing in a GPU, an alternative is to use a cloud computing service and pay an hourly rent.

However, in the long run, using such a service may cost you more than upgrading your local system. So my suggestion is: if you are serious about deep learning and expect even moderate use, go for a good workstation setup.

The main steps to set up a deep learning workstation

Now I assume that you have completed all the prerequisites for setting up your deep learning experiments. The setup is a little time-consuming, and you will need a stable internet connection to download various files. Depending on the internet speed, the complete process may take 2-3 hours (with an internet speed of 1 Gbps it took me about 2 hours). The main steps to set up a deep learning workstation are as follows:

  • Updating the Linux system packages
  • Installing the Python pip command, the basic tool used to install the other components
  • Installing the Basic Linear Algebra Subprograms (BLAS) library required for mathematical operations
  • Installing HDF5 to store hierarchical data
  • Installing Graphviz to visualize Keras models
  • Installing the CUDA and cuDNN NVIDIA graphics drivers
  • Installing TensorFlow as the backend of Keras
  • Installing Keras
  • Installing Theano (optional)

So, we will now proceed with the step-by-step installation process.

Updating the Linux system packages

The following commands complete the Linux system upgrade. Type them in the Ubuntu terminal (the keyboard shortcut to open the terminal is "Ctrl+Alt+T"). Open the terminal and execute the following lines:

$ sudo apt-get update
$ sudo apt-get --assume-yes upgrade

Installing the Python-pip command

The pip command is used for installing and managing Python packages. Whichever packages we install next, this pip command will be used. It is a replacement for the earlier easy_install command. Run the following command to install python-pip.

$ sudo apt-get install python-pip python-dev

It should install pip on your computer, but sometimes there are exceptions, as happened to me. See the screenshot of my Ubuntu terminal below: it says "Unable to locate package python-pip".

This was a big problem, as I was clueless about why it was happening; on my old computer I had used the command a number of times without any issue. After scouring the internet for several hours I found the solution: it has to do with the Python version installed on your computer.

If you are also facing this problem (most likely on a new computer), first check the Python version with this command.

$ ls /bin/python*

If it returns Python 2 (for example Python 2.7), the python-pip package is the one to use; if it returns a newer Python such as Python 3.8, use the python3-pip package instead. On Ubuntu 20.04 the command is therefore:

$ sudo apt-get install python3-pip

On older Ubuntu releases the plain python packages refer to Python 2, so if you want Python 3 it has to be mentioned explicitly; Ubuntu 20.04 ships with Python 3 only. To install the Python 3 tooling, use the following commands.

# Installing Python3
$ sudo apt-get install python3-pip python3-dev

Installation steps for the Python scientific suite in Ubuntu

The process discussed here is for Linux (Ubuntu). Mac users need to install the Python scientific suite via Anaconda, available from the Anaconda repository. It is continuously updated, and the Anaconda documentation describes every step in detail.

Installation of the BLAS library

Installing the Basic Linear Algebra Subprograms (BLAS) library is the first step in setting up your deep learning workstation. One thing Mac users should keep in mind is that this setup does not include Graphviz and HDF5; they have to be installed separately.

Here we will install OpenBLAS using the following command.

$ sudo apt-get install build-essential cmake git unzip \
pkg-config libopenblas-dev liblapack-dev

Installation of Python basic libraries

In the next step we install the basic Python libraries: NumPy, Pandas, Matplotlib, SciPy, etc. These are core Python libraries required for any kind of mathematical operation, so whether the task is machine learning, deep learning or any other computation-intensive work, we will need them.

Use the following command in the Ubuntu terminal to install this scientific suite in one go.

# installation of Python basic libraries (Python 3 packages on Ubuntu 20.04)
$ sudo apt-get install python3-pandas python3-numpy python3-scipy python3-matplotlib python3-yaml

Installation of HDF5

The Hierarchical Data Format (HDF) version 5 is an open-source file format which supports large, complex and heterogeneous data sources. It was developed at the National Center for Supercomputing Applications (NCSA) to store large numeric data efficiently in binary form, and it builds on earlier hierarchical formats such as HDF4 and is closely related to NetCDF.

The HDF5 format lets a developer organize machine learning/deep learning data in a directory-like structure, very similar to the file system we use on any computer, and this structure maintains the hierarchy of the data.

Using the file-system analogy, a "directory" or "folder" corresponds to a "group" and a "file" corresponds to a "dataset" in HDF5. This matters in deep learning because Keras models are saved to and loaded from disk in this format.

Run the following command to install HDF5 on your machine.

# Install HDF5 data format to save the Keras models
$ sudo apt-get install libhdf5-serial-dev python3-h5py
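
To make the group/dataset analogy concrete, here is a minimal sketch assuming the h5py and NumPy packages installed above are available; the file and group names are just illustrative.

# Minimal sketch: HDF5 "groups" behave like folders and "datasets" like files
import numpy as np
import h5py

with h5py.File("example.h5", "w") as f:                        # create an HDF5 file
    grp = f.create_group("experiment_1")                       # a group, analogous to a directory
    grp.create_dataset("weights", data=np.random.rand(3, 3))   # a dataset, analogous to a file

with h5py.File("example.h5", "r") as f:
    print(f["experiment_1"]["weights"][:])                     # read the array back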

Installation of modules to visualize Keras model

In the next step we will install two packages called Graphviz and pydot-ng. These two packages are needed to visualize Keras models. The commands for installing them are as follows:

# Install graphviz
$ sudo apt-get install graphviz
# Install pydot-ng
$ sudo pip install pydot-ng

These two packages will help you visualize the deep learning models you create, but for the time being you can skip their installation and proceed with the GPU configuration; Keras also works without them.
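
As a hedged illustration of what these packages enable, the snippet below (assuming TensorFlow/Keras, installed in a later step, plus the Graphviz/pydot tooling above) renders a model diagram to a PNG file; the model and file name are arbitrary examples.

# Minimal sketch: visualize a tiny Keras model with Graphviz/pydot
from tensorflow.keras import layers, models
from tensorflow.keras.utils import plot_model

model = models.Sequential([layers.Dense(8, activation="relu", input_shape=(4,)),
                           layers.Dense(1, activation="sigmoid")])
plot_model(model, to_file="model.png", show_shapes=True)   # writes a diagram of the layers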

Installation of opencv package

Use the following command to install the OpenCV package.

# Install opencv (Python 3 bindings on Ubuntu 20.04)
$ sudo apt-get install python3-opencv

Setting up GPU for deep learning

Here comes the most important part. As you know, the GPU plays a key role in deep learning modelling. In this section we set up GPU support by installing two components, CUDA and cuDNN, both of which require an NVIDIA GPU to function properly.

Although you can run your Keras models on the CPU, training will take much longer than on a GPU. So my advice is: if you are serious about deep learning modelling, plan to procure an NVIDIA GPU (using a cloud service and paying hourly rent is also an alternative).

Let's concentrate on setting up the GPU, assuming your computer already has a recent NVIDIA card.

CUDA installation

To install CUDA, visit the NVIDIA download page at https://developer.nvidia.com/cuda-downloads. You will land on the following page, which asks you to select the operating system you are using. As we are using Ubuntu here (to know why Ubuntu is the preferred OS, read this article), choose Linux and then Ubuntu.

CUDA installation: OS selection

Then it asks for the other specifications of your workstation environment. Select them as per your existing configuration. Here I selected the OS as Linux; I am using a Dell Latitude 3400 laptop, which is a 64-bit computer, so for the architecture I selected x86_64; the distribution is Ubuntu, version 20.04.

Finally, you have to select the installer type. I chose the network installer mainly because it has a comparatively smaller download size, and I was on mobile internet at the time, so it was the best option for me. You can choose one of the local installer options if internet bandwidth is not a constraint; the plus point of a local installer is that you only have to download it once.

CUDA installation: specification selection

Once all the specifications are selected, NVIDIA shows you the installer commands. Copy them from there and run them in the Ubuntu terminal. They use Ubuntu's apt to install the packages, which is the easiest way to install CUDA.

CUDA installation code
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
$ sudo apt-get update
$ sudo apt-get -y install cuda

Install cuDNN

“cuDNN is a powerful library for Machine Learning. It has been developed to help developers like yourself to accelerate the next generation of world changing applications.”

NVIDIA.com

To download the cuDNN build that matches your operating system and Linux distribution, visit the NVIDIA download page.

Downloading cuDNN

To download the library, you have to create an account with NVIDIA. It is a compulsory step.

NVIDIA membership for downloading cuDNN

Fill in the necessary fields.

NVIDIA membership for Downloading cuDNN

Once you finish registration, a window with some optional settings appears; you can skip them and proceed to the next step.

NVIDIA membership for Downloading cuDNN

A short survey by NVIDIA is the next step. Although it asks about your experience as a developer, you can answer with any of the options just to get to the download page.

Download survey for cuDNN

Now the page with several download options appears, and you have to choose according to your specifications. I selected the following Debian file for my workstation.

Selecting the OS for cuDNN download

Download the file (around 300 MB in my case). To install the library, first change into the directory where it was downloaded and then execute the install command.

Once you are in that directory (by default it is the Downloads folder of your computer), run the command below, substituting the actual filename for the asterisks.

$ sudo dpkg -i ******.deb

You can follow the detailed installation steps on that page. With this, the cuDNN installation is complete.

Installation of TensorFlow

The next step is the installation of TensorFlow, which is very simple: just execute the command below to install TensorFlow with pip. (Recent TensorFlow 2.x packages include GPU support out of the box; with older releases, the separate tensorflow-gpu package provided it.)


# Installing TensorFlow using pip3 command for Python3
$ sudo pip3 install tensorflow
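
As a quick, hedged sanity check (assuming the pip installation above succeeded), the following Python snippet prints the installed TensorFlow version and any GPUs it can see; an empty list means CUDA/cuDNN are not yet visible to TensorFlow.

# Verify the TensorFlow installation and GPU visibility
import tensorflow as tf

print(tf.__version__)                          # installed TensorFlow version
print(tf.config.list_physical_devices("GPU"))  # should list your NVIDIA GPU if CUDA/cuDNN are set up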

Installing Keras

This is the final step of setting up your deep learning workstation, and then you are good to go. Simply run the command below.

$ sudo pip3 install keras

Or you can install it from GitHub. The benefit of installing Keras from GitHub is that you get lots of example code along with it; you can run those example scripts to test them on your machine, and they are a very good source of learning.

$ git clone https://github.com/fchollet/keras
$ cd keras
$ sudo python3 setup.py install
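
To confirm that Keras is importable, a minimal, hedged check such as the one below is enough; it only builds a tiny model in memory and prints its summary.

# Quick check that Keras imports and can build a small model
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(4, activation="relu", input_shape=(2,)),
                    Dense(1)])
model.summary()   # prints the layer structure if the installation is healthy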

Optional installation of Theano

Installation of Theano is optional, as we have already installed TensorFlow. However, installing Theano can be handy when building Keras code and switching between the TensorFlow and Theano backends. Execute the command below to install Theano:

$ sudo pip3 install theano
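
For what it's worth, with the old multi-backend Keras the active backend is chosen in the ~/.keras/keras.json configuration file; this small, hedged snippet just reports which backend is currently in use.

# Report the backend the multi-backend Keras is configured to use
from keras import backend as K

print(K.backend())   # e.g. 'tensorflow' or 'theano', depending on ~/.keras/keras.json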

Congratulations!!! You have finished all the installations and completed the setup of your deep learning workstation. You are now ready to execute your first deep learning neural network code.

I hope this article proves helpful in setting up your deep learning workstation. It is indeed a lengthy article, but it covers all the technicalities you may need if you run into any difficulty during the process. A little knowledge about every component you are installing also helps you make further changes to the setup.

Let me know how you find this article by commenting below. Please mention any information I missed or any doubt you have regarding the process, and I will try my best to provide the information.

Why Ubuntu is the best for Deep Learning Framework?

Ubuntu for deep learning

Why use Ubuntu for deep learning? This is the question this article tries to answer. After reading it you should have no doubt about which platform to use for your deep learning experiments.

I was quite happy with my Windows 10 and Colab/Jupyter notebook combination for all of my Artificial Intelligence (AI)/Machine Learning (ML)/Deep Learning (DL) programming, until I decided to start some serious work with deep learning neural network models.

Is it really important?

“[M]achines of this character can behave in a very complicated manner when the number of units is large”

Alan Turing (1948), “Intelligent Machines”, page 6

As soon as I started building my first model, the limitations of my working environment came to my notice. I had been reading threads on deep learning in forums like Quora and Reddit, and someone there mentioned in a reply that Ubuntu is a better choice for serious deep learning work.

It struck me that it was probably not wise to continue with Windows for advanced deep learning and AI work. But it was just a hunch at that time, and I needed strong logical reasons before making up my mind to switch from the only operating system I had ever used.

So I started scouring the internet, reading blogs and discussion-forum threads to make sure switching platforms was really worth my time, because getting acquainted with a completely new OS takes time, and time is money for me.

If you are in the learning phase and serious about deep learning, this article will help you make an informed decision about which platform to use, because switching your working environment at a later stage of learning always wastes valuable time and means a lot of rework.

I have already done the heavy lifting for you and present a vivid description of the topic, so that all your questions are answered in one place.

So let's start with an introduction to Ubuntu. This is not an article on Ubuntu itself, and you can find many good articles about it, but before looking at its special features for deep learning, here is a very brief idea.

What is Ubuntu?

Ubuntu is one of the most popular Linux distributions. It is developed by Canonical, the company founded by Mark Shuttleworth, and it is fully open source, which means all the features and applications it offers are completely free. It is undeniable that being free automatically puts an application miles ahead in popularity.

“What commercialism has brought into Linux has been the incentive to make a good distribution that is easy to use and that has all the packaging issues worked out.”

Linus Torvalds, Principal developer of the Linux kernel

Ubuntu releases a new version roughly twice a year, while Long Term Support (LTS) releases arrive every two years with updated security patches. It has three main editions: Core, Desktop and Server.

The Core edition is mainly for those working on IoT devices and robotics. The Desktop edition is for common users doing day-to-day office tasks as well as programming. The Server edition is, obviously, for client-server architectures and is generally meant for industry use.

Why Ubuntu is preferred for deep learning?

The Ubuntu version I installed recently is 20.04, the latest release of this distro. It is a much-improved version over its predecessor, and the additional support it provides for AI, ML and DL programmers is just stupendous.

The MicroK8s feature

“Given its smaller footprint, MicroK8s is ideal for IoT devices- you can even use it on a Raspberry Pi device!”

Kubernetes.io, technical blog
MicroK8s integration in Ubuntu

The user interface has improved a lot, and the installation process has become very easy (it was always smooth, though). Ubuntu 20.04 now comes with support for ZFS (a file system with high availability and data integrity) and an integrated module called MicroK8s, so AI and DL developers no longer have to install it separately.

MicroK8s lets AI application modules be set up and deployed blazing fast. It comes preloaded with all the necessary dependencies, along with automatic updates and security patches. Quite obviously, with this version of Ubuntu you will spend much less time configuring the environment.

Kubeflow

This is another deep learning edge of Ubuntu 20.04 and comes as an add-on to MicroK8s. Kubeflow was developed by Google in collaboration with Canonical especially for machine learning applications, and it provides built-in GPU acceleration for deep learning R&D.

What is Kubeflow?

Kubeflow is deployed with Kubernetes and does away with the barriers to creating production-ready stacks. It provides developers with enhanced AI and ML capabilities along with edge computing features, and researchers involved in cutting-edge work get a secured production environment with strict confinement and complete isolation.

Kubeflow architecture
Source: Kubeflow blog by Thea Lamkin

The security provided by the Kubeflow and Kubernetes integration is unparalleled. Many AI/ML/DL add-ons such as Jaeger, Istio, CoreDNS, Prometheus and Knative come integrated with it and can be deployed with a single command.

The programming edge of Ubuntu

When it comes to programming, Ubuntu is undoubtedly the leader. Not only AI, ML or DL programming but any kind of programming or application development task is best performed when the operating system is Ubuntu.

It has the best libraries and a vast number of examples and tutorials readily available. The community support for the open-source software used with Ubuntu is massive, so any issue you face is solved quickly, and updates are regular irrespective of which version you are using.

The enhanced Graphics Processing Unit

A powerful GPU is an important component for serious ML/DL programming, and Ubuntu has an edge here. NVIDIA, the most respected name in the GPU industry, has put in every effort to make CUDA work on Ubuntu to its maximum capacity.

Ubuntu 20.04 also gives its users the option of using external graphics cards through Thunderbolt adapters, or adding them through dedicated PCI slots.

So it is no surprise that deep learning frameworks and libraries like Keras, TensorFlow, OpenCV and PyTorch all prefer Ubuntu over other operating systems. World leaders in advanced AI/ML/DL research and development, from the autonomous-car sector to CERN and the LHC, and famous brands like Samsung, NVIDIA and Uber, all use Ubuntu for their research activities.

Advanced feature and support for Hardware

The hardware support Ubuntu comes with is also exceptional. Ubuntu provides organization-specific hardware certification, which means high compatibility is assured, with tight BIOS integration and factory-level quality assurance.

To achieve this, Canonical deals directly with hardware manufacturers, developing partnerships with the major vendors in order to ship an operating system with preloaded and pretested features.

The support team is, as usual, exceptional and ready for any kind of troubleshooting at any time. With all these assurances, developers can fully concentrate on their R&D.

Finally the Software

Canonical's Ubuntu has its own open-source software collection, compatible both at the board level and at the component level. All of its versions, old or new, contain the same package of software, and this has several advantages.

The large Linux user base runs different versions of Ubuntu with a seamless experience when switching between them, which is possible only because the same software packages exist across all versions. Developers can easily test their applications locally before launching them on the web for global users.

This bundle of open-source software makes fast creation of AI models possible, and software creation and debugging on IoT hardware is quick and easy before deployment.

The snapcraft tool

Snapcraft, the app store for Linux

This is another major Ubuntu feature that makes it a clear winner as a programming OS. Snap is a mechanism for packaging and distributing containerized applications, and Ubuntu's automatic updates are safe to install and execute precisely because of this snap feature.

Snapcraft is a command-line tool that creates snaps and makes packaging applications very easy. User feedback gathered through the snapcraft tool is of immense importance to developers, as it provides the insights needed to improve the software further.

For example, a study by Canonical revealed that most Ubuntu users never update their software, so based on this feedback they started providing automatic updates. As the whole user base moves to the latest version of Ubuntu simultaneously, Canonical does not need to support older versions.

Massive online support base

Being an open-source platform, Ubuntu has a massive online support and documentation repository. Users can ask questions any time through services like Slack and Skype, and the Ubuntu support group is very vibrant; you can even expect a reply from the development team itself.

Popular question-and-answer sites like Quora and Reddit also have threads on Ubuntu-related queries; I personally found many of my questions already answered there. Even if you have a unique problem that has not been answered before, you can post it on any of these platforms, and it is highly likely that within a few hours you will get genuinely helpful suggestions from either a regular user or the Ubuntu support/development team.

Final words

By the time you finish reading this article, you should have a clear idea of why to pick Ubuntu as your machine learning or deep learning programming platform. I have tried my best to put together all the information I gathered from many articles, online and offline.

I invested a lot of time researching this topic to be 100% sure before diving deeper into advanced learning. It is an important decision, no doubt. I have had bitter experiences before, where I put a lot of effort into learning a particular application and then one day, due to some limitation, had to backtrack and change that platform or application.

Starting fresh from scratch meant a lot of rework and wasted time, which could have been avoided had I done thorough research at the very beginning. So I learned my lesson and made no such mistake this time, and I hope this will also help you make an informed decision.

So please let me know if you find the article useful by commenting below. Any queries, doubts or suggestions are welcome, and I will try to improve the post further based on your comments.

Evolution of Deep Learning: a detailed discussion

Evolution of Deep learning

The evolution of deep learning has seen many ups and downs over the last few decades. At one time it rose to the peak of popularity and expectations were high; then setbacks in experimental trials caused a loss of confidence and disappointment. This article covers the journey of deep learning neural networks from their inception to their recent overwhelming popularity.

Background of Machine learning

It all started with the very basic concepts of probabilistic modelling, elementary statistical ideas from the school syllabus. This was the time even before the term machine learning was coined, when all models and functions were crafted entirely by the human mind.

Probabilistic models

These models are the first step towards the evolution of deep learning. They were developed with real-world problems in mind: variables having relationships between them, with combinations of dependent and independent variables used as inputs to hand-crafted functions. These models rest on extensive mathematical theory and are more theoretical than hands-on.

Some popular such probabilistic models are as below:

Naive-Bayes classification

It is basically Bayes' theorem with a naive assumption, hence the name. The concept was established long back, during the 18th century. The assumption is that all the features in the input data are independent of one another.

For example, suppose a dataset records, for a number of people, whether they have diabetes along with their sex, height and age. Naive Bayes assumes there is no correlation between sex, height and age and that each contributes independently towards the disease. This assumption is called class conditional independence.

So how does this assumption help us calculate the probability? Suppose there is a hypothesis H which can be true or false, and the hypothesis is affected by an observed event e. We are interested in the probability of the hypothesis being true given that the event is observed, that is, we need to calculate P(H|e).

According to Naive Bayes’ theorem
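
In its standard form the theorem states

P(H|e) = P(e|H) * P(H) / P(e)     ... (1)

where P(e|H) is the likelihood of observing the event e when the hypothesis holds and P(e) is the overall probability of the event. Under the naive independence assumption, P(e|H) factorizes into a product over the individual features, which is what makes the computation tractable.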

Here, P(H|e) is the posterior probability of the hypothesis given the information about event e, and it cannot be computed directly. So we break it down as in equation (1), calculate each of the probabilities separately from a frequency table, and then obtain the posterior probability.

You can read the whole process of calculation here.

P(H) is the prior probability of the hypothesis before the event is observed.

Logistic regression

This modelling technique is so basic and so popular for classification problems that it can be considered the "Hello World" of machine learning. Yes, you read that right: it is a method for classification problems; don't let the word regression in the name mislead you.

It is originally a regression method which becomes a classifier when a decision threshold is applied to its predictions. Deciding that threshold is very important, and tricky too.

We need to choose the decision threshold depending on the particular case at hand. There are four types of outcomes in a classification problem: "true positive", "true negative", "false positive" and "false negative" (read more about them here). We have to fix the rate of one type of outcome while reducing another, depending on their severity.

Example and basic concept 

For example, take the case of a severe crime where the decision is whether a person should be convicted or not. It is a binary classification problem with two outputs: guilty or not guilty. Here a true positive is a person found guilty who actually committed the crime, while a false positive is a person found guilty who did not commit the crime.

So, without doubt, the false positive is the serious kind of error here and should be avoided at any cost. Hence, while fixing the decision threshold, you should try to reduce the probability of false positives, even at the expense of missing some true positives.

Unlike linear regression, which predicts the response of a continuous variable, logistic regression predicts the probability of the positive outcome of a binary response variable; instead of a linear function it uses a sigmoid function.

The equation for logistic regression is the sigmoid function shown below.
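
Written in the standard notation for a single predictor x with coefficients β0 and β1, the model is

P(y = 1 | x) = 1 / (1 + e^-(β0 + β1*x))

and the predicted probability is turned into a class label by applying the decision threshold discussed above.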

Initial stages of evolution of Deep Learning

The first theoretical model of the artificial neuron came in 1943 from Walter Pitts, a logician, and Warren McCulloch, a neuroscientist. The model, called the McCulloch-Pitts neuron, is still regarded as a foundational work for deep learning.

The first evidence of neural networks being used appeared in some children's toys built during the 1950s. Around the same time, in 1950, the legendary mathematician Alan Turing proposed the concept of machine learning and even hinted at genetic algorithms in his famous paper "Computing Machinery and Intelligence".

Alan Turing (Image source: http://rutherfordjournal.org)

In 1952 Arthur Samuel, working with IBM, developed one of the first machine learning programs, a checkers player; he later coined the term machine learning and is known as the father of machine learning.

"The Perceptron: a Perceiving and Recognizing Automaton", a research paper published in 1957 by Frank Rosenblatt, laid the foundation of deep learning networks.

In 1965 the mathematician Alexey Ivakhnenko, together with V.G. Lapa, arguably developed the first working deep learning network, and for this contribution Ivakhnenko is considered by many to be the father of deep learning.

The first winter period

The period between 1974 and 1980 is considered the first AI winter, a long rough patch for AI research. A critical report on AI research submitted by Professor Sir James Lighthill at the request of the UK parliament played a major role in initiating it.

The report was very critical of AI research in the United Kingdom, essentially arguing that nothing of substance had been achieved: the expectations around AI and deep learning were all hype, and the creation of a robot was nothing but a mirage. Such comments were very disappointing and resulted in the withdrawal of funding for most AI research.

Invention of Backpropagation algorithm

Then, during the 1980s, the famous backpropagation algorithm combined with stochastic gradient descent (SGD) was adopted for training neural networks. This can be considered a path-breaking discovery as far as deep learning is concerned; these algorithms are still the most popular among deep learning practitioners, and backpropagation is what led to the first successful application of neural networks.

LeNet

In 1989 we got to see the first real-life application of neural networks. It was Yann LeCun who made this possible through his tireless effort at Bell Labs to combine the ideas of backpropagation and the convolutional neural network.

Yann LeCun

The network was named LeNet after LeCun. It found its first real-world use in the identification of handwritten digits, and it was so efficient that the United States Postal Service adopted the technology in 1990 for reading ZIP code digits on mail envelopes.

Yet another winter period; however brief one

In spite of the success achieved by LeNet, in the early 1990s the advent of the Support Vector Machine pushed neural networks almost to extinction. It gained popularity very quickly, mainly because of its easy interpretability and state-of-the-art performance.

It, too, was a technology that came out of the famous Bell Labs, pioneered by Vladimir Vapnik and Corinna Cortes. An early linear formulation dates back to 1963, and continuing effort on these ideas resulted in the modern Support Vector Machine of the early 1990s.

Support Vector Machine: a new player in the field

This model is mainly built on a kernel trick to calculate the decision boundary between two classes of data points. Except in a few simple cases, it is very difficult to separate the classes in the original two-dimensional space; the separation becomes far easier in a higher-dimensional space. A hyperplane in that higher-dimensional space corresponds, back in the original two-dimensional space, to a curved separating line. This change in the mode of representation is what the kernel trick exploits.

Below is an example of what I mean by representing the data in a higher dimension for classification.

SVM: data representation in a higher dimension

In figure A, two classes of observations, the red and the blue class, are separated with a straight line. It is a straightforward case and the classification is easy. But consider figure B: here a straight line cannot separate the points.

Once a third axis is introduced in figure C, we can see that the classes can now be separated easily. And how does it look if we convert the figure back to its two-dimensional version? See figure D.

A curved boundary now separates the classes very effectively. This is what a support vector machine does: it finds a hyperplane to separate the points, and any new point is then assigned a class depending on which side of the hyperplane it falls.

Kernel trick

The kernel trick is a technique that lets the SVM maximize the margin between the hyperplane and the closest data points without ever computing the coordinates of the data in the new representation space. The kernel function only computes a similarity (an inner product) between pairs of points.

This kernel function is not something the SVM learns from the data; it is chosen by hand. It maps the relationship between points in the original space to the new representation space, and the separating hyperplane is then learned from the data.
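
As a hedged, minimal sketch of the idea (assuming scikit-learn is available; it is not among the tools installed in the companion setup guide), an RBF-kernel SVM separates two classes that are not linearly separable in the original two-dimensional space, much as figures B-D describe.

# RBF-kernel SVM on concentric circles: not separable by a straight line in 2-D
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # the kernel computes inner products implicitly
clf.fit(X, y)
print(clf.score(X, y))                          # training accuracy, close to 1.0 here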

Pros of SVM

  • The method is very accurate when the amount of data is limited or scarce
  • It has a strong mathematical basis, and in-depth mathematical analysis of SVMs is possible
  • Interpretation is relatively easy
  • Its popularity was instant and unprecedented

It also suffers from some weaknesses:

  • Scalability is an issue: it is not well suited to very large datasets.
  • Modern databases hold huge numbers of images carrying enormous information and demand an efficient recognition process; SVM is not the preferred candidate there.
  • It is a shallow method, so it relies on manual feature engineering for perceptual problems.

Decision tree

Around 2000 another classification technique made its debut and instantly became very popular, even surpassing the popularity of SVM, mainly because of its simplicity and the ease of visualizing and interpreting it. Its algorithm also consumes very limited resources, so a low-end computing system is not a constraint for applying a decision tree. Some of its other benefits are:

  • The decision tree has the great advantage of handling both numerical and categorical variables, while many other modelling techniques can handle only one kind.
  • It requires little data preprocessing, which saves a lot of the user's time.
  • Its assumptions are not too rigid, and the model can deviate slightly from them.
  • Decision tree validation uses statistical tests, so its reliability is easy to establish.
  • It is a white-box model: the logic behind it is visible, and we can easily interpret the result, unlike black-box models such as artificial neural networks.

But it does suffer from some limitations, such as overfitting: the performance on the training data is not reproduced when an independent dataset is used for prediction. It produces a result quickly, but that result often lacks satisfactory accuracy.

Nevertheless, from its rise around 2000 it continued its golden run until about 2010.

Random forest

This technique came along to improve on the weaknesses of the decision tree. Since the decision tree was already popular for its simplicity, random forest took no time to win the hearts of machine learning enthusiasts.

As it overcomes the limitations of the decision tree, it became the most practical and robust of the shallow ML algorithms. A random forest is an ensemble of decision trees, i.e. a collection of trees where each tree is trained on a different random subset of the data. The more decision trees a random forest includes, the more robust and accurate its result becomes, just as we consider a forest robust when it has many trees.

Random forest: an ensemble of decision trees

Random forest makes its final prediction by aggregating the predictions of the individual decision trees, which overcomes the weakness of a single decision tree model. In this sense, random forest is a bagging type of ensemble technique.
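
A minimal sketch of this bagging idea, assuming scikit-learn is available (the dataset and parameter values here are only illustrative):

# A random forest as a bagging ensemble of decision trees
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 trees, each fit on a bootstrap sample
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))   # majority-vote prediction accuracy on held-out data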

We can get an idea of random forest's popularity from the fact that in 2010 it became the most used machine learning method on the famous data science competition website Kaggle.

Gradient boosting was then the only other approach that emerged as a close competitor to random forest. This technique also ensembles weak learners, mainly decision trees, and it was quick to outperform random forest.

On Kaggle, the gradient boosting ensemble approach soon overtook random forest, and it is still, along with deep learning, the most used machine learning method in almost all Kaggle competitions.

Dark Knight rises: The neural network era starts

Although the neural network had not been consistent in showing its potential since the 1980s, the successes demonstrated by researchers, for example at IBM, surprised the whole world with intelligent machines like Deep Blue and Watson.

The dedicated deep learning scientists putting in the hard research work never had any doubt about its potential and what it was capable of. The only constraint until then was that the research effort was very scattered.

A coordinated research effort was very much needed to establish its potential beyond doubt. The year 2010 marked the dawn of a new era, when such an effort was initiated for the first time by Yann LeCun at New York University, Yoshua Bengio at the University of Montreal, Geoffrey Hinton and his group at the University of Toronto, and IDSIA in Switzerland.

From this group of researchers, Dan Ciresan of IDSIA was the first to show the world successful applications of modern deep learning, in 2011, when his GPU-trained deep networks won some prestigious academic image classification competitions.

The ImageNet

The ImageNet image classification competition, a large-scale benchmark of labelled images, started a significant chapter in the history of deep learning neural networks in the year 2012.

Screenshot of ImageNet (http://www.image-net.org/)

In the same year, a team headed by Alex Krizhevsky and guided by Geoffrey Hinton recorded an accuracy of 83.6% in this image classification challenge, quite a bit higher than the 74.3% achieved by classical computer vision approaches in 2011.

The ImageNet challenge came to be considered essentially solved once deep convolutional networks (convnets) pushed the classification accuracy up to 96.4%. Since then, the deep convolutional neural network has dominated the machine learning domain.

The deep convolutional neural network received worldwide recognition after this overwhelming success; since then, at almost every major computer vision conference and programmers' meet, the majority of machine learning solutions have been based on deep convolutional networks.

In other fields too, such as natural language processing and speech recognition, deep neural networks have become the dominant technology, replacing earlier tools like decision trees, SVMs and random forests.

A good example of a major player switching to deep neural networks is CERN, the European Organization for Nuclear Research and the largest particle physics laboratory in the world, which ultimately switched to deep learning to identify new particles generated by the Large Hadron Collider (LHC); earlier it used decision-tree-based machine learning methods for this task.

Conclusion

This article presented a detailed history of how deep learning made its long way to today's popularity and its use across many scientific disciplines. It was a journey with many peaks and valleys, going back decades.

The empirical statistical methods and machine learning algorithms that preceded deep learning eventually made way for it, mainly because of its high accuracy when a large amount of data is available.

It registered many successes and then was suddenly left in despair for not being able to meet the high expectations, but it always had true potential, being a practical technique more than an empirical one.

Now the question is: what does the future of deep learning hold? What new surprises are in store? The answer is hard to give, but the history discussed here is evidence that many of them are already revolutionizing our lives.

So the next major breakthrough may be just around the corner, or it may still take years. Either way, the field keeps evolving and is full of promise for blending machines with true intelligence; after all, it learns from data, so it has a good chance of not repeating its own history of failures.

References

  • http://www.image-net.org/
  • Chollet, F., 2018. Deep Learning mit Python und Keras: Das Praxis-Handbuch vom Entwickler der Keras-Bibliothek. MITP-Verlags GmbH & Co. KG.
  • https://www.wikipedia.org/
  • Cortes, C. and Vapnik, V., 1995. Support-vector networks. Machine learning, 20(3), pp.273-297.
  • https://www.import.io
  • Schölkopf, B., Burges, C. and Vapnik, V., 1996, July. Incorporating invariances in support vector learning machines. In International Conference on Artificial Neural Networks (pp. 47-52). Springer, Berlin, Heidelberg.
  • Rosenblatt, F., 1957. The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory.

What is deep learning? an overview

Deep learning basics

Deep learning is an artificial intelligence technique with an immense capability to find hidden patterns within the huge amounts of data generated in this era of data explosion. It is an advanced learning system that mimics the working principle of the human brain. Such vast unstructured data is impossible for a human being to analyze and draw conclusions from, so this kind of learning procedure has proved very helpful for making use of big data.

According to Andrew Ng, the founder of deeplearning.ai and instructor of the popular Coursera deep learning specialization:

Deep learning is a superpower. With it you can make your computer see, synthesize novel art, translate languages, render a medical diagnosis, or build pieces of a car that can drive itself. If that is not a superpower, I don’t know what is.

Andrew Ng

In this respect, machine learning is the much bigger domain and deep learning can be considered a subdomain of it. Deep learning is based on deep neural networks and can learn in supervised as well as unsupervised settings. The network is popularly called an Artificial Neural Network, as it mimics our brain's vast network of neurons.

A schematic diagram of a neural network with two hidden layers

Difference between machine learning and deep learning

The main difference between the two approaches lies in how features are extracted from images; feature extraction is a basic component of both.

Feature extraction

The difference is that machine learning requires this step to be performed manually, with the resulting features fed into the model, whereas in deep learning the feature extraction happens automatically and the learned features are passed through the network to match the object of interest. In this sense, deep learning is called an "end-to-end" learning process.

Resource intensive

Another major difference between the two is data processing capability. Deep learning can make good use of a huge amount of labelled data, provided you have a sophisticated Graphics Processing Unit (GPU). Machine learning, on the other hand, offers different modelling techniques that give a good estimate even with a smaller amount of labelled data.

Scaling with the data

Deep learning has a big plus point over machine learning that makes it far more accurate: the deep learning algorithm scales with the data. That means the more images we use to train a deep learning model, the more accurate its results become.

That is not the case with machine learning algorithms, which reach a plateau after a certain level of performance and do not improve with more training beyond that level. See the image below to appreciate the difference.

Deep learning improves as the data size increases

So the question is: which approach should you use? I would suggest it depends on your situation, i.e. the type of problem you want to solve, the GPU capacity available to you and, most importantly, how much labelled data you have to train the algorithm.

Deep learning is more accurate than machine learning but more complex too. So unless you have thousands of images to train it and a high-performance GPU to process that amount of data, you should use a machine learning algorithm or a combination of them.

Deep learning: the working principle

Deep learning relies mainly on the Artificial Neural Network (ANN) to unravel the wealth of information in big data, so it is very interesting to know how this process is actually performed.

If you have a little exposure to the traditional modelling process, you may know that conventional regression models suffer from various limitations and have to fulfil several assumptions.

In most cases they do not perform well in capturing the truly nonlinear nature of real-world data, mainly because traditional regression modelling does not attempt to learn representations from the data. It is this learning process that makes the real difference between the two approaches.

The term deep in deep learning refers to processing information through several layers. Deep learning, or deep structured learning, is based on what is called representation learning: finding representations of hidden features or patterns in raw, unstructured data.

Accuracy of both the approaches

The accuracy deep learning attains in its estimates is impressive. The applications discussed later in this article need to be very precise to satisfy the end user's expectations, and such accuracy can only be provided by deep learning. To achieve it, deep learning trains on labelled data and continuously improves its predictions by minimizing error.

So the amount of labelled data is an important factor in how good the learning process becomes. For example, to build a reliable self-driving car we need to train the algorithm with a huge amount of labelled data: images and videos of roads, traffic, people walking on the road, or a busy street at any time of day.

No doubt deep learning is a very computation-intensive process; processing such a huge volume of images and videos and then using them to train the algorithm may take days or weeks altogether.

This was one of the key reasons why, even though the concept of deep learning dates back to the 1980s, it was not in much use until recently: researchers back then were not equipped with systems of high computing capacity. In today's era of supercomputers, systems with high-performance GPUs and advanced cloud computing, processing data of this enormous size is possible within hours or even less.

ANN: mimicking the human brain

The Artificial Neural Network, or ANN, as the name suggests, mimics the working principle of our brain, where neurons are the working units. The network between innumerable neurons acts in layers, carrying sensations from different body parts to the designated part of the brain; as a result we can feel a touch, smell a fragrance, taste food or hear music.

The learning process

A human being basically learns from past experience. Since childhood, a person gathers experience about everything in their surroundings and thus learns about them. For example, how are we able to identify a dog or a cat? Because we have seen many of these animals and learned the differences in their appearance, so the chance of making a mistake in identifying them is now almost nil.

Deep learning for feature extraction

This is the very nature of human learning. The first time a baby sees a dog, he or she learns from the parents that it is a dog; gradually the baby comes to know that animals with this appearance are called dogs. Deep learning follows exactly this kind of learning process, and the results get more and more accurate as the learning continues.

An ANN also comprises neurons, with the nodes connected to form a web-like structure. There can be multiple layers of such neurons (generally two to three, but theoretically any number). These layers pass information from one layer to the next and finally produce the result.

One layer of neurons acts as the input for the layer immediately after it, and the term "deep" refers to the number of such layers in the ANN. The most common and frequently used deep learning network is the Convolutional Neural Network (ConvNet/CNN).

Convolutional Neural Network (CNN)

CNN is one popular form of deep learning. It eliminates the need for manual feature extraction from an image in order to recognize it: a CNN trains on thousands of images and can use many hidden layers to extract features and match them with the object of interest.

The hidden layers are arranged so that they recognize features of increasing complexity: the first hidden layers may recognize only the edges in an image, whereas the last ones recognize the more complex shapes we actually want to identify.
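
As a hedged, minimal sketch (assuming TensorFlow/Keras as set up in the companion workstation guide; the layer sizes and 28x28 input are only illustrative), a small CNN stacks convolutional layers so that early layers pick up simple features such as edges and deeper layers combine them into more complex shapes.

# A small convolutional network: feature complexity grows with depth
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),  # low-level features (edges)
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),                           # higher-level shapes
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                                 # class probabilities
])
model.summary()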

Applications of deep learning

Today's digital era collects and generates data at every moment of our presence in the digital world, be it social networking sites, online shopping, online movies or online study and research. We provide lots of data as input to get the desired output from the internet.

The data is enormous in size and completely unstructured, but it holds a lot of information, which, if analyzed properly, can help governments take better policy decisions or businesses frame effective plans.

Over the last few years, this learning process has been the key concept behind some revolutionary ideas and applications, such as:

Image colourization

Recently some old black-and-white movies have been relaunched with colour. If you watch them, you may be surprised by the precision and accuracy of the colourization process. Artificial intelligence has made it possible to complete this task within a few hours, whereas previously it was only possible with human skill and hard labour.

The famous movie "Pather Panchali" by the Oscar-winning film director Satyajit Ray was shot in black and white. Recently Ankit Bera, Assistant Research Professor of AI at the University of Maryland in the U.S., ran an experiment to colourize the movie, and the result is really impressive (read the full story here). Here are two still frames from the movie shown side by side for comparison.

A screenshot from the report

Self-driving car

In a self-driving car, deep learning enables the car to recognize a stop sign or to distinguish a pedestrian from a lamppost, and to judge the situation on a busy road, thus reducing the chance of an accident.

An autonomous car is able to take human-like decisions in every probable situation you may face while driving. It is still in the testing phase, and its performance improves as it gets trained with more data from real-life traffic conditions.

Facial recognition

Its presence is now everywhere, be it a biometric attendance system, an AADHAR-enabled transaction or your mobile's face lock. The recognition system is so smart that it identifies you even when you have shaved off your moustache or changed your hairstyle.

Natural language processing and Virtual assistants

In natural language processing and speech recognition, deep learning plays a crucial role in following the commands a person gives to a smartphone or any smart device. You may also have used Google's speech-to-text tool to save yourself a lot of typing; that voice recognition is also a gift of deep learning.

Different online service providers have launched virtual assistants mainly on the basis of this concept. You must have heard of Apple's Siri, Microsoft's Cortana, Amazon's Alexa, etc.; all are very popular virtual assistants making our daily lives a lot easier.

Language translations

In language translation, deep learning has played as big a role as in natural language processing. This has benefitted travellers, business people and many others who need to visit a lot of places and communicate with people speaking foreign languages.

Chatbots

You may have noticed that in recent times, whenever you contact the customer support or product help of a company to get questions answered about their products, the first basic question-and-answer part is generally automated and answered intelligently without any human intervention.

In medical research

Deep learning plays a pivotal role here nowadays, for example in identifying affected cells in cancer research. A dedicated team of researchers at UCLA has built an advanced microscope that uses deep learning to pinpoint cancerous cells.

Many industries, such as drug companies, automobile makers, the agriculture sector, board game developers and medical image analysis, are actively conducting deep learning research in their R&D divisions.

Conclusion

So I think this article has given you a basic idea of what deep learning is and how it works. Although the idea of deep learning was conceptualized long back, around 1986, it did not take off due to the limitation of resources and took more than a decade to come into action. Today we have sophisticated computing devices and no dearth of data; in fact, there are oceans of data covering every aspect of our daily lives.

Every moment of our lives and every activity happening in the world is getting stored in one format of data or another, mainly as images, videos and audio. This data is so huge that conventional data analysis cannot handle it, and unaided human effort would take decades to analyze it.

Here comes deep learning with its fascinating power of data analysis, mainly pattern recognition. Deep learning works with especially high accuracy when the database consists of a large number of audio, video or image files, so it is the best fit for this situation. And this is where the name "deep" comes from: many features of the images are extracted at different layers of the process, so deep actually refers to the depth of the layers.

Deep learning is a vast topic and a single article cannot cover all of its aspects, so only its basic features are discussed here; you can start your deep learning journey with this article.

Follow this blog regularly, as many interesting articles regarding deep learning and its applications will be posted here. If you have any particular topic in mind, please let me know by commenting below. Also share your opinion about this article and how it can be improved further.

References