How to set up your deep learning workstation: the most comprehensive guide

Set up a deep learning workstation

This article is a detailed, step-by-step guide to setting up a deep learning workstation on Ubuntu 20.04. It is essentially a documentation of the process I followed on my own computer. I have repeated this process a number of times, and every time I wished I had documented it, because proper documentation makes the next setup quick and error-free.

I have mentioned the most common mistakes and errors during the process and how to avoid or troubleshoot them. Bookmarking this page will help you refer back to it quickly whenever you get stuck at any of the steps.

I have completed this setup a few times, on both my old and new laptops with completely different configurations, so I hope the problems I faced are the most common ones. It took me considerable time to fix all those issues, mainly by visiting discussion groups like StackOverflow, the Ubuntu forums and many other threads and blogs.

I have compiled everything in one place here, so that you do not have to visit multiple sites and can complete the whole installation by referring to this post alone. That should save you a lot of valuable time.

Prerequisites to set up deep learning workstation

I assume that you already have Ubuntu on your computer. If not, please install the latest version of Ubuntu. It is the most popular open-source Linux distribution and is available to download for free. Although it is possible to run deep learning Keras models on Windows, it is not recommended.

Why should you use Ubuntu for deep learning? Refer to this article

Another prerequisite for running deep learning models is a good-quality GPU. I advise having an NVIDIA GPU in your computer for satisfactory performance. It is not strictly mandatory, but it is strongly recommended, because running sequence processing with recurrent neural networks or image processing with convolutional models on a CPU is a difficult proposition.

Such models may take hours to give results when run on a CPU, whereas a modern NVIDIA GPU may finish them in merely 5-10 minutes. If you are not interested in investing in a GPU, an alternative is to use a cloud computing service and pay an hourly rent.

However, in the long run, such a service may cost you more than upgrading your local system. So my suggestion is: if you are serious about deep learning and expect even moderate use, go for a good workstation setup.

The main steps to set up a deep learning workstation

Now I assume that you have completed all the prerequisites for your deep learning experiments. The setup is a little time-consuming, and you will need a stable internet connection to download various files. Depending on your internet speed, the complete process may take 2-3 hours (with a 1 Gbps connection it took me about 2 hours). The main steps to set up a deep learning workstation are as follows:

  • Updating the Linux system packages
  • Installing the Python pip command, the basic command used to install the other components
  • Installing the Basic Linear Algebra Subprograms (BLAS) library required for mathematical operations
  • Installing HDF5 to store hierarchical data
  • Installing Graphviz to visualize Keras models
  • Installing the CUDA and cuDNN NVIDIA graphics drivers
  • Installing TensorFlow as the backend of Keras
  • Installing Keras
  • Installing Theano (optional)

So, we will now proceed with the step-by-step installation process.

Updating the Linux system packages

The following commands update and upgrade the Linux system packages. Type them in the Ubuntu terminal; the keyboard shortcut to open a terminal is Ctrl+Alt+T. Open the terminal and execute the following lines.

$ sudo apt-get update
$ sudo apt-get --assume-yes upgrade

Installing the Python-pip command

The pip command installs and manages Python packages. Whichever packages we install next, this pip command will be used. It is a replacement for the earlier easy_install command. Run the following command to install python-pip.

$ sudo apt-get install python-pip python-dev

This should install pip on your computer, but sometimes there are exceptions, as happened to me. See the screenshot of my Ubuntu terminal below: it says "Unable to locate package python-pip".

This was a big problem at first, as I was clueless about why it was happening; on my old computer I had used the same command a number of times without any issue. After scouring the internet for several hours I found the solution: it has to do with the Python version installed on your computer.

If you are facing the same problem (most likely on a new computer), first check the Python version with this command.

$ ls /bin/python*

If it returns Python 2 (for example Python 2.7), use the python2-pip package; if it returns a Python 3 version like Python 3.8, use python3-pip to install pip. So the command becomes:

$ sudo apt-get install python3-pip

On older Ubuntu releases, plain "python" meant Python 2, and the python-pip package targeted Python 2. Ubuntu 20.04 ships with Python 3 only, so the Python 3 packages must be named explicitly. To install pip and the development headers for Python 3, use the following command.

# Installing Python3
$ sudo apt-get install python3-pip python3-dev
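
If you want to double-check from within Python which interpreter you are actually on (and therefore which pip variant applies), a quick check like the following helps; this is just a convenience sketch, not an official setup step:

# Confirm the interpreter version and location (run inside python3)
import sys
print(sys.version)      # should report a 3.x version on Ubuntu 20.04
print(sys.executable)   # e.g. /usr/bin/python3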

Installation steps for the Python scientific suite in Ubuntu

The process discussed here is for Linux (and Windows users can follow a similar route). Mac users need to install the Python scientific suite via Anaconda, available from the Anaconda repository; its documentation is continuously updated and describes every step in detail.

Installation of the BLAS library

The Basic Linear Algebra Subprograms (BLAS) installation is the first step in setting up your deep learning workstation. One thing Mac users should keep in mind is that this installation does not include Graphviz and HDF5; they have to be installed separately.

Here we will install OpenBLAS using the following command.

$ sudo apt-get install build-essential cmake git unzip \
pkg-config libopenblas-dev liblapack-dev

Installation of Python basic libraries

In the next step, we will install the basic Python libraries like NumPy, Pandas, Matplotlib, SciPy etc. These are core Python libraries required for any kind of mathematical operation. So, be it machine learning, deep learning or any computation-intensive task, we will need these libraries.

Use the following command in the Ubuntu terminal to install this scientific suite in one go.

# Installation of the basic Python scientific libraries
$ sudo apt-get install python3-numpy python3-scipy python3-matplotlib python3-pandas python3-yaml

Installation of HDF5

The Hierarchical Data Format (HDF) version 5 is an open-source file format which supports large, complex and heterogeneous data sources. It was developed at the National Center for Supercomputing Applications (NCSA) to store large numeric data efficiently in binary form, and it builds on earlier hierarchical formats such as HDF4 and NetCDF.

HDF5 data format allows the developer to organize his machine learning/deep learning data in a file directory structure very similar to what we use in any computer. This directory structure can be used to maintain the hierarchy of the data.

If we map HDF5 onto the computer filing system, the "directory" or "folder" corresponds to an HDF5 "group" and the "files" correspond to "datasets". This matters in deep learning because Keras models are saved to and fetched from disk in this format.

Run the following command to install HDF5 in your machine

# Install the HDF5 data format libraries used to save Keras models
$ sudo apt-get install libhdf5-serial-dev python3-h5py
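
Once the HDF5 libraries and h5py are in place, saving and reloading a Keras model is a one-liner each way. Below is a minimal sketch; the layer sizes and the file name tiny_model.h5 are placeholders of my own, not part of the official setup:

# Sketch: saving and loading a Keras model in HDF5 format (file name is illustrative)
from keras import models, layers
from keras.models import load_model

model = models.Sequential()
model.add(layers.Dense(8, activation='relu', input_shape=(4,)))
model.compile(optimizer='rmsprop', loss='mse')

model.save('tiny_model.h5')             # written as an HDF5 file via h5py
restored = load_model('tiny_model.h5')  # read the model back from disk
restored.summary()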

Installation of modules to visualize Keras model

In the next step we will install two packages called Graphviz and pydot-ng. These two packages are needed to visualize Keras models. The commands for installing them are as follows:

# Install graphviz
$ sudo apt-get install graphviz
# Install pydot-ng
$ sudo pip3 install pydot-ng

These two packages help you visualize the deep learning models you create. For the time being you can skip their installation and proceed with the GPU configuration part; Keras also works without them.
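
If you do install them, here is a small sketch of how they are typically used to draw a Keras model to an image file (the model and the file name model.png are arbitrary examples; in some Keras versions plot_model lives in keras.utils.vis_utils):

# Sketch: visualizing a Keras model with Graphviz and pydot (file name is arbitrary)
from keras import models, layers
from keras.utils import plot_model

model = models.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(10,)))
model.add(layers.Dense(1, activation='sigmoid'))

plot_model(model, to_file='model.png', show_shapes=True)  # writes a diagram of the layers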

Installation of opencv package

Use the following code to install opencv package

# Install OpenCV
$ sudo apt-get install python3-opencv

Setting up GPU for deep learning

Here comes the most important part. As you know, the GPU plays an important role in deep learning modelling. In this section we are going to set up GPU support by installing two components, CUDA and cuDNN. To function properly, they need an NVIDIA GPU.

Although you can run your Keras models on the CPU alone, training will take much longer compared to a GPU. So my advice is: if you are serious about deep learning modelling, plan to procure an NVIDIA GPU (using a cloud service on hourly rent is also an alternative).

Let's concentrate on setting up the GPU, assuming that your computer already has a recent NVIDIA card.

CUDA installation

To install CUDA, visit the NVIDIA download page at https://developer.nvidia.com/cuda-downloads. You will land on a page that asks you to select the operating system you are using. As we are using Ubuntu here (to know why Ubuntu is the preferred OS, read the article below), click Linux.

CUDA installation-OS selection

Then it asks for the other specifications of your workstation environment. Select them as per your existing configuration. Here I selected Linux as the OS; I am using a Dell Latitude 3400 laptop, which is a 64-bit machine, so for the architecture I selected x86_64; the distribution is Ubuntu, version 20.04.

Finally, you have to select the installer type. I chose the network installer, mainly because it has a comparatively small download size and I was using my mobile internet at the time. You can choose one of the local installer options if internet bandwidth is not a constraint; the plus point of a local installation is that you download everything only once.

CUDA installation-specification selection

Once all the specifications are selected, NVIDIA provides the installer commands. Copy them from the page and run them in the Ubuntu terminal. They use Ubuntu's apt to install the packages, which is the easiest way to install CUDA.

CUDA installation code
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
$ sudo apt-get update
$ sudo apt-get -y install cuda

Install cuDNN

“cuDNN is a powerful library for Machine Learning. It has been developed to help developers like yourself to accelerate the next generation of world changing applications.”

NVIDIA.com

To download the cuDNN build for your operating system and Linux distribution, you have to visit the NVIDIA download page.

Downloading cuDNN

To download the library, you have to create an account with NVIDIA. It is a compulsory step.

NVIDIA membership for downloading cuDNN

Fill in the necessary fields.

NVIDIA membership for Downloading cuDNN

As you finish registration a window with some optional settings will appear. You can skip them and proceed for the next step.

NVIDIA membership for Downloading cuDNN

A short survey by NVIDIA is the next step. Although it asks about your experience as a developer, you can fill it in with any of the options just to get to the download page.

Download survey for cuDNN

Now a page with several download options appears, and you have to choose according to your specifications. I selected the following Debian package for my workstation.

Selecting the OS for cuDNN download

Download the file (around 300 MB in my case). To install the library, first change into the directory where it was downloaded and execute the install command.

Once you are in the directory where the library has been downloaded (by default the Downloads folder of your computer), run the command below, using the actual filename in place of ****** in the command.

$ sudo dpkg -i ******.deb

You can follow the detailed installation steps on that page. With this, the cuDNN installation is complete.

Installation of TensorFlow

The next step is the installation of TensorFlow, which is very simple: just execute the command below to install TensorFlow with pip. Recent TensorFlow 2.x wheels include GPU support in the standard package, so no separate GPU-specific install is needed.


# Installing TensorFlow using pip3 command for Python3
$ sudo pip3 install tensorflow
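
After installation it is worth confirming that TensorFlow can actually see the GPU you configured with CUDA and cuDNN. A quick check for TensorFlow 2.x:

# Verify the TensorFlow installation and GPU visibility (TensorFlow 2.x)
import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))  # an empty list means TensorFlow is running CPU-only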

Installing Keras

This is the final step of setting up your deep learning workstation, and then you are good to go. You can simply run the command below.

$ sudo pip3 install keras

Or you can install it from GitHub. The benefit of installing Keras from GitHub is that you also get lots of example scripts, which you can run to test your machine; they are a very good source of learning.

$ git clone https://github.com/fchollet/keras
$ cd keras
$ sudo python3 setup.py install

Optional installation of Theano

Installing Theano is optional, since we have already installed TensorFlow. However, it can prove advantageous when building Keras code to be able to switch between TensorFlow and Theano. Execute the command below to install Theano:

$ sudo pip3 install theano
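
With the multi-backend Keras releases, the active backend is chosen in the ~/.keras/keras.json configuration file; editing its "backend" field and restarting Python switches between TensorFlow and Theano. Here is a small sketch of doing that programmatically (note that the Keras bundled inside newer TensorFlow versions is TensorFlow-only, so this applies to standalone multi-backend Keras):

# Sketch: inspecting and switching the Keras backend via ~/.keras/keras.json
import json, os

config_path = os.path.expanduser('~/.keras/keras.json')  # created the first time Keras is imported
with open(config_path) as f:
    config = json.load(f)
print('Current backend:', config.get('backend'))

config['backend'] = 'theano'            # or 'tensorflow'
with open(config_path, 'w') as f:
    json.dump(config, f, indent=4)      # takes effect the next time Keras is imported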

Congratulations! You have finished all installations and completed the setup of your deep learning workstation. You are now ready to execute your first deep learning neural network code.

I hope this article proves helpful for setting up your deep learning workstation. It is a lengthy article, but it covers the technicalities you may need if you hit any difficulty during the process. A little knowledge about every component you install also helps you make further changes to the setup later.

Let me know how you find this article by commenting below. Please mention any information I missed or any doubt you have regarding the process, and I will try my best to help.

Why Ubuntu is the best for Deep Learning Framework?

Ubuntu for deep learning

Why use Ubuntu for deep learning? This is the question this article tries to answer. After reading this article you will not have any doubt regarding which platform you should use for your deep learning experiments.

I was quite happy with my Windows 10 and Colab/Jupyter notebook combination for all of my Artificial Intelligence (AI)/Machine Learning (ML)/Deep Learning (DL) programming, until I decided to start some serious work with deep learning neural network models.

Is it really important?

“[M]achines of this character can behave in a very complicated manner when the number of units is large”

Alan Turing (1948), “Intelligent Machines”, page 6

As soon as I started building my first model, the limitations of my working environment came to my notice. I was reading some deep learning threads in forums like Quora and Reddit, and someone mentioned in a reply that Ubuntu is a better choice for serious deep learning work.

It struck me that it was probably not wise to continue with Windows for advanced deep learning and AI work. But it was just a hunch at that time, and I needed strong logical points before I made up my mind to switch from the only OS I had ever used.

So I started scouring the internet, reading blogs and discussion-forum threads to make sure switching platforms was really worth my time, because getting acquainted with a completely new OS takes time, and time is money for me.

If you are also in the learning phase and serious about deep learning, this article will help you make an informed decision about which platform to use. Switching your working environment at a later stage of learning always wastes valuable time and causes a lot of rework.

I have already done the heavy lifting for you and present a clear description of the topic, so that you get all your questions answered in one place.

So let's start with an introduction to Ubuntu. This is not an article on Ubuntu itself, and you can find many good articles on it elsewhere, but before looking at its special features for deep learning, here is a very brief idea.

What is Ubuntu?

Ubuntu is one of the most popular Linux distributions. It is developed by Canonical, the company founded by Mark Shuttleworth. It is also a famous open-source technology, which means all the features and applications it offers are completely free, and being free automatically puts an application miles ahead in popularity.

“What commercialism has brought into Linux has been the incentive to make a good distribution that is easy to use and that has all the packaging issues worked out.”

Linus Torvalds, Principal developer of the Linux kernel

Ubuntu ships a new release roughly twice a year, while Long Term Support (LTS) releases arrive every two years with updated security patches. It has three main distribution flavours: Core, Desktop and Server.

The Core flavour is mainly for those working on IoT devices and robotics. The Desktop flavour is for everyday users doing office tasks as well as programming. The Server flavour is, obviously, for client-server architectures and is generally meant for industry use.

Why Ubuntu is preferred for deep learning?

The Ubuntu version I installed recently is 20.04, the latest release of the distribution at the time of writing. It is much improved over its predecessor, and the additional support it provides for AI, ML and DL programmers is especially notable.

The MicroK8s feature

“Given its smaller footprint, MicroK8s is ideal for IoT devices- you can even use it on a Raspberry Pi device!”

Kubernetes.io, technical blogs
MicroK8s integration in Ubuntu

The user interface has improved a lot, and the installation process has become very easy (it was always smooth, though). Ubuntu 20.04 comes with support for ZFS (a file system with high availability and data integrity) and an integrated module called MicroK8s, so AI and DL developers no longer have to install it separately.

MicroK8s lets AI application modules be set up and deployed very fast. It comes preloaded with the necessary dependencies, along with automatic updates and security patches. With this version of Ubuntu you will clearly spend much less time configuring the environment.

Kubeflow

It is another deep learning edge of Ubuntu 20.04 and comes as an add-on to MicroK8s. Kubeflow was developed by Google in collaboration with Canonical, especially for machine learning applications, and it provides built-in GPU acceleration for deep learning R&D.

What is Kubeflow?

Kubeflow is deployed on Kubernetes and does away with the barriers to creating production-ready stacks. It gives developers enhanced AI and ML capabilities with edge-computing features. Researchers and developers involved in cutting-edge work get a secured production environment with strict confinement and complete isolation.

Kubeflow architecture
Source: Kubeflow blog by Thea Lamkin

The security provided by the Kubeflow and Kubernetes integration is unparalleled. Many AI/ML/DL add-ons like Jaeger, Istio, CoreDNS, Prometheus and Knative come integrated with it and can be deployed with a single command.

The programming edge of Ubuntu

When it comes to programming, Ubuntu is arguably the leader. Not only AI, ML or DL programming, but any kind of programming or application development task tends to be well supported when the operating system is Ubuntu.

It has rich libraries and a vast collection of examples and tutorials readily available. The community support for the open-source software used on Ubuntu is massive, which helps you solve any issue quickly, and updates are regular irrespective of which version you are using.

The enhanced Graphics Processing Unit

A powerful GPU is an important component for serious ML/DL programming, and Ubuntu has an edge here. NVIDIA, the leading name in the GPU manufacturing industry, has put a lot of effort into making CUDA work to its maximum capacity on Ubuntu.

Ubuntu 20.04 also gives users the option to use external graphics cards through Thunderbolt adapters, or to add them through dedicated PCIe slots.

So it is no surprise that deep learning frameworks and libraries like Keras, TensorFlow, OpenCV and PyTorch tend to prefer Ubuntu over other operating systems. World leaders in advanced AI/ML/DL research and development, from the autonomous-car sector to CERN and the LHC, and brands like Samsung, NVIDIA and Uber, all use Ubuntu for their research activities.

Advanced feature and support for Hardware

The hardware support Ubuntu comes with is also exceptional. Ubuntu provides organization-specific hardware certification, which means high compatibility is assured, with tight BIOS integration and factory-level quality assurance.

To achieve this, Canonical deals directly with hardware manufacturers, developing partnerships with the major vendors in order to ship an operating system with preloaded and pretested features.

The support team is, as usual, ready for any kind of troubleshooting at any time. With all these assurances, developers can fully concentrate on their R&D.

Finally the Software

Canonical's Ubuntu has its own open-source software collection, compatible at both the board level and the component level. All versions, old or new, carry the same package set, and this has several advantages.

The large Linux user base can run different versions of Ubuntu and switch between them seamlessly, which is possible only because the software packages are the same across versions. Developers can easily test their applications locally before launching them on the web for global users.

This collection of open-source software makes fast creation of AI models possible, and building and debugging software on IoT hardware before deployment is quick and easy.

The snapcraft tool

Snapcraft, the app store for Linux

It is another major feature of Ubuntu that makes it a strong candidate for an ideal programming OS. Snap is a format for packaging and distributing containerized applications, and the automatic updates in Ubuntu are safe to install and execute largely because of this snap feature.

Snapcraft is a command-line tool that creates snaps and makes packaging applications very easy. User feedback gathered through the snap ecosystem is immensely important for developers: it provides the necessary insight about the software and helps further improvement.

For example, a study by Canonical revealed that most Ubuntu users never update their software manually, so automatic updates were introduced. Canonical then does not need to support many older versions, as the whole user base moves to the latest release together.

Massive online support base

Being an open-source platform, Ubuntu has a massive online support and documentation repository. Users can ask questions through channels such as Slack or Skype at any time, and the Ubuntu support community is very vibrant; you can even expect replies from the development team itself.

Popular question-and-answer sites like Quora and Reddit also have threads on Ubuntu-related queries; I personally had many of my questions answered there. Even if you have a unique problem that has not been answered before, you can post it on any of these platforms, and it is highly likely that within a few hours you will get helpful suggestions from either a regular user or the Ubuntu support/development team.

Final words

As you finish reading this article, you should have a clear idea of why Ubuntu is a good pick for your machine learning or deep learning programming platform. I have tried my best to put together all the information I gathered from many articles, online and offline.

I invested a lot of time researching this topic to be completely sure before diving deep into advanced learning. It is an important decision, no doubt. I have had bitter experiences before, where I put a lot of effort into learning a particular tool and then, one day, due to some limitation, had to backtrack and change the platform.

Starting fresh from scratch was quite a lot of rework and a waste of time, and it could have been avoided if I had done thorough research at the very beginning. So I learned my lesson and made no mistakes this time, and I hope this helps you make an informed decision too.

So, please let me know if you found the article useful by commenting below. Any queries, doubts or suggestions are welcome; I will try to improve the post further based on your comments.

An introduction to Keras: the most popular Deep Learning framework

An introduction to Keras

In this article, I am going to discuss a very popular deep learning framework in Python called Keras. The modular architecture of Keras makes working with deep learning a very smooth and fast experience, and Keras handles all the higher-level deep learning modelling very smoothly on both the GPU and the CPU of your workstation.

So, there is no surprise that Keras with TensorFlow is the most popular and widely used deep learning framework. This post will introduce you to the framework so that you feel a little more comfortable as you start using it for your deep learning experiments.

As you finish this article you will get some introductory idea about Keras like:

  • What is Keras and why it is so popular in deep learning research
  • Important features of Keras framework
  • The modular architecture of Keras
  • The backends of Keras, like TensorFlow, Theano and Microsoft's Cognitive Toolkit (CNTK)
  • A brief comparison between the frameworks

So lets start with the very basic question…

What is Keras?

It is an open-source library written in Python for building neural network models. Keras lets us conduct deep learning experiments very fast while being user friendly, extensible and modular in its architecture.

Keras was developed by Francois Chollet under the research project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System). Francois Chollet is an engineer at Google and also the creator of the Xception deep neural network architecture.

Popularity of Keras as a deep learning framework

Keras is distributed under the permissive MIT licence, which makes it freely available even for commercial use. This can be considered one of the big reasons for its popularity.

See the image below showing the Google Trends search volume (in thousands) for some popular deep learning frameworks.

Popularity of Keras in deep learning

TensorFlow, being the default Python library for all kinds of basic tensor operations, is always popular in the machine learning domain. So is the scikit-learn library, which is very famous for machine learning, especially for the many shallow learning algorithms it offers.

But you can see that Keras is also a popular keyword in spite of its very specialized use in neural network models. Keep in mind that deep learning itself is a specialized field with a comparatively limited user base of researchers from specific industries.

Keras was conceptualized to let researchers conduct deep learning experiments in academia and research setups like NASA, NIH and CERN. The Large Hadron Collider team at CERN has used Keras in its search for unknown particles through deep learning.

Besides famous brands like Google, Uber and Netflix, many startups also depend heavily on Keras for the deep learning parts of their R&D activities.

On Kaggle, the famous machine learning competition website, a large share of the top-5 winning teams use Keras to solve the challenges, and the number is steadily increasing.

Important features of Keras

  • The same code can run on both CPU and GPU, and Keras is compatible with Python 2.7 and 3.5+
  • Deployment is very easy thanks to the full deployment capabilities of the TensorFlow platform
  • Keras models can run directly in the browser by exporting them to JavaScript, or on iOS, Android and other devices via TensorFlow Lite
  • Be it a convolutional network or a recurrent network, Keras can handle both, and even a combination of them
  • Keras is popular in scientific research because implementing arbitrary research ideas is easy, thanks to its low-level flexibility combined with the high-level convenience that speeds up experimentation cycles
  • Any task related to computer vision or sequence processing is fast and smooth in the Keras framework
  • Keras is capable of building almost any kind of deep learning model, be it a generative adversarial network, a neural Turing machine or a multi-input/multi-output model

The modular functionality of Keras

Keras is designed to handle only the high-level model-building part of the deep learning ecosystem, leaving the low-level work to a well-optimized tensor library. All the basic tensor operations and transformations are handled by such a specialized backend, for example Google's TensorFlow library.

The modular functionality of Keras

In this modular approach, Keras can also use two other backends, Theano and Microsoft's CNTK. These are very popular deep learning execution engines and are not exclusive to Keras; you can switch to any of them at any time if you find it better or faster for a particular task.

All three of these libraries enable Keras to run on both the CPU and the GPU.
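
You can check at runtime which execution engine Keras is currently using; a quick sketch:

# Check which backend Keras is configured to use
from keras import backend as K
print(K.backend())   # prints 'tensorflow', 'theano' or 'cntk' depending on the configuration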

Here are these three backends in brief.

TensorFlow

Keras with TensorFlow back-end

This is a deep learning library developed by Google to handle tensor operations smoothly. Built on TensorFlow 2.0, Keras offers an industry-standard framework whose models can scale across large GPU clusters.

In 2017 Google decided to support Keras in TensorFlow's core library. Francois Chollet has been of the opinion that Keras is built as an interface for high-level deep learning model building rather than as a fully fledged machine learning framework.

When Keras code runs on the CPU, TensorFlow itself wraps a low-level tensor library called Eigen.

Eigen

As eigen.tuxfamily.org puts it, "Eigen is a C++ template library for performing linear algebra: matrices, vectors, numerical solvers and related algorithms".

cuDNN

When TensorFlow runs on the GPU, it wraps the NVIDIA CUDA Deep Neural Network library (cuDNN), "a GPU-accelerated library of primitives for deep neural networks" as described on the NVIDIA developer website.

The cuDNN library provides highly tuned implementations of standard deep learning routines such as forward and backward convolution, pooling, normalization and activation layers.

Theano

Theano as back end

Theano is also a Python library, built especially for mathematical operations such as matrix manipulation and transformation, and it has tight integration with NumPy. It was developed by the MILA lab at the University of Montreal, Quebec, Canada, and it too can run on both CPU and GPU.

Theano can also build symbolic computation graphs and compute the gradients needed for gradient descent.

CNTK

Keras with Microsoft's Cognitive Toolkit as back-end

This is Microsoft's Cognitive Toolkit, formerly known as the Computational Network Toolkit. It is a free toolkit that is easy to use but gives commercial-grade performance when training deep learning algorithms.

Soon after Google decided to support Keras in TensorFlow's core library in 2017, Microsoft also decided to provide CNTK as a Keras backend.

CNTK helps to “describe the neural network as a series of computational steps via a directed graph”

Out of these three backends of Keras implementation, TensorFlow is the most frequently used one because of its robustness. It is highly scalable too.

A comparison between the frameworks

To conclude this article, I will summarize the features of Keras and how it differs from the other popular frameworks. The four frameworks Keras, TensorFlow, Theano and CNTK are compared in the table below.

We already know the basic difference between them: from a deep learning execution point of view, Keras is the implementation interface whereas the rest act as backends. But there are some other key differences; let's go through them point by point.

| Property | Keras | TensorFlow | Theano | CNTK |
|---|---|---|---|---|
| Licence | MIT licence | Apache 2.0 | BSD | MIT |
| Developed by | Francois Chollet | Google Brain | University of Montreal, Quebec, Canada | Microsoft Research |
| Release | 2015 | 2015 | 2007 | 2016 |
| Written in | Python | C++, CUDA, Python | Python | C++ |
| Platforms | Linux, MacOS, Windows | Linux, MacOS, Windows, Android | Cross-platform | Linux, MacOS, Windows |
| API | High-level API | Low-level API | Low-level API | Low-level API |
| Use | Used for experimentation with deep learning | Popularly used for all kinds of machine learning | Used for multidimensional matrix operations | CUDA support and parallel execution |
| Deep learning type | Used for all deep learning algorithms | Supports reinforcement learning and other algorithms | Not a very smooth performer on AWS | A deprecated deep learning framework |
| OpenMP support | Yes (with Theano as backend) | Yes | Yes | Yes |
| OpenCL support | Yes (with Theano, TensorFlow and PlaidML as backend) | Yes | Not yet, work in progress | No |
| Support for reinforcement learning | No | Yes | No | No |
| Performance | Quite fast with Theano or TensorFlow as backend | Optimized for big models; memory requirement may be higher | Run time and memory competitive with multi-GPU support, but compile time is longer | Comparable to Theano and TensorFlow |
| Additional features | Android support, multi-GPU support | Android support, multi-GPU support, support for distributed Windows OS | Multi-GPU support | Cross-platform support |

Difference between deep learning frameworks

Conclusion

That is all four frameworks at a glance. It is better not to use this table to pit the frameworks against each other: they do not work independently but rather play together, making up for one another's weaknesses. Together, they form a really powerful deep learning execution engine.

So, I hope this article has been able to give you a very introductory idea about Keras as a deep learning framework. This information is just to make you feel comfortable before you take the first step.

As you go deeper into real deep learning applications, you will automatically discover more interesting facts, and the points presented here will make more sense to you.

If you find this article helpful please let me know through comments below. In case anything I have missed here or there is any question or doubt, any suggestions please put them also in comments. I would like to answer them.

A detailed discussion on tensors: why are they so important in deep learning?

A detailed discussion on tensors

This article is all about the basic data structure of deep learning called Tensors. All inputs, outputs and transformations in deep learning are represented through tensors only. Depending on the complexity of the data tensors with different dimensions play the role of the data container.

So it goes without saying that to improve your deep learning skills you must be confident in your knowledge of tensors; you should be fluent with their different properties and mathematical treatment. This article will introduce you to tensors. As you finish it, you will be thorough with the following topics:

  • What are tensors
  • Properties of tensors like dimension, rank, shape etc.
  • Use of tensors in deep learning
  • Real-life examples of tensor application

The importance of tensors can be understood from the fact that Google named its complete machine learning library, TensorFlow, after them. So in this article I will try to clarify the basic idea of tensors, the different types of tensors and their application, with executable Python code.

Tensors with different dimensions

I will also try to keep it as simple as possible. The mathematical parts will be presented with the help of Python scripts, as that is much easier to follow for readers with little or no mathematical background. Some basic knowledge of matrices will certainly help you learn quickly.

So let’s start the article with the most obvious question;

What is a Tensor?

A tensor is nothing but a container for data. It plays the same role that matrices do in NumPy: in tensor terms, a matrix is a two-dimensional (2-D) tensor, a vector is a one-dimensional tensor and a scalar is a zero-dimensional tensor.

When we deal with an image, it has three dimensions: height, width and depth. So a 3-D tensor is required to store an image. Likewise, for a collection of images another dimension, the number of images, gets added, so we need a container with four dimensions; a 4-D tensor serves the purpose. To store videos, 5-D tensors are used.

Generally in neural networks we need tensors of up to four dimensions, but they can go up to any number of dimensions depending on the complexity of the data. NumPy arrays can be thought of as general-purpose tensors with arbitrary dimensions.

Scalar data

These are tensors with zero dimensions. Single values such as float32 or float64 numbers are scalar data. Scalars have rank zero as they have zero axes. NumPy's ndim attribute displays the number of axes of any array; see the following code applied to a scalar.

Scalar data as tensors
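
For reference, here is that check written out as code (a minimal sketch you can type directly into a Python session):

# A scalar (0-D tensor): rank checked with the ndim attribute
import numpy as np
x = np.array(15)
print(x)        # 15
print(x.ndim)   # 0 -> zero axes, so rank 0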

You can try these simple lines yourself and check the results; if you are just getting familiar with the Python interpreter, it is a good start.

Vector data

These are one-dimensional (1-D) tensors, so the rank is one. Differentiating between an n-dimensional vector and an n-dimensional tensor is often confusing. For example, consider the following vector:

[1, 6, 7, 9, 10, 8]

It is a six-dimensional vector with one axis, not a 6-D tensor. A 6-D tensor has 6 axes, with any number of dimensions along each axis.

A vector: 1-D tensor
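
The same check for the vector above, written out as code:

# A vector (1-D tensor) with six elements along a single axis
import numpy as np
x = np.array([1, 6, 7, 9, 10, 8])
print(x.ndim)    # 1 -> one axis, so rank 1
print(x.shape)   # (6,) -> six elements along that axis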

Matrices

These are 2-D tensors with two axes. A matrix has rows and columns hence two axes and rank is two. Again we can check this with the ndim attribute. Let’s take a NumPy matrix of size (3,4) which means the matrix has 3 rows and 4 columns.

[[2, 3, 3, 9],
 [4, 10, 5, 8],
 [4, 6, 9, 2]]

So, let's check its rank in the same way as we did for the scalar and the vector:

A matrix: 2-D tensor
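
And the corresponding code for the matrix above:

# A matrix (2-D tensor) with 3 rows and 4 columns
import numpy as np
x = np.array([[2, 3, 3, 9],
              [4, 10, 5, 8],
              [4, 6, 9, 2]])
print(x.ndim)    # 2 -> two axes (rows and columns)
print(x.shape)   # (3, 4)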

While writing the code, be extra careful with the matrix input: unbalanced opening and closing brackets often cause errors.

Tensors with higher dimensions

As mentioned at the beginning, the tensors we commonly use have up to four dimensions, but for video data they can go up to five. Data structures of up to two dimensions are easy to grasp; beyond that it becomes a little difficult to visualize them.

3D tensors

In this section we will discuss some high dimensional tensors and the way they store data. So, let’s start with 3-D tensors. Let’s consider the following tensor and try to identify its three dimensions.

A 3-D tensor: a stack of matrices with 3 axes

You can see it is actually a data structure containing three matrices, each with 3 rows and 4 columns. See the image to understand the shape of this tensor. Let's create a variable to store the data and check its rank with the ndim attribute.

# High dimensional tensors
import numpy as np

x = np.array([[[2, 5, 6, 9],
               [3, 9, 0, 1],
               [2, 8, 9, 1]],
              [[12, 5, 16, 9],
               [4, 6, 0, 1],
               [2, 5, 9, 8]],
              [[1, 0, 6, 2],
               [8, 10, 0, 5],
               [13, 3, 6, 1]]])
print(x)
print(x.ndim)

See the output below to understand the structure of a 3-D tensor. It is a collection of matrices; thus, unlike a single matrix with two axes, a 3-D tensor has three axes.

3-D tensor

4-D tensors

In the same way that we get a 3-D tensor, if several such 3-D tensors are grouped together, another dimension gets created, making it a 4-D tensor. See the image of a hypothetical 4-D tensor, where three cubes are clubbed together. Such 4-D tensors are very useful for storing images for image recognition in deep learning.

In the same fashion we can have tensors of even higher dimensions. Tensors of up to 4 dimensions are the most common, but sometimes 5-D tensors are used to store videos. Theoretically there is no limit on the number of dimensions; any number can be used to store data in an organized manner. A small code sketch follows.
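
As a quick sketch, a 4-D tensor can be created the same way, for example as a stack of dummy grey-scale images (the sizes below are arbitrary choices of mine):

# A 4-D tensor: e.g. 3 dummy images of 28x28 pixels with 1 colour channel
import numpy as np
images = np.zeros((3, 28, 28, 1))
print(images.ndim)    # 4
print(images.shape)   # (3, 28, 28, 1)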

5-D tensors

This is the type of tensors when we need to store data with yet another dimension. Video data can be an ideal example where 5-D tensors are used.

If we take the example of a 5-minute video at 1080p HD resolution, what will be the dimensions of its data structure? Let's calculate it in a simple way. The frame size is 1080 x 1920 pixels, and the duration of the video is 5 x 60 = 300 seconds.

Now if the video is sampled at 10 frames/second, the total number of frames is 300 x 10 = 3000. Suppose the colour depth of the video is 3 channels. So for this video the tensor has 4 dimensions, with shape (3000, 1080, 1920, 3).

So a single video clip is a 4-D tensor. Now if we want to store multiple videos, say 10 clips at 1080p resolution, we need a 5-D tensor. With the samples axis first, the shape of this 5-D tensor is (10, 3000, 1080, 1920, 3).

This is an enormous amount of video content. If we tried to use such huge data directly in deep learning, training would never end, so this kind of data needs size reduction and several preprocessing steps before being used as input to a neural network.
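
To make the idea concrete without allocating gigabytes of memory, here is a toy 5-D video batch with drastically reduced sizes (all numbers here are illustrative only):

# A toy 5-D tensor for a batch of videos: (clips, frames, height, width, channels)
import numpy as np
videos = np.zeros((10, 30, 108, 192, 3))   # far smaller than real 1080p clips
print(videos.ndim)    # 5
print(videos.shape)   # (10, 30, 108, 192, 3)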

Shape of tensors

The shape describes the length of each axis of a tensor: it is a tuple of integers, one per dimension.

A vector has only one axis/dimension so the shape is just a single element. The vector we used here as an example has 6 elements so its shape is (6,).

The matrix as 2-D tensor we discussed above has the shape (3,4). As it consists of 3 rows and 4 columns.

Likewise, in the case of a 3-D tensor, the shape tuple contains the length of all three axes. The example we took here has shape (3, 3, 4). See the image below to visualize it.

Shape of a 3-D tensor
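
You can confirm this with the shape attribute; a minimal sketch using a dummy array of the same size:

# Shape of a 3-D tensor: three matrices, each with 3 rows and 4 columns
import numpy as np
x = np.zeros((3, 3, 4))
print(x.shape)   # (3, 3, 4)
print(x.ndim)    # 3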

Similarly, the 4-D tensor we took as an example above has shape (3, 3, 7, 4), since it groups three separate cubes together. The image below shows this higher-dimensional figure to help you understand its dimensions and shape.

Shape of a 4-D tensor

Real life examples of tensors as data container

So, by now I think the basics of tensors are clear to you: you know how a tensor, as a data structure, stores data. We took small examples of commonly used data structures as tensors.

But in general, the tensors used for real-life problems are much more complex. Deep learning for image recognition often deals with thousands of images stored in a database. So which data structure should we use to handle such complex data? Here tensors come to the rescue.

The MNIST data set with handwritten digits

Let's take a real-life example of such an image database. We will use the same MNIST data set we used for handwritten digit recognition in an earlier blog post. It is an image database storing 60000 images of handwritten digits, and it is effectively stored in a 3-D tensor with shape (sample_size, height, width).

Let’s load the database. It is a default database in the Keras library.

#Loading the MNIST data set 
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The python codes we applied before in this article can be applied here too to check the number of axes of the training data set.

# Checking the axes of train_images
print(train_images.ndim)

And the above line of code will return the rank as 3. Again we will check the shape of the data structure storing the whole database.

# Checking the shape of the tensor
print(train_images.shape)

The shape it returns is (60000, 28, 28), which means the set has 60000 images, each of size 28×28 pixels.

Let's check an image from the data set. As I mentioned, the data set contains handwritten images of digits and is a classic example data set for feature recognition. Although it is essentially a solved data set, many deep learning enthusiasts still use it to test the efficiency of their new models.

So here is the code for printing the 10th digit from this data set. We will use the pyplot module from matplotlib library.

# Printing the 10th image from the MNIST data set
import matplotlib.pyplot as plt
sample_image = train_images[10]
plt.imshow(sample_image)
plt.show()

The output of the above code will be 10th image of a handwritten digit. See below if you recognize it 🙂

Sample from MNIST data set

Stock price data

In the Indian stock market the price of each stock changes every minute. A particular stock’s high, low and final stock price for each minute of a trading day is very important data for the traders.

See, for example, the candlestick chart of a particular stock on a particular trading day. The chart shows the stock price behaviour from 10:00 AM to 3:00 PM. That means a total of 5 hours of the trading day i.e. 5 x 60=300 minutes.

So the shape of this data structure is (300, 3), which makes it a 2-D tensor storing a stock's high, low and final price for every minute of a particular day.

Candlestick chart of stock prices

Now what if we want to store the stock's prices for a whole week? Then the shape becomes (trading_days, minutes, stock_price); if that week has 5 trading days, it is (5, 300, 3), which makes it a 3-D tensor.

Again, what if we want to store the prices of a number of stocks, say 10 different stocks, for that week? Another dimension gets added, and it becomes a 4-D tensor with shape (trading_days, minutes, stock_price, stocks_number), i.e. (5, 300, 3, 10).

Now think of mutual funds, which are collections of stocks. If we consider someone's portfolio of different mutual funds, then to store the high, low and final prices of all the stocks of that portfolio for a whole trading week we need a 5-D tensor, with shape (trading_days, minutes, stock_price, stocks_number, mutual_funds).
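
A small NumPy sketch of how these shapes stack up, using dummy zero-filled arrays with the sizes from the example above:

# Stacking stock-price data into higher-dimensional tensors (dummy data)
import numpy as np

one_day = np.zeros((300, 3))                # one stock, one day: 300 minutes x (high, low, final)
one_week = np.zeros((5, 300, 3))            # one stock over a 5-day trading week
many_stocks = np.zeros((5, 300, 3, 10))     # 10 stocks for that week
print(one_day.ndim, one_week.ndim, many_stocks.ndim)   # 2 3 4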

Conclusion

So, tensors and their different properties should now be clear to you. I know some of the terms are new and may appear a little confusing at first, so in the table below I have summarized them once again for a quick revision.

| Type of tensor | Uses | Rank/axes | Shape |
|---|---|---|---|
| 0-D tensor | Storing a single value | 0 | Single element |
| 1-D tensor | Storing vector data | 1 | Length of the array |
| 2-D tensor | Storing data in matrices | 2 | (rows, columns) / (samples, features) |
| 3-D tensor | Time series, a single image | 3 | (width, height, colour depth) for an image / (samples, time lags, features) for a time series |
| 4-D tensor | Storing images | 4 | (width, height, colour depth, no. of images) / (samples, channels, height, width) |
| 5-D tensor | Storing videos | 5 | (samples, frames, height, width, channels) |

Different types of tensors

Please also refer to the articles mentioned in the references for further reading; they are very informative and will help you brush up your knowledge.

Hope that you have found the article helpful. If you have any questions or doubts regarding the topic, please put it in comments below. I would like to answer them.

Follow this blog for forthcoming articles where I am going to discuss more advanced topics on tensors and deep learning in general.

Also, if you liked the post, subscribe to the blog so that you get notifications whenever new articles are added.

Reference

How to develop a deep learning model for handwritten digit recognition?

Developing deep learning model for handwritten digit recognition

This article describes how to develop a basic deep learning neural network model for handwritten digit recognition. I will take a very simple example to demonstrate the potential of a deep learning model. Through this example you will get an elementary idea about the following key points:

  • How deep learning performs the task of image recognition
  • The process of building a neural network
  • Basic concepts of neural network layers
  • Testing and validating its performance, and
  • Creating some diagnostics for its evaluation

Even if you have no prior exposure to programming or statistical/mathematical concepts, you will not face any problem understanding the article.

Deep learning models are most prominently used in feature recognition, especially image recognition, voice recognition and many other fields. In our daily lives we frequently see biometric identification of individuals, pattern recognition in smartphones, fingerprint scanners, voice-assisted search in digital devices, chatbots answering questions without any human intervention, weather prediction tools and so on.

So, it is a very pertinent question to ask “how can we develop a deep learning model which can perform a very simple image recognition task?”. So let’s explore it here.

Develop a deep learning model

All of the applications mentioned above make use of deep learning. A model learns from historic labelled data, uses it to learn the underlying pattern and then predicts results for new inputs as accurately as possible. So it is a learning process, and it is "deep" because of the layers in it. Here we will see how a deep learning model learns to recognize handwritten digits.

I will discuss each section along with its purpose, and I hope you will find it interesting to see how deep learning works. I chose this example because of its popularity; you will find many book chapters and blog posts on it too.

All the sources I referred to while writing the Python code for this handwritten digit recognition model are listed at the end of this article under "reference". You can refer to them to further enrich your knowledge; all of them are very good sources of information.

As you finish reading the article, you will gain basic knowledge of a neural network, how it works and its basic components. I will be using the popular Modified National Institute of Standards and Technology data set, MNIST for short.

The MNIST data

It is an image data set of handwritten digits from 0 to 9. The greyscale images have a resolution of 28×28 pixels, with 60000 images for training and 10000 for testing. It is a very common data set for testing machine learning models.

In this article we will build the model by writing Python code and, more importantly, discuss the particulars and components of a neural network model, so that in the process of building it you get a clear understanding of each step and develop confidence for further applications.

So let’s start coding:

Calling the basic libraries to develop the deep learning model

These are the very basic python libraries required for different tasks.

# Importing required basic libraries
from numpy import mean
from matplotlib import pyplot
from sklearn.model_selection import KFold
from numpy import std
from keras import models
from keras import layers

Loading the data

The MNIST data set comes bundled with the Keras library, so you only need to load it with the following code. Four NumPy arrays store the data; you can check the array types to confirm.

#Loading the data set (MNIST data is pre included in Keras)
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The following line of codes will produce a sample set of images from the MNIST data set.

# Example of first few images of MNIST data set
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# plot raw pixel data
	pyplot.imshow(train_images[i], cmap=pyplot.get_cmap('gray'))
# show the figure
pyplot.show()

See the grey-scale images below to get an idea of the handwritten digit images.

First few handwritten images from MNIST data set
First few handwritten images from MNIST data set

The training and testing data set

The following code describes the training data set. It consists of 60000 samples from the whole MNIST data set, each image being 28×28 pixels, along with the labels of the samples comprising the training set.

#The training data set
print("Training data shape: ",train_images.shape)
print("Size of training data:",len(train_labels))
print("Train labels:",train_labels)
Description of training data

In the same way, the following code presents the description of the testing data set, which has 10000 samples.

#The testing data set
print("Testing data shape: ",test_images.shape)
print("Size of test data:",len(test_labels))
print("Test labels:",test_labels)
Description of test data

Building the model

In this deep learning neural network model, notice the specification of the layers. Layers are the main components of any neural net model: the network performs a data distillation process through them, and they act as sieves to refine its output.

The first layer receives the raw inputs and passes them to the next layer for higher-order feature recognition. The next layer does the same with more refinement to identify more complex features. Unlike classical machine learning, deep learning works out on its own which features to extract at each step.

In this process the model output gradually matches the desired output more closely. The feature extraction and recognition process continues over several iterations until the model's performance is satisfactory.

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

Dense layers

The neural net here consists of dense layers, which means the layers are fully connected. There can be many layers depending on the complexity of the features to identify; in this case only two such layers are used.

The first layer has 512 units, and its input shape is 28*28, the pixel size of each flattened grey-scale image. The second layer, on the other hand, is a 10-way softmax layer, which means it returns an array of 10 probability scores summing to 1; each score is the probability that the handwritten image belongs to one of the 10 digits from 0 to 9.

Data preprocessing

A neural network expects data values within the interval [0, 1], so the data needs some preprocessing before the deep learning model can use it. This is done by dividing each value by the maximum value of the variable, and to do that we first convert the data type from unsigned integers to floats.

The transformation used here converts the data type from uint8 to float32, and in the process the value range changes from [0, 255] to [0, 1].

# Reshaping the image data to bring it in the interval [0,1]
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

Compilation of the model

The compilation of the model involves some critical terms like "loss", "metrics" and "optimizer", each with a special function. Here the loss function used is categorical crossentropy. It is a function that estimates the error the model produces and calculates the loss score. This function is well suited for multi-class classification, as in this case with 10 digit classes.

Depending on this loss score, the optimizer adjusts the weights of the layers. It is a kind of parameter adjustment of the model. This process goes on until the model achieves an acceptable level of estimation.

#Compilation step
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

The metric the model uses here is "accuracy", which suggests the goodness of fit of the model. Quite obviously, the higher the accuracy, the better the model.

# Encoding the labels into categories
from keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

Model fitting

Now it is time to fit the model with the training data. For fitting the model the number of epochs is 5 and the batch size is 128, which means the network iterates over the whole training set 5 times, in mini-batches of 128 samples.

# Fitting the model
network.fit(train_images, train_labels, epochs=5, batch_size=128)

Now in the output, we can see two metrics, loss and accuracy. These two values tell us how the model is performing on the training data. From the last epoch, we can see that the final accuracy and loss the model achieves are 0.9891 and 0.0363 respectively, which is quite impressive.

Model fitting

Testing the model

We achieved a training accuracy for the model as 0.9891. Let’s see how the model performs with the testing data.

# Testing the model with testing data
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

The test accuracy is 0.9783, which is quite good, although there is a drop from the training accuracy. When a model performs better on the training data than on the test data, it may be an indication of overfitting.

Accuracy of the test
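As an optional sanity check (a minimal sketch, assuming the model has been trained and the test images preprocessed as above), you can inspect the softmax output for a single test image; the 10 probability scores should sum to approximately 1.

# Illustrative check: softmax probabilities for one test image
import numpy as np
probs = network.predict(test_images[:1])      # shape (1, 10)
print(probs.round(3))                         # 10 probability scores
print("Predicted digit:", np.argmax(probs))
print("Sum of probabilities:", probs.sum())   # approximately 1.0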

Evaluating the model

Now we know how to develop a deep learning model but how can we evaluate it? This step is as important as building the model.

We are using the MNIST data here as a practice example for the application of the network model. So, although MNIST is an old and effectively solved data set, we would like to evaluate our model's performance with it. In order to do that we will apply the k-fold cross-validation process.

In this evaluation process, the data set is divided into k-groups. Then the model fit process is repeated k times for each of the groups. Thus the name k-fold cross-validation. Except for the data in the particular group, the rest of the data is training data and the group data is used to test the model.

To perform this evaluation task we will develop three separate functions: one to perform the k-fold cross-validation, one to make plots showing the learning pattern of all the validation steps and another to display the accuracy scores.

Finally all these three functions will be called in the evaluation function to run each of them. The neural network model developed above has been used in this evaluation process.

Writing the function for k-fold cross validation

This is the first function, which performs the validation process. We use 5-fold validation here: fewer folds would not be enough to validate the model, while too many folds would take a long time to complete, so it is a trade-off between the two.

The KFold class from the scikit-learn library will automatically shuffle the samples, split them into groups and provide the train and validation indices.

# k-fold cross-validation
from sklearn.model_selection import KFold

def evaluate_nn(X, Y, n_folds=5):
	scores, histories = list(), list()
	kfold = KFold(n_folds, shuffle=True, random_state=1)
	# data splitting
	for train_ix, test_ix in kfold.split(X):
		# train and test data for this fold
		train_images, train_labels, test_images, test_labels = X[train_ix], Y[train_ix], X[test_ix], Y[test_ix]
		# fit model
		history = network.fit(train_images, train_labels, epochs=5, batch_size=128, validation_data=(test_images, test_labels), verbose=0)
		# model evaluation
		_, acc = network.evaluate(test_images, test_labels, verbose=0)
		print('> %.3f' % (acc * 100.0))
		# store scores and training history
		scores.append(acc)
		histories.append(history)
	return scores, histories

Creating plots for learning behaviour

Here we create a function to plot the learning pattern through the evaluation steps. There will be two separate plots: one for the loss and the other for the accuracy.

def summary_result(histories):
  for i in range(len(histories)):
    # creating plot for cross-entropy loss
    pyplot.subplot(2, 2, 1)
    pyplot.title('categorical_crossentropy Loss')
    pyplot.plot(histories[i].history['loss'], color='red', label='train')
    pyplot.plot(histories[i].history['val_loss'], color='green', label='test')
    # creating plot for model accuracy
    pyplot.subplot(2, 2, 2)
    pyplot.title('Model accuracy')
    pyplot.plot(histories[i].history['accuracy'], color='red', label='train')
    pyplot.plot(histories[i].history['val_accuracy'], color='green', label='test')
  pyplot.show()

Printing accuracy scores

This function prints the accuracy scores. We also calculate summary statistics, the mean and standard deviation of the scores, and finally plot a box-and-whisker plot to show their distribution.

from numpy import mean, std

def acc_scores(scores):
	# printing accuracy and its summary statistics
	print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
	# Box-Whisker plot
	pyplot.boxplot(scores)
	pyplot.show()

Finally the evaluation function

This is the final evaluation module where we will use all the above functions to get the overall evaluation results.

def evaluation_result():
	# model evaluation
	scores, histories = evaluate_nn(train_images, train_labels)
	# train and test curves
	summary_result(histories)
	# model accuracy scores
	acc_scores(scores)

# run the function
evaluation_result()

Here 5-fold cross-validation has been used. As the whole data set contains 60,000 samples, each of these k groups will have 12,000 samples.

Evaluation results

See below the accuracy for all five folds of the evaluation. For the first fold the accuracy is 98%, and for the remaining four folds the accuracy scores are above 99%, which is quite satisfactory.

Accuracy score for k-fold cross validation

We have also produced some diagnostic curves to display the learning efficiencies in all five folds. The following plots show the learning behaviour by plotting the loss and accuracy of the training processes in each fold. The red line represents the loss/accuracy of the model with the training data set whereas the green line represents the test result.

You can see that except for one case the test and training lines have converged.

Plots to show learning behaviour through cross validation steps

The box-and-whisker plot below represents the distribution of the accuracy scores across all folds for ease of understanding.

Distribution of the accuracy score over all the evaluation steps

The mean and standard deviation of all the accuracy scores are as below:

Mean and standard deviation of accuracy

Conclusion

So, now you know how to develop a very basic deep learning model for image recognition. You have just started with deep learning models, and many things may be new to you, especially if you don't have any programming background. But don't worry; with more practice things will improve.

You can reproduce the result using the code from here. It is completely okay if you don't understand all the steps of the coding part. Take it as a fun exercise in Python programming as well as deep learning.

A very necessary step in learning neural networks, or in general any new thing, is to immerse yourself in that field and develop a deep interest in it. I hope this blog serves that purpose. The goal is not to overwhelm you with the complexity of the technique but rather to demonstrate its potential in the simplest and most practical way.

So, what are you waiting for? Start writing your first deep learning program taking the help of this article. Make small changes and see how the output changes.

Don't get frustrated if it gives an error (which is very common when you write your first programs). Try to explore the reason for the error. Take the help of StackOverflow to learn about the error; you will get an answer there for almost all of your queries. And this is the most effective way to learn.

Please, don’t forget to comment below if you find the article useful. It will help me to plan more such articles.

References

  • https://www.wikipedia.org/
  • Chollet, F., 2018. Deep Learning mit Python und Keras: Das Praxis-Handbuch vom Entwickler der Keras-Bibliothek. MITP-Verlags GmbH & Co. KG.
  • https://www.digitalocean.com/

Evolution of Deep Learning: a detailed discussion

Evolution of Deep learning

The evolution of deep learning has experienced many ups and downs over the last few decades. At one time it rose to the peak of popularity and expectations were high; then suddenly some setbacks in experimental trials created a loss of confidence and disappointment. This article covers the journey of the deep learning neural network from its inception to its recent overwhelming popularity.

Background of Machine learning

It all started with very basic concepts of probabilistic modelling, elementary statistical ideas from the school syllabus. This was the time even before the invention of the term machine learning, when all models and functions were crafted solely by the human mind.

Probabilistic models

These models are the first step towards the evolution of deep learning. They were developed with real-world problems in mind: variables having relationships between them, with combinations of dependent and independent variables used as inputs to the functions. These models are based on extensive mathematical theory and are more theoretical than empirical.

Some popular such probabilistic models are as below:

Naive-Bayes classification

It is basically Bayes' theorem with a naive assumption, hence the name. The concept was established long back, during the 18th century. The assumption here is that all the features in the input data are independent of each other.

For example, suppose a data set has the data of some persons with or without diabetes disease and their corresponding sex, height and age. Now the Naive Bayes will assume that there is no correlation between all the features between sex, height and age and they contribute independently towards the disease. This assumption is called class conditional independence.

So how does this assumption help us to calculate the probability? Suppose there is a hypothesis H which can be true or false, and this hypothesis is affected by an event e. We are interested in the probability of the hypothesis being true given that the event is observed. So, we need to calculate P(H|e).

According to Bayes' theorem:
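The original figure is not reproduced here; the standard form of Bayes' theorem (the "equation 1" referred to below) is:

P(H \mid e) = \frac{P(e \mid H)\, P(H)}{P(e)}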

Here, P(H|e) is called the posterior probability of the hypothesis given the information of event e, and it cannot be computed directly. So, we break it down as in equation 1. Now we can calculate each of the probabilities separately from a frequency table and then compute the posterior probability.

You can read the whole process of calculation here.

P(H) is the prior probability of the hypothesis before observing the event.

Logistic regression

This regression modelling technique is so basic and popular for classification problems that it can be considered the "Hello World" of machine learning. Yes, you read that right: it is a process for classification problems. Don't let the word regression in the name misguide you.

It is originally a regression process which becomes a classification process when a decision threshold is applied to the prediction. Deciding the threshold for the classification is a very important and tricky step.

We need to decide the decision threshold depending on the particular case in hand. There can be four types of outcome in a classification problem: "true positive", "true negative", "false positive" and "false negative" (read details about them here). We have to fix the rate of one type of outcome while reducing another, depending on its severity.

Example and basic concept 

For example, take the case of a severe crime where it has to be decided whether the person is guilty or not. It is a binary classification problem with two outputs, guilty or not guilty. Here a true positive is the person being found guilty when he has actually committed the crime. On the other hand, a false positive is the person being found guilty when he has not committed the crime.

So, no doubt the false positive here is the very serious kind of error and should be avoided at any cost. Hence, while fixing the decision threshold, you should try to reduce the probability of false positives while keeping the rate of true positives high.

Unlike linear regression, which predicts the response of a continuous variable, in logistic regression we predict the probability of the positive outcome of a binary response variable. And unlike linear regression, which follows a linear function, logistic regression follows a sigmoid function.

The equation for logistic regression:

Equation for logistic regression
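The original equation figure is not reproduced here; in its standard form, the logistic model for a single predictor x can be written as:

p = P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}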

Initial stages of evolution of Deep Learning

The theoretical model of deep learning came in 1943 from Walter Pitts, a logician, and Warren McCulloch, a neuroscientist. The model was called the McCulloch-Pitts neuron and is still regarded as a fundamental study in deep learning.

The first evidence of the use of neural networks was in some toys for children made during the 1950s. Around the same time, the legendary mathematician Alan Turing proposed the concept of machine learning and even hinted at the genetic algorithm in his famous 1950 paper "Computing Machinery and Intelligence".

Alan Turing (Image source: http://rutherfordjournal.org)

In 1952, Arthur Samuel coined the term machine learning for the first time; he is known as the father of machine learning. In association with IBM, he also developed one of the first machine learning programmes.

"The perceptron: a perceiving and recognizing automaton", a research paper published in 1957 by Frank Rosenblatt, set the foundation of the deep learning network.

In 1965 the mathematician Alexey Ivakhnenko and V.G. Lapa arguably developed the first working deep learning network. For this contribution Ivakhnenko is considered by many as the father of deep learning.

The first winter period

The period between 1974 and 1980 is considered the first winter period, a long, rough patch for AI research. A critical report on AI research submitted by Professor Sir James Lighthill, commissioned by the UK parliament, played a major role in initiating this period.

The report was very critical of AI research in the United Kingdom and held the opinion that nothing of substance had been done in the name of AI research. All expectations about AI and deep learning were dismissed as hype, and the creation of a robot was called nothing but a mirage. Such comments were very disappointing and resulted in the retraction of research funding for most AI research.

Invention of Backpropagation algorithm

Then, around 1980, the famous backpropagation algorithm with Stochastic Gradient Descent (SGD) was invented for training neural networks. This can be considered a path-breaking discovery as far as deep learning is concerned. These algorithms are still the most popular among deep learning enthusiasts, and backpropagation led to the first successful applications of neural networks.

LeNet

Come 1989, we got to see the first real-life application of a neural net. It was Yann LeCun who made this possible through his tireless effort at Bell Labs to combine the ideas of backpropagation and the convolutional neural network.

Yann LeCun

The network was named LeNet after LeCun. It found its first real-world use in the identification of handwritten codes. It was so efficient at this task that the United States Postal Service adopted the technology in 1990 for reading the digits of ZIP codes on mail envelopes.

Yet another winter period; however brief one

In spite of the success achieved by LeNet, in the same period the advent of the Support Vector Machine pushed the neural network almost to extinction. It gained popularity very quickly, mainly because of its easy interpretability and state-of-the-art performance.

It was also a technology that came out of the famous Bell Labs. Vladimir Vapnik and Corinna Cortes pioneered its invention. They had started working on it long back, in 1963, and it was their continuing effort that resulted in the revolutionary Support Vector Machine of the 1990s.

Support Vector Machine: a new player in the field

This modelling approach is mainly based on a kernel trick to calculate the decision boundary between two classes of observations. Except for a few cases, it is very difficult to discriminate between classes in the original two-dimensional space; it becomes far easier in a higher-dimensional space. A hyperplane in the higher-dimensional space becomes a hyperline, i.e. a straight line, in two-dimensional space. This process of transforming the mode of representation is known as the kernel trick.

Below is an example of what I mean to say by higher dimension representation for classification.

SVM: data representation in higher dimension

In figure A, two classes of observations, the red and the blue class, are classified using a hyperline. It is a straightforward case and the classification is easy. But consider figure B: here a straight line cannot classify the points.

As a new third axis is introduced in figure C, we can see that the classes can now be separated easily. Now how would it look if we convert the figure back to its two-dimensional version? See figure D.

So, a curved hyperline has now separated the classes very effectively. This is what a support vector machine does: it finds a hyperplane to classify the points, and then any new point gets its class depending on which side of the hyperplane it falls.

Kernel trick

The kernel trick is a technique that lets the SVM find the maximum-margin hyperplane without explicitly computing the coordinates of the points in the new representation space. It makes the process much cheaper by removing the need to calculate the new coordinates: the kernel function only calculates the distance between pairs of points.

This kernel function is not something the SVM learns from the data; it is crafted by hand. The kernel maps the distance between points in the original space to a distance in the new representation space, and then the separating hyperplane is learned from the data.
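As a minimal, hedged sketch of this idea (using scikit-learn with synthetic data chosen purely for illustration), a kernel SVM can be fitted like this:

# Illustrative sketch: a kernel SVM on synthetic two-class data
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# synthetic data that is not linearly separable in two dimensions
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=1)

# an RBF kernel lets the SVM find a curved decision boundary
model = SVC(kernel='rbf', C=1.0, gamma='scale')
model.fit(X, y)
print("Training accuracy:", model.score(X, y))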

Pros of SVM

  • The process is very accurate when the amount of data is limited or scarce
  • It has a strong mathematical base, and in-depth mathematical analysis is possible with SVM
  • Interpretation is relatively easy
  • The popularity of this method was instant and unprecedented

It also suffers from some weaknesses:

  • Scalability is an issue: when the data set is vast, SVM is not very suitable.
  • Modern-day databases contain huge numbers of images carrying enormous amounts of information, and such perceptual problems demand an efficient recognition process; SVM is not the preferred candidate here.
  • It is a shallow method, so it relies on manual feature engineering, which is not easy.

Decision tree

Around 2000 another classification technique rose to prominence and instantly became very popular, even surpassing the popularity of SVM, mainly because of its simplicity and ease of visualization and interpretation. It also uses an algorithm that consumes very limited resources, so a low-configuration computing system is not a constraint for applying a decision tree. Some of its other benefits are:

  • The decision tree has the great advantage of being capable of handling both numerical and categorical variables. Many other modelling techniques can handle only one kind of variable.
  • It requires little data preprocessing, which saves a lot of the user's time.
  • The assumptions are not too rigid, and the model can tolerate slight deviations from them.
  • Decision tree model validation uses statistical tests, so its reliability is easy to establish.
  • As it is a white-box model, the logic behind it is visible to us and we can easily interpret the result, unlike black-box models such as an artificial neural network.

But it does suffer from some limitations, such as overfitting, which means that the performance on training data is not reflected when an independent data set is used for prediction. It is quick to produce a result, but that result often lacks satisfactory accuracy.

However, it continued its golden run from around 2000 till 2010.

Random forest

This technique came to address the weaknesses of the decision tree. As the decision tree was already popular for its simplicity, random forest took no time to win the hearts of machine learning enthusiasts.

As it overcomes the limitations of the decision tree, it became the most practical and robust of the shallow ML algorithms. A random forest is actually an ensemble of decision trees, i.e. a collection of decision trees where each tree is trained on a different subset of the data. The more decision trees a random forest model includes, the more robust and accurate its result becomes, just as we consider a forest robust if it has many trees.

Random forest: ensemble of decision trees

The random forest makes its final prediction by combining the predictions obtained from each of the decision tree models, thereby overcoming the weakness of a single decision tree. In this sense, the random forest is a bagging type of ensemble technique.
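As a minimal, hedged sketch of such a bagging ensemble (scikit-learn, with synthetic data and parameter values chosen purely for illustration):

# Illustrative sketch: a random forest as an ensemble of decision trees
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)

# 100 trees, each trained on a bootstrap sample of the data (bagging)
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=1)
forest.fit(X, y)
print("Number of trees:", len(forest.estimators_))
print("Training accuracy:", forest.score(X, y))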

We can get an idea of random forest's popularity from the fact that by 2010 it had become the most popular machine learning algorithm on the famous data science competition website Kaggle.

Gradient boosting was then the only other approach that came up as a close competitor to random forest. This technique ensembles weak learners, mainly decision trees, and it was quick to outperform random forest.

On Kaggle, the gradient boosting ensemble approach soon overtook random forest, and it is still, along with deep learning, the most used machine learning method in almost all Kaggle competitions.

Dark Knight rises: The neural network era starts

The neural network was not consistent in showing its potential after 1980. But when its success was demonstrated by researchers, such as those at IBM, it surprised the whole world with intelligent machines like Deep Blue and Watson.

The dedicated deep learning scientists putting in the hard research work never had any doubt about its potential and what it is capable of doing. The only constraint was that, until then, the research work was very scattered.

A coordinated research effort was very much required to establish its potential beyond any doubt. The year 2010 marked the dawn of a new era, when for the first time such an effort was initiated by Yann LeCun of New York University, Yoshua Bengio of the University of Montreal, Geoffrey Hinton and his group at the University of Toronto, and IDSIA in Switzerland.

From this group of researchers, Dan Ciresan of IDSIA was the first to show the world some successful applications of modern deep learning, in 2011. Using his GPU-trained deep learning network, he won some prestigious academic image classification competitions.

The ImageNet

The ImageNet image classification competition started a significant chapter in the history of the deep learning neural net in the year 2012, when Geoffrey Hinton and his group from the University of Toronto took part in it.

Screenshot of ImageNet (http://www.image-net.org/)

In the same year, a team led by Alex Krizhevsky and guided by Geoffrey Hinton recorded an accuracy of 83.6% in this image classification challenge, which was quite a bit higher than the 74.3% achieved by classical computer vision approaches in 2011.

The ImageNet challenge was considered effectively solved when deep convolutional networks (convnets) pushed the image classification accuracy up to 96.4%. Since then, the deep convolutional neural net has dominated the machine learning domain.

The deep convolutional neural net got recognition from the whole world after this overwhelming success. Since then, at major computer vision conferences and programming meet-ups, almost all machine learning solutions have been based on deep convolutional neural nets.

In other fields too, like natural language processing and speech recognition, the deep neural network has become a dominant technology, replacing earlier tools like decision trees, SVMs and random forests.

A good example of a major player switching to deep convolutional neural nets from other technologies is the European Organization for Nuclear Research, CERN, the largest particle physics laboratory in the world. It has ultimately switched to deep convolutional neural nets to identify new particles generated by the Large Hadron Collider (LHC); earlier it used decision-tree-based machine learning methods for this task.

Conclusion

This article presents a detailed history of how deep learning has come a long way to reach today's popularity and its use in many fields across different scientific disciplines. It was a journey with many peaks and valleys, one that started decades ago.

Different empirical statistical methods and machine learning algorithms preceding deep learning made way for deep learning techniques, mainly because of deep learning's high accuracy with large amounts of data.

It registered many successes and then was suddenly lost in despair for not being able to meet high expectations. Yet it always had true potential, being more a practical technique than an empirical one.

Now the question is: what does the future of deep learning hold? What new surprises are in store? The answer is really tough. The history we discussed here is evidence that many of them are already here to revolutionize our lives.

So the next major breakthrough may be just around the corner, or it may still take years. But the field is always evolving and full of the promise of blending machines with true intelligence. After all, it learns from data, so it should not repeat the failures of its own history.

References

  • http://www.image-net.org/
  • Chollet, F., 2018. Deep Learning mit Python und Keras: Das Praxis-Handbuch vom Entwickler der Keras-Bibliothek. MITP-Verlags GmbH & Co. KG.
  • https://www.wikipedia.org/
  • Cortes, C. and Vapnik, V., 1995. Support-vector networks. Machine learning, 20(3), pp.273-297.
  • https://www.import.io
  • Schölkopf, B., Burges, C. and Vapnik, V., 1996, July. Incorporating invariances in support vector learning machines. In International Conference on Artificial Neural Networks (pp. 47-52). Springer, Berlin, Heidelberg.
  • Rosenblatt, F., 1957. The perceptron, a perceiving and recognizing automaton Project Para. Cornell Aeronautical Laboratory.

Deep learning training process: basic concept

Deep learning training

In this article, we will discuss how deep learning training is conducted for problems like speech recognition, image recognition etc. You will have a basic idea about the training algorithm and how it adjusts the weight to reduce the error. A brief discussion will follow on different components of the training process of a deep learning algorithm.

Deep learning is actually a very old concept in machine learning, but it took a long time to gain popularity. Around 2010 it came to prominence and found many uses, such as near human-level skill in image recognition and speech recognition, improved machine translation, and digital assistants like Apple's Siri, Google Now on Android and Amazon's Alexa.

If you have sufficient data to train the neural network and a high-capacity GPU, then deep learning can be a good choice because of its high accuracy. Higher GPU capacity enhances model performance.

For problems like speech recognition, the data volume is naturally smaller than for image recognition. In such smaller problems, data transfer between CPU and GPU plays a significant role in determining learning efficiency. But reducing data or parameters is not a solution for problems involving high-volume data like image recognition.

Example of Deep learning training

Consider the following two variables, X and Y. If we analyse them carefully, we can identify that they are related, and the relationship between them is Y = 2X − 1.

Human intelligence can identify it by some trial and error methods as long as it is simple like the present one. Deep learning also follows the same principle for identifying the relationship. This process is called learning.

It applies some random values called weights at first to get the output. Initially, the output is very different from the expected one. So, the learning process keeps adjusting the weights to minimize the difference between the estimated and expected output, until ultimately it provides an accurate result.

Now suppose we use these variable combinations in a deep learning network and predict the value of Y corresponding to X = 6. It will predict Y = 10.99 instead of 11. As this prediction is based only on this small sample, the algorithm is not 100% certain that the prediction is correct.
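Here is a minimal sketch of this toy example, assuming a single-unit Dense layer; the particular (X, Y) pairs (which follow Y = 2X − 1) and the epoch count are my own illustrative choices, not taken from the original.

# Toy example (illustrative): learning Y = 2X - 1 from a few sample points
import numpy as np
from keras import models, layers

xs = np.array([[-1.0], [0.0], [1.0], [2.0], [3.0], [4.0]])  # illustrative inputs
ys = 2 * xs - 1                                             # targets follow Y = 2X - 1

model = models.Sequential()
model.add(layers.Dense(1, input_shape=(1,)))                # one weight and one bias to learn
model.compile(optimizer='sgd', loss='mean_squared_error')
model.fit(xs, ys, epochs=500, verbose=0)

# the prediction is close to 11 (roughly 10.99) but not exactly 11
print(model.predict(np.array([[6.0]])))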

Role of layers in deep learning training

Deep learning networks can theoretically have from tens up to thousands of layers. When a network has only a few layers, say 2-3, the method is often referred to as shallow learning. Let's consider a deep learning network with n layers. If we use it for pattern recognition, the consecutive layers will try to identify specific features of the pattern.

The deeper the layer, the more advanced the features it identifies. In this fashion, after several rounds of weight optimization, the final layers recognize the actual pattern. See the schematic diagram below to understand the process.

Deep learning training for image recognition

Here we want the deep learning network to recognize the digit 7. As you can see, we have used n hidden layers for feature extraction so as to identify the character accurately. That is why this learning process is often referred to as a "multi-stage information distillation process".

Each hidden layer and input has some weights, which act as the parameters of the layers. After each iteration, we calculate the error. In the next iteration, the weights are adjusted again to improve the performance further by reducing the error.

This difference between the estimated and expected output is calculated through a loss function. The loss function in a way represents the goodness of fit of the model via the loss score and helps to optimize the weights. See the following diagram to understand the process.

Deep learning training cycle

Some key terms used here are:

Loss function

This is also known as the cost function or objective function and is used to calculate the deviation of the estimate from the true value. The probabilistic framework behind it is maximum likelihood. In classification problems the loss function is a cross-entropy, whereas in regression problems the Mean Squared Error (MSE) is generally used.

Weight adjustment

The training process starts with some random values of the weights. The error is calculated with these weights, and in the next cycles the weights are adjusted again to further reduce the error. This process continues until a model with satisfactory performance is achieved.

Batch size

If the data set is large, small batches of examples are generally used for training the model. Such small batches are very handy for an effective estimation of the error gradient. If the complete data set is not very large, the batch size can be the whole data set.

Learning rate

The rate at which the weights of the layers are adjusted is called the learning rate. It scales the derivative of the error used to update the weights.

Epochs

This term indicates the number of complete passes through the training data during training. We need to specify beforehand, while defining the training run, how many epochs the process should go through.
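To make these terms concrete, here is a minimal sketch reusing the MNIST-style network from earlier; the particular learning rate, batch size and epoch values are illustrative assumptions, and the learning_rate argument name assumes a reasonably recent Keras version.

# Illustrative only: where loss, optimizer, learning rate, batch size and epochs appear
from keras import models, layers, optimizers

model = models.Sequential()
model.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
model.add(layers.Dense(10, activation='softmax'))

model.compile(optimizer=optimizers.RMSprop(learning_rate=0.001),  # learning rate
              loss='categorical_crossentropy',                    # loss function
              metrics=['accuracy'])

# epochs = full passes over the data, batch_size = samples per weight update
# model.fit(train_images, train_labels, epochs=5, batch_size=128)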

A well-trained model has the ability to generalize, which means it performs equally well on an independent data set as it did during training.

So again I would like to mention that the basic idea of deep learning is very simple and rather empirical than theoretical. But this simple training process, when scaled sufficiently, can appear like magic.

Backpropagation with Stochastic Gradient Descent

The figure above represents one cycle of the training process through which the weights are adjusted for the next cycle. This particular algorithm for training the deep learning network is called backpropagation because it uses the feedback signal to adjust the weights.

Backpropagation performs weight optimization through an algorithm known as Stochastic Gradient Descent (SGD). This is the most common optimization algorithm found in almost all neural networks.
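In its simplest form, each SGD step nudges a weight w against the gradient of the loss L, scaled by the learning rate η:

w \leftarrow w - \eta \, \frac{\partial L}{\partial w}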

Nearly all deep learning is powered by one very important algorithm: Stochastic Gradient Descent (SGD)

Deep Learning, 2016

The iteration of the training process stops only when a good enough model is found, or when the model fails to improve or gets stuck somewhere. Such a training process is often very challenging, time-consuming and computationally complex.

The problem with non-convex optimization

Unlike other machine learning or regression modelling processes, deep learning training involves a non-convex optimization surface. In other modelling processes the error space is shaped like a bowl, with a unique solution.

But when the error space is non-convex, as in the case of neural networks, there is neither a unique solution nor any guarantee of global convergence. The error space here comprises many peaks and valleys, with many really good solutions but also some spurious estimates of the parameters.

Different steps of training and selecting the best model

A deep learning model's performance can be drastically improved if it is trained well. Scaling up the number of training examples and model parameters also plays an important role in improving the model fit. Now we will discuss the different steps of training a model.

Cleaning and filtering the data

This is a very important step before you jump to training your model. A properly cleaned and filtered data set is even more important than a fancier algorithm. A data set that is not cleaned properly can lead to misleading conclusions.

You must be aware of the phrase "garbage in, garbage out", popularly known as GIGO in the software field, which means that wrong or poor-quality input will result in faulty output. So, proper data processing is of utmost importance for a model to work effectively.

Data set splitting

Once a model is built, it needs to be tested with an independent data set which has not been used in training. If we don't have such an independent data set, we need to split the original data into two parts: a training data set (generally 70-80% of the total data) and the remaining part as test data.

Train and test data splitting
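A minimal sketch of such a split (scikit-learn, with placeholder data; the 80/20 ratio is just one common choice, not a rule from the original):

# Illustrative sketch: an 80/20 train-test split
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 10)             # placeholder features
y = np.random.randint(0, 2, size=1000)   # placeholder binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)       # (800, 10) (200, 10)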

Tuning the model

Tuning the model mainly comprises the estimation of two kinds of parameters of any model, viz. model parameters and hyperparameters.

Model parameters

Model parameters are those which define an individual model. These parameters are calculated from the training data itself. For example, regression coefficients are model parameters, calculated from the data set on which the model is trained.

Hyperparameter

These parameters relate to the higher-level structure of the algorithm and are decided before the training process starts. Examples of such parameters are the number of trees in a random forest, or the strength of the penalty in regularized regression.

Cross-validation score

This is a model performance metric which helps us tune the model by giving a reliable estimate of model performance using only the training data. The process is simple: we generally divide the data into 10 groups, use 9 of these groups to train the model and the remaining one to validate the result.

10-fold cross validation process

This process is repeated 10 times with different combinations of train and validation sets, which is why it is generally called 10-fold cross-validation. On completion of all 10 rounds, the performance of the model is determined by averaging the scores.
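As a hedged sketch (scikit-learn, with a placeholder model and made-up data; any estimator with a fit/score interface would do):

# Illustrative sketch: 10-fold cross-validation with scikit-learn
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

X = np.random.rand(200, 5)              # placeholder features
y = np.random.randint(0, 2, size=200)   # placeholder binary labels

scores = cross_val_score(LogisticRegression(), X, y, cv=10)  # 10 folds
print("Mean CV score:", scores.mean())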

Selecting the best model

To select the best-performing model, we take the help of a few model comparison metrics, like Mean Squared Error (MSE) and Mean Absolute Error (MAE) for a regression problem. The lower the values of MSE and MAE, the better the model.

Mean Absolute Error(MAE)

If y_i is the true value of the response variable and ŷ_i is its estimate, then the MAE over the n pairs of values is calculated with this equation:

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|

MAE is a scale-dependent metric, which means it has the same unit as the original variable. So, it is not a very reliable statistic for comparing models applied to different series with different units. It measures the mean of the absolute error between the true and estimated values of the same variable.

Mean Square Error (MSE)

This model comparison metric, as the name suggests, calculates the mean of the squares of the errors between the true and estimated values. So, the equation is as below:

MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2

In the case of classification problems, a common metric is the Receiver Operating Characteristic (ROC) curve. It is a very important tool for diagnosing performance, plotting the true positive rate against the false positive rate at different threshold levels. The area under the ROC curve, often called AUROC (or AUC), is also a good measure of the predictive power of a machine learning algorithm: a higher AUC indicates more accurate prediction.
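A minimal sketch of computing the AUC (scikit-learn, with made-up true labels and predicted scores purely for illustration):

# Illustrative sketch: area under the ROC curve
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                    # placeholder true labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.3]  # placeholder predicted scores

print("AUC:", roc_auc_score(y_true, y_score))        # closer to 1.0 means better ranking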

Conclusion

Finally, a word of caution: use deep learning wisely for your problem, as it is not suitable for many real-world problems, especially when the available data is not big enough. In fact, deep learning is not the most preferred machine learning method used in industry.

If you are new to the field of machine learning, it is very enticing to apply deep learning blindly to any problem. But if a different, more suitable machine learning method is available, it is not a wise decision to go for a computation-intensive method like deep learning.

So it is the researcher's call to judge the requirements and the available resources and choose the appropriate modelling method. The particular problem, its generic nature and experience in the field play a pivotal role in using the power of a deep learning neural network efficiently.

Training a deep learning model well is very important for getting an accurate result. This article has discussed every aspect of this training process in detail: the theoretical background, the algorithms used for training and the different steps involved. I hope you will find answers here to your questions related to deep learning training. Please feel free to comment below about the article and any other questions you want to ask.


Artificial intelligence basics and background

Artificial Intelligence basics

Artificial Intelligence (AI) is a buzzword in almost every walk of our life, with meteoric growth in recent years. For the last few years it has come up as a superpower shaping the future of every scientific endeavour. Be it AI-powered self-driving cars, disease detection in medical research, image/speech recognition or big data, these are just the tip of the iceberg with respect to the enormous possibilities Artificial Intelligence is capable of. This article covers the basics of Artificial Intelligence along with its genesis and modern history.

Artificial intelligence is a broad term encompassing both Machine Learning and Deep Learning, with Machine Learning being the larger domain and Deep Learning a subdomain of it. These three domains of advanced computing can be represented by the following diagram.

Artificial Intelligence basics: Machine Learning and Deep Learning as sub domains

Background of Artificial Intelligence, its genesis

Before we start with the basics of Artificial Intelligence, we should know its background. The first instance of a machine having some intelligence akin to a human was developed by Charles Babbage and the English mathematician Lady Ada Lovelace in Victorian England during 1830-40.

It was called a mechanical computer and had the capacity to perform different mathematical computations. The machine algorithm she developed led to the creation of an early computer which until then had existed only on paper. So Ada Lovelace, the daughter of the famous poet Lord Byron, is named the world's first computer programmer.

Turing machine: one step towards modern computer

Another similar example is the Turing Machine, a theoretical model of computation proposed by Alan Turing in 1936. It can be designated as an early blueprint of a machine with something like Artificial Intelligence. Turing later wrote the famous 1950 article "Computing Machinery and Intelligence".

The Turing machine was the first fully described abstract model of a computer and was theoretically similar to modern-day electronic computers. During the Second World War, Turing worked at the Government Code and Cypher School at Bletchley Park on the mission to break the German Enigma code. In 1951, the US got its first commercially available electronic stored-program computer, named UNIVAC.

The modern history of Artificial Intelligence

After that, many years passed with lots of trial and error, research and development, without any significant advancement in the field. The main limitation was the lack of training data, as images were not abundant at that time, and the computing power was also insufficient to analyse voluminous data.

However, the scenario took a sharp turn with the advent of computers with higher computational power. The term Artificial Intelligence was first coined at a conference at Dartmouth College, Hanover, New Hampshire in 1956, and once again a group of researchers threw themselves into unveiling the superpower of AI.

Setbacks

Critics, however, are always there, and their arguments against AI became more prominent due to the lack of practical evidence. Governments also appeared to be convinced by these arguments because none of the AI projects had delivered notable success. As a result, funding for AI research projects was stopped. It was a big blow, and eventually a winter period started in AI research in 1974 that lasted till 1980.

In 1980 AI research came back into the headlines for a brief period when the British government showed some interest, intending to compete with Japanese advances in AI research. But that did not last long; the failure of some early-stage machines soon pushed the field into another prolonged winter period, which lasted for seven long years (1987 to 1993).

Breakthrough

But the winning spree of AI was just a matter of time and was inevitable. As industry leaders like IBM set foot in the AI industry and took up the challenge of showing the world what AI is capable of, things started to change. A team of highly qualified scientists and computer programmers threw themselves into this mission, and the result was path-breaking.

Deep Blue: the chess champion supercomputer

The first big success of an AI project was the creation of the supercomputer Deep Blue by IBM. The computer created history when it defeated the then world chess champion, Garry Kasparov, on May 3rd, 1997.

Deep Blue Vs Garry Kasparov (Image source: CBS news, Sunday Morning)

Back then it was so surprising that the reigning champion was not ready to admit that he had lost to a computer with Artificial Intelligence. He cried foul play, suspecting that some grandmaster was actually playing for the computer.

The computer was extremely accurate in making its moves, and it did so without any human emotion, which is where Garry, being human, lagged behind. This is where a computer always steps ahead of a human being, applying only hard logic based on the vast amount of information fed to it. This victory of Deep Blue over human intelligence ushered in a new age of Artificial Intelligence.

Watson: the question-answering AI-based computer

Another historic feat of establishing supremacy over human intelligence was achieved by AI in 2011, when a supercomputer named Watson won the famous quiz show Jeopardy!. In this competition, Watson defeated the defending champions Ken Jennings and Brad Rutter.

Watson and the Jeopardy challenge (Image Source: IBM Research)

Watson is a question-answering computer created by IBM's DeepQA project in 2010, based on Natural Language Processing. David Ferrucci of IBM was the key brain behind the idea of Watson, and it was named after IBM's first CEO, Thomas J. Watson.

Artificial Intelligence basics

The concept of Artificial Intelligence reversed the traditional idea of finding a solution to any data-oriented problem. The classical programming or statistical modelling approach usually sets the rules first and then applies them to the input data to produce results. Artificial Intelligence, instead, uses example answer data along with the input data to learn the rules. See the schematic diagram below to understand this:

Artificial Intelligence basics: Difference between classical programming and AI

This concept of Artificial Intelligence puts more emphasis on the hands-on training part: learning from the data. Indeed, this process needs a large amount of data so that the algorithm can be certain about the actual relation between the variables. The idea is thus to establish the rules empirically rather than theoretically.

The concept of Artificial Intelligence is not a new one though. It first came into existence long back, around 1950. During its inception, besides the concepts of Deep Learning and Machine Learning, it also involved hard-coded programming rules. For example, playing a chess game back then involved a lot of rules programmed into the computer. Such Artificial Intelligence came to be known as Symbolic AI.

During the 1980s the concept of Expert Systems got the limelight across industries. An expert system on any topic provides an interactive information delivery system: a machine plays the expert role and, based on the user's input, provides suitable information. In the process of developing such expert systems, Symbolic AI transformed into Machine Learning.

Components of Artificial Intelligence

This has three main components as shown in the above figure:

Input data:

This is very obvious and also common to traditional programming and statistical modelling. We need to feed in the input data in order to arrive at an estimate. The sample data in our hands, either labelled or unlabelled, plays this role of input data.

Labelled data:

This is the unique part in the case of Artificial Intelligence. We need to provide some example answer data to train the programme: the larger the example answer data, the more accurate the training. This example data set is the labelled data, as both the features and the label are present. We expect the algorithm to learn from these examples and identify the relationship between them.

Error optimization:

This is the third important component, which calibrates the algorithm by identifying how close the estimate is to the actual value. There are several metrics which provide a good measure of how well the model is performing.

Algorithm to represent the input data

In a nutshell, this is the main essence of Artificial Intelligence. All machine learning and deep learning algorithms try to find an effective way to represent the input data. This representation is of utmost importance, as it is the key to successful prediction.

For example, when the problem in hand is to identify an image with a colour composition of red, green and blue, a very effective way to represent the image can be to count the number of pixels of each colour. Similarly, in speech recognition, if the algorithm can represent the language and voice modulation effectively, the recognition accuracy gets much higher.

An example of data representation

Here is an example of this representation problem with an easy graphical classification task. I read this example in the book "Deep Learning with Python" by Keras creator and Google AI researcher Francois Chollet. It is a great book to start your journey with Artificial Intelligence.

Separating the different colour dots using data transformation

See in the figure above the scattered points in two colour groups, red and blue. The problem is to find some rule to classify these two groups. A good solution to this representation problem is to create new coordinates, as in the figure below. After the change of coordinates, the different coloured dots can be classified with a simple rule: the dots are blue when x > 0 and red when x < 0.

Data representation changing coordinates
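Here is a small sketch of such a hand-crafted change of coordinates; the points, the 45-degree rotation and the rule are my own illustrative choices, not Chollet's exact example.

# Illustrative sketch: re-representing points so a simple rule separates the classes
import numpy as np

points = np.array([[1.0, 1.0], [2.0, 2.5], [-1.0, -1.0], [-2.0, -1.5]])  # made-up data
theta = np.deg2rad(45)                         # rotate the axes by 45 degrees
rotation = np.array([[np.cos(theta), np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])
new_points = points @ rotation.T               # coordinates in the new system

# in the new coordinates the class depends only on the sign of x
labels = np.where(new_points[:, 0] > 0, 'blue', 'red')
print(labels)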

AI algorithms: not creative but effective 

These types of transformations are handled automatically by Artificial Intelligence algorithms. Like this coordinate change, other transformations such as linear transformations and nonlinear operations are frequently used, and the algorithm chooses among them from a predefined space of possibilities called the hypothesis space. In this sense Artificial Intelligence algorithms are not very creative; all they do is select functions from this space of possibilities.

Although the algorithm is not creative, it often does the job. The algorithm takes the input data, applies a suitable transformation from the hypothesis space, and, guided by the feedback signal obtained from comparing the output with the expected output, attempts to represent the input data usefully.

The following diagram represents the flow of information process for ease of understanding.

Artificial Intelligence basics: Schematic of AI algorithm functioning

Final words

So, in the simplest terms, Artificial Intelligence is all about learning through trials and examples. You provide lots and lots of example answers and the algorithm goes on perfecting itself. Unlike other prediction algorithms, which reach a plateau after a certain number of trials, AI algorithms keep improving with more data.

A good practical example of such a learning process is Google's Quick, Draw!. It is an AI-driven drawing game hosted by Google. As claimed by Google, it is built on the world's largest doodling data set, and you can also add your own drawing samples to it.

A screenshot of Google's Quick Draw

It is experimental research on the use of AI. You will be surprised to see how effortless and quick the drawing it offers is: you can draw a picture in less than 20 seconds! The reason behind its high accuracy in pattern recognition is, again, as I mentioned, a huge database of example answers. Almost 15 million people have contributed more than 50 million drawings to the database.

It is not only about drawing; it is a collection of several other experiments with music, video, natural language processing and more, with open-access code. You can try the code, as it is open-sourced, and also add your own AI applications.

Expectations from AI should be rational and for Long term

One problem with Artificial Intelligence has been that its possibilities were always hyped out of proportion. The goals and expectations were set for too short a term. The obvious result was disappointment and loss of interest. Such disappointment resulted in the two winter periods in AI research that I have mentioned before.

Such winter periods slow down the development process for years together and are not at all good for the researchers and scientists putting tremendous effort into AI research. They become the victims of the irrational hype created by the press, the media and some over-enthusiasts.

When the dreams get shattered, all research projects experience a crunch in funding. Scientists who may be on the verge of some significant result get stuck in their research just because of insufficient funds. This is heartbreaking and may deprive a scientist of his life-long research achievements.

Many of the expectations from AI technology during 1960-70 are still far-reaching possibilities even in 2020. Similarly, the hype with AI in recent years may be an exaggeration too and may lead to another winter period.

Conclusion

So, we need to be very cautious in setting realistic expectations of AI. Instead of setting short-term goals, we should look towards long-term, broad objectives and give researchers sufficient time to proceed with their research and development activities.

There is no denying that AI is going to be our everyday best friend. It is going to make our lives much, much easier in the coming days. The day is not very far when we will take the help of AI in every problem we face: it will give us suggestions when we feel sick, help to educate our kids, take us to our destinations and help us understand foreign languages, and in doing so AI will take the whole of humanity to a new level of evolution.

This is not an unrealistic expectation, and the day will eventually come. We just need to be patient and have faith in the highly talented AI scientists working hard to make this dream a real one.


What is deep learning? an overview

Deep learning basics

Deep learning is an artificial intelligence approach with an immense capability to find hidden patterns within the huge amounts of data generated in this era of data explosion. It is an advanced learning system which mimics the working principle of the human brain. Such vast unstructured data is impossible for a human being to analyse and draw conclusions from, so this learning procedure has proved very helpful in making use of big data.

According to Andrew Ng, the founder of deeplearning.ai and the popular Coursera deep learning specialization:

Deep learning is a superpower. With it you can make your computer see, synthesize novel art, translate languages, render a medical diagnosis, or build pieces of a car that can drive itself. If that is not a superpower, I don’t know what is.

Andrew NG

In this respect, machine learning is a much bigger domain and deep learning can be considered a subdomain of it. Deep learning relies on deep neural networks and can also work in an unsupervised manner. The network is popularly known as an Artificial Neural Network, as it mimics our brain's vast network of neurons.

A schematic diagram of a neural network with two hidden layers

Difference between machine learning and deep learning

The main difference between the two approaches lies in how features are extracted from images. Feature extraction is a basic component of both.

Feature extraction

In machine learning this step has to be performed manually, and the resulting features are then fed into the model. In deep learning, feature extraction happens automatically inside the network, which learns to match those features to the object of interest. This is why deep learning is called an "end-to-end" learning process (see the sketch below).
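To make the contrast concrete, here is a minimal Python sketch, assuming scikit-learn and TensorFlow/Keras are installed and that X_train and y_train are hypothetical placeholder arrays of small grayscale images and their labels (names chosen only for illustration):

# Hypothetical illustration only: a classical model fed hand-crafted features
# versus a deep model that learns features on its own (end-to-end).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from tensorflow import keras

def extract_features(images):
    # Manual feature engineering: mean brightness and a crude edge measure
    mean_intensity = images.mean(axis=(1, 2))
    edge_strength = np.abs(np.diff(images, axis=2)).mean(axis=(1, 2))
    return np.stack([mean_intensity, edge_strength], axis=1)

# Classical machine learning: features first, then the model
rf = RandomForestClassifier()
# rf.fit(extract_features(X_train), y_train)   # X_train, y_train are placeholders

# Deep learning: raw pixels go straight in, features are learned internally
cnn = keras.Sequential([
    keras.layers.Input(shape=(64, 64, 1)),
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
# cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# cnn.fit(X_train[..., None], y_train, epochs=5)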

Resource intensive

Another major difference between the two is data-processing capability. Deep learning can make good use of a huge amount of labelled data, provided you have a sufficiently powerful Graphics Processing Unit (GPU). Classical machine learning, on the other hand, offers modelling techniques that can give a good estimate even with a smaller amount of labelled data.
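Since GPU availability largely decides whether deep learning is practical, here is a small sketch, assuming TensorFlow 2.x is installed, that checks whether a GPU is visible to the framework:

import tensorflow as tf

# List the GPUs TensorFlow can see; an empty list means training runs on the CPU
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    print(f"{len(gpus)} GPU(s) detected:", gpus)
else:
    print("No GPU detected; training will fall back to the CPU.")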

Scaling with the data

Deep learning has a big advantage over classical machine learning: its performance scales with the data. The more images we use to train a deep learning algorithm, the more accurate its results become.

This is not the case with classical machine learning algorithms, which reach a plateau after a certain level of performance and do not improve with further training data. See the image below for the difference.

Deep learning improves as the data size increases

So, which approach should you use? It depends on your situation: the type of problem you want to solve, the GPU capacity available to you and, most importantly, how much labelled data you have to train the algorithm.

Deep learning is more accurate than classical machine learning but also more complex. Unless you have thousands of images to train on and a high-performance GPU to process such a large amount of data, you are usually better off with a classical machine learning algorithm or a combination of them.

Deep learning: the working principle

Deep learning mainly relies on Artificial Neural Networks (ANNs) to unravel the wealth of information hidden in big data. So it is worth understanding how this process actually works.

If you have some exposure to traditional modelling, you will know that conventional regression models suffer from various limitations and have to satisfy several assumptions.

In most cases they do not capture the nonlinear nature of real-world data very well, mainly because traditional regression does not attempt to learn from the data. This learning process is what makes the real difference between the two approaches.

The term "deep" in deep learning refers to the mechanism of processing information through several layers. Deep learning, or deep structured learning, is based on what is called representation learning: finding representations of hidden features or patterns in raw, unstructured data.
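As a rough illustration of layered representation learning, here is a minimal Keras sketch; the layer sizes and the 784-feature input are arbitrary choices made for this example, not something prescribed by the article:

from tensorflow import keras

# Each Dense layer transforms the output of the previous one into a new
# representation; stacking several of them is what makes the network "deep".
model = keras.Sequential([
    keras.layers.Input(shape=(784,)),              # raw, flattened input
    keras.layers.Dense(128, activation="relu"),    # first learned representation
    keras.layers.Dense(64, activation="relu"),     # higher-level representation
    keras.layers.Dense(10, activation="softmax"),  # final prediction
])
model.summary()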

Accuracy of the two approaches

The accuracy deep learning attains in its estimates is remarkable. The applications discussed later in this article need to be very precise to satisfy end users' expectations, and such accuracy can only be provided by deep learning. To achieve it, the model trains itself on labelled data, continuously improving its predictions by minimizing error.

So, the amount of labelled data is an important factor determining how good the learning process becomes. For example, to build a reliable self-driving car we need to train the algorithm with a huge amount of labelled data: images and videos of roads, traffic, pedestrians and busy streets at different times.
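Continuing the sketch above, training simply means minimizing a loss measured against the labels; x_train and y_train below are hypothetical placeholders for your labelled data:

# Compile with a loss function (the "error" to be minimized) and train on labels
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=10, validation_split=0.2)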

Deep learning is undoubtedly a very computation-intensive process. Processing such a huge number of images and videos and then using them to train the algorithm may take days or even weeks.

This is one of the key reasons why, although the concept of deep learning dates back to the 1980s, it was not widely used until recently: researchers back then did not have systems with high computing capacity. Today, supercomputers, systems with high-performance GPUs and advanced cloud computing facilities make it possible to process such enormous volumes of data within hours or even less.

ANN: mimicking the human brain

An Artificial Neural Network, or ANN, mimics the working principle of our brain, as the name suggests. In the brain, neurons are the working units. The network formed by innumerable neurons acts in layers, carrying sensations from different body parts to the designated part of the brain. As a result, we can feel a touch, smell a fragrance, taste food or hear music.

The learning process

Human beings basically learn from past experience. From childhood, a person gathers experience of everything in their surroundings and learns from it. For example, how are we able to identify a dog or a cat? Because we have seen many of these animals and learned the differences in their appearance. For us, the chance of making a mistake in identifying them is now almost nil.

Deep learning for feature extraction

This is the very nature of human learning. The first time a baby sees a dog, the parents tell the baby it is a dog. Gradually, through this process, the baby learns that animals with this appearance are called dogs. Deep learning follows exactly this human learning process, and the results become more and more accurate as learning continues.

An ANN likewise consists of neurons, with the nodes connected to form a web-like structure. There can be multiple layers of such neurons (commonly two or three hidden layers, though in theory any number). These layers pass information from one to the next and finally produce the result.

Each layer of neurons acts as the input to the layer immediately after it. The term "deep" simply refers to the number of layers in the ANN. The most common and frequently used deep learning network is the Convolutional Neural Network (ConvNet/CNN).

Convolutional Neural Network (CNN)

CNN is the most popular form of deep learning for images. It eliminates the need for manual feature extraction from an image in order to recognize it. A CNN is trained on thousands of images and uses many hidden layers to extract features and match them with the object of interest.

The hidden layers are arranged so that features of increasing complexity are recognized: the first hidden layer may pick up only the edges in an image, whereas the last layers recognize the more complex shapes of the object we want to identify.
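A small Keras CNN sketch along these lines is shown below; the filter counts, kernel sizes and input shape are illustrative assumptions rather than a prescribed architecture:

from tensorflow import keras

# Early convolutional layers tend to respond to low-level features such as
# edges; deeper layers combine them into progressively more complex shapes.
cnn = keras.Sequential([
    keras.layers.Input(shape=(128, 128, 3)),
    keras.layers.Conv2D(32, 3, activation="relu"),   # edges and simple textures
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, 3, activation="relu"),   # mid-level patterns
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(128, 3, activation="relu"),  # higher-level shapes
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(10, activation="softmax"),    # classification head
])
cnn.summary()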

Applications of deep learning

In today's digital era, data is collected and generated at every moment of our presence in the digital world, whether on social networking sites, online shopping, online movies or online study and research. We provide a lot of data as input to get the desired output from the internet.

This data is enormous in size and largely unstructured, but it carries a lot of information which, if analyzed properly, can help governments take better policy decisions or businesses frame effective plans.

For the last few years, this learning process has been the key concept behind some revolutionary ideas and applications, such as the following:

Image colourization

Recently, some old black-and-white movies have been re-released in colour. If you watch them, you may be surprised by the precision and accuracy of the colourization. Artificial intelligence has made it possible to complete this task within a few hours, a job that previously required skilled human labour.

The famous movie "Pather Panchali" by the Oscar-winning director Shri Satyajit Ray was shot in black and white. Recently, Mr Ankit Bera, Assistant Research Professor of AI at the University of Maryland in the U.S., carried out an experiment to colourize the movie, and the result is really impressive (read the full story here). Below is a side-by-side comparison of two still frames from the movie.

A screenshot from the report

Self-driving car

In a self-driving car, deep learning enables the vehicle to recognize a stop sign, distinguish a pedestrian from a lamppost and judge the situation on a busy road, thus reducing the chance of an accident.

An autonomous car can take human-like decisions based on the probable situations a driver may face. The technology is still in the testing phase, and it keeps improving as it is trained on more data from real-life traffic conditions.

Facial recognition

Facial recognition is now everywhere, be it biometric attendance systems, Aadhaar-enabled transactions or your mobile's face-lock feature. The recognition system is smart enough to identify you even after you have shaved off your moustache or changed your hairstyle.

Natural language processing and virtual assistants

In natural language processing and speech recognition, deep learning plays a crucial role in understanding the commands a person gives to a smartphone or any smart device. You may also have tried Google's speech-to-text tool to save yourself a lot of typing; that voice recognition is also a gift of deep learning.

Online service providers have launched several virtual assistants based largely on this concept. You must have heard of Apple's Siri, Microsoft's Cortana and Amazon's Alexa; all are very popular virtual assistants that make our daily lives a lot easier.

Language translations

Deep learning has played as big a role in language translation as it has in natural language processing. This has benefited travellers, business people and many others who visit a lot of places and communicate with people speaking foreign languages.

Chatbots

You may have noticed that these days, whenever you contact a company's customer support to get questions answered about a product, the first basic question-and-answer exchange is usually automated and answered intelligently without any human intervention.

In medical research

Deep learning plays a pivotal role in medical research nowadays, for example in identifying affected cells in cancer research. A dedicated team of researchers at UCLA has built an advanced microscope that uses deep learning to pinpoint cancerous cells.

Many industries, including pharmaceuticals, automobiles, agriculture, board-game makers and medical image analysis, are actively conducting deep learning research in their R&D departments.

Conclusion

I hope this article has given you a basic idea of what deep learning is and how it works. Although the idea of deep learning was conceptualized back in 1986, it did not take off at the time because of limited resources and took more than a decade to come into action. Today we have sophisticated computing devices and no dearth of data; in fact, there are oceans of data covering every aspect of our daily lives.

Every moment of our lives and every activity happening in the world is being stored in one format of data or another, mainly as images, videos and audio. This data is so huge that conventional data-analysis processes cannot handle it, and human capacity alone would take decades to analyze it.

This is where deep learning comes in, with its fascinating power of data analysis, chiefly pattern recognition. Deep learning is especially accurate when the data consists of a large number of audio, video or image files, so it is the best fit for this situation. This is also how it gets the name "deep": many features of the images are extracted at the different layers of the network, so "deep" refers to the depth of the layers.

Deep learning is a vast topic, and a single article cannot cover all of its aspects. Only its basic features are discussed here, but this article should be enough to get you started with deep learning.

Follow this blog for more interesting articles on deep learning and its applications. If you have a particular topic in mind, let me know in the comments below, and do share your opinion on this article and how it can be improved.
