How to set up your deep learning workstation: the most comprehensive guide

Set up deep learning workstation

This article is a detailed, step-by-step guide to setting up a deep learning workstation with Ubuntu 20.04. It documents the process I followed on my own computer. I have repeated this process a number of times, and every time I thought I should have written it down. Proper documentation makes the next setup quick and error-free.

I have noted the most common mistakes and errors in the process and how to avoid or troubleshoot them. Bookmark this page so that you can refer to it quickly whenever you get stuck on a step.

I have done this complete setup a few times on both my old and new laptops, which have completely different configurations, so I hope the problems I faced are the most common ones. It took me considerable time to fix all those issues, mainly by visiting discussion groups such as StackOverflow, the Ubuntu discussion forum, and many other threads and blogs.

I have compiled the fixes in one place here, so that you can complete the whole installation by referring to this post alone instead of visiting multiple sites. That should save you a lot of valuable time.

Prerequisites to set up deep learning workstation

I assume that you already have Ubuntu on your computer. If not, please install the latest version of Ubuntu. It is one of the most popular open-source Linux distributions and is available as a free download here. Although it is possible to run Keras deep learning models on Windows, it is not recommended.

Why should you use Ubuntu for deep learning? Refer to this article

Another prerequisite for running deep learning models is a good-quality GPU. I advise you to have an NVIDIA GPU in your computer for satisfactory performance. It is strongly recommended rather than strictly required, because running sequence processing with recurrent neural networks and image processing with convolutional networks on a CPU is a difficult proposition.

Such models may take hours to give results on a CPU, whereas a modern NVIDIA GPU can complete the same models in merely 5-10 minutes. If you do not want to invest in a GPU, an alternative is to rent cloud computing by the hour.

However, in the long run such a service may cost you more than upgrading your local system. So my suggestion is this: if you are serious about deep learning and expect even moderate use, go for a good workstation setup.

The main steps to set up a deep learning workstation

I now assume that you have completed all the prerequisites for your deep learning experiments. The setup is a somewhat time-consuming process, and you will need a stable internet connection to download various files. Depending on the internet speed, the complete process may take 2-3 hours (with a 1 Gbps connection it took me 2 hours). The main steps to set up a deep learning workstation are as follows:

  • Updating the Linux system packages
  • Installing the Python pip command, the basic command used to install the other components
  • Installing the Basic Linear Algebra Subprograms (BLAS) library required for mathematical operations
  • Installing HDF5 to store hierarchical data
  • Installing Graphviz to visualize Keras models
  • Installing CUDA and cuDNN for NVIDIA GPU support
  • Installing TensorFlow as the backend of Keras
  • Installing Keras
  • Installing Theano (optional)

So, we will now proceed with the step-by-step installation process.

Updating the Linux system packages

The following commands upgrade the Linux system packages. Type them in the Ubuntu terminal, which you can open with the keyboard shortcut “Ctrl+Alt+T”.

$ sudo apt-get update
$ sudo apt-get --assume-yes upgrade

Installing the Python-pip command

The pip command installs and manages Python packages. Whichever packages we install next, we will use pip for them. It is a replacement for the earlier easy_install command. Run the following command to install python-pip.

$ sudo apt-get install python-pip python-dev

This should install pip on your computer, but sometimes there are exceptions, as happened to me. My Ubuntu terminal reported “Unable to locate package python-pip”.

This was a big problem, as I had no clue why it was happening. On my old computer I had used the command a number of times without any issue. After scouring the internet for several hours I found the solution: it has to do with the Python version installed on your computer.

If you face the same problem (most likely on a new computer), first check the Python version with this command.

$ ls /bin/python*

If the listing shows Python 2 (for example python2.7), you need the Python 2 flavour of pip. If it shows only a higher version such as python3.8, which is the case on a fresh Ubuntu 20.04 install, the python-pip package is not available and pip has to be installed through the python3-pip package. So the command becomes:

$ sudo apt-get install python3-pip
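
The same check can also be made from Python once any interpreter is available. The sketch below is only an illustration: the helper names installed_pythons and pip_package_hint are made up for this post, and it assumes the interpreters, if present, are on your PATH.

```python
import shutil


def installed_pythons(names=("python", "python2", "python3")):
    """Map each interpreter name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def pip_package_hint(found):
    """Suggest the apt package that provides pip (hypothetical helper)."""
    if found.get("python3"):
        return "python3-pip"
    return "python3 not found; install python3 first"


if __name__ == "__main__":
    found = installed_pythons()
    for name, path in found.items():
        print(f"{name}: {path or 'not installed'}")
    print("suggested package:", pip_package_hint(found))
```

If python shows up as not installed while python3 has a path, the “Unable to locate package python-pip” error is most likely the situation described above.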

Ubuntu 20.04 ships Python 3 as its default interpreter, and the packages for it carry the python3 prefix, so the version has to be named explicitly. A plain python in a package name or command refers to Python 2, which is no longer installed by default. To install pip together with the Python 3 development headers, use the following command.

# Installing Python3
$ sudo apt-get install python3-pip python3-dev

Installation steps for Python scientific suite in Ubuntu

The process discussed here is for Ubuntu. Mac users should install the Python scientific suite via Anaconda, which they can get from the Anaconda repository. Its documentation is continuously updated and describes every step in detail.

Installation of the BLAS library

Installing the Basic Linear Algebra Subprograms (BLAS) library is the first step in setting up your deep learning workstation. One thing Mac users should keep in mind is that the Anaconda installation does not include Graphviz and HDF5, so they have to install them separately.

Here we will install OpenBLAS using the following command.

$ sudo apt-get install build-essential cmake git unzip \
pkg-config libopenblas-dev liblapack-dev

Installation of Python basic libraries

In the next step, we will install the basic Python libraries such as NumPy, pandas, Matplotlib, and SciPy. These are the core Python libraries required for any kind of mathematical operation. So, be it machine learning, deep learning, or any other computation-intensive task, we will need these libraries.

Use the following command in the Ubuntu terminal to install the whole scientific suite at once.

# installation of Python basic libraries
$ sudo apt-get install python3-pandas python3-numpy python3-scipy python3-matplotlib python3-yaml
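
After the install finishes, it can be worth confirming that Python actually sees the libraries. The following is a minimal sketch using only the standard library; missing_modules is a hypothetical helper name, and the list of import names (note that the YAML package imports as yaml) is my assumption about what the command above provides.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    wanted = ["numpy", "scipy", "pandas", "matplotlib", "yaml"]
    missing = missing_modules(wanted)
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with python3; anything listed as missing points to a package that did not install for that interpreter.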

Installation of HDF5

The Hierarchical Data Format (HDF) version 5 is an open-source file format that supports large, complex, and heterogeneous data. It originated at the National Center for Supercomputing Applications (NCSA) and has been adopted by NASA, among others, to store large numeric data files in efficient binary formats. It succeeds the older HDF4 format, and NetCDF-4 is built on top of it.

The HDF5 format lets developers organize machine learning and deep learning data in a structure very similar to the file directories we use on any computer. This structure can be used to maintain the hierarchy of the data.

In terms of a computer's file system, a “directory” or “folder” corresponds to a “group” in HDF5, and a “file” corresponds to a “dataset”. The format matters in deep learning because Keras uses it to save models to disk and load them back.
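
To make the group and dataset analogy concrete, here is a small pure-Python sketch. It does not use the HDF5 library at all: a nested dictionary stands in for the file, dictionaries play the role of groups, and everything else plays the role of a dataset. The layout and the name list_paths are hypothetical and chosen only for illustration.

```python
def list_paths(node, prefix=""):
    """Yield (path, kind) pairs for a nested dict that mimics an HDF5 file.

    A dict plays the role of a group (folder); anything else plays the
    role of a dataset (file).
    """
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            yield path, "group"
            yield from list_paths(child, path)
        else:
            yield path, "dataset"


# Hypothetical layout, loosely shaped like a saved model file.
model_file = {
    "model_weights": {
        "dense_1": {"kernel": [[0.1, 0.2]], "bias": [0.0]},
    },
    "optimizer_weights": {},
}

if __name__ == "__main__":
    for path, kind in list_paths(model_file):
        print(f"{kind:8} {path}")
```

Each printed path reads like a file path, which is exactly how items inside an HDF5 file are addressed.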

Run the following command to install HDF5 on your machine.

# Install HDF5 data format to save the Keras models
$ sudo apt-get install libhdf5-serial-dev python3-h5py

Installation of modules to visualize Keras model

In the next step we will install two packages called Graphviz and pydot-ng. These two packages are necessary to visualize Keras models. The commands for installing them are as follows:

# Install graphviz
$ sudo apt-get install graphviz
# Install pydot-ng
$ sudo pip3 install pydot-ng

These two packages help when you want to inspect the deep learning models you create. For the time being, though, you can skip them and proceed with the GPU configuration, because Keras also works without them.

Installation of the OpenCV package

Use the following command to install the OpenCV package.

# Install OpenCV
$ sudo apt-get install python3-opencv

Setting up GPU for deep learning

Here comes the most important part. As you know, the GPU plays an important role in deep learning modelling. In this section we set up GPU support by installing two components, CUDA and cuDNN. Both need an NVIDIA GPU to function.

Although you can run your Keras models on the CPU, training will take much longer than it does on a GPU. So my advice is that if you are serious about deep learning modelling, plan to procure an NVIDIA GPU (renting a cloud service by the hour is also an alternative).

Let's concentrate on setting up the GPU, assuming that your computer already has a recent one.

CUDA installation

To install CUDA, visit the NVIDIA download page at https://developer.nvidia.com/cuda-downloads. The page first asks you to select the operating system you are using. As we are using Ubuntu here (to know why Ubuntu is the preferred OS, read this article), choose Linux.

CUDA installation-OS selection

The page then asks for the other specifications of your workstation. Select them according to your own system. Here I have selected Linux as the OS. I am using a Dell Latitude 3400 laptop, which is a 64-bit computer, so in the next option I selected x86_64. The Linux distribution is Ubuntu, version 20.04.

Finally, you have to select the installer type. I selected the network installer, mainly because it has a comparatively small download size and I was on mobile internet at the time, so it was the best option for me. You can choose either of the local installation options if bandwidth is not a constraint. The plus point of a local installer is that you have to download it only once.

CUDA installation-specification selection

Once all the specifications are selected, NVIDIA provides the installation commands. Copy them from the page and run them in the Ubuntu terminal. They use Ubuntu's apt to install the packages, which is the easiest way to install CUDA.

CUDA installation code
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
$ sudo apt-get update
$ sudo apt-get -y install cuda
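
Once the installation finishes (a reboot may be needed before the driver loads), a quick way to see whether the NVIDIA tools are reachable is to look for them on the PATH. This is a minimal sketch with hypothetical helper names; it assumes the driver utility is called nvidia-smi and the CUDA compiler nvcc. Note that nvcc is often installed under /usr/local/cuda/bin, which may not be on your PATH yet, so a missing nvcc here does not by itself mean the install failed.

```python
import shutil
import subprocess


def find_gpu_tools(tools=("nvidia-smi", "nvcc")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def driver_summary():
    """Return the text printed by nvidia-smi, or None if it is missing or fails."""
    path = shutil.which("nvidia-smi")
    if path is None:
        return None
    result = subprocess.run([path], capture_output=True, text=True)
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    for tool, path in find_gpu_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
    summary = driver_summary()
    print(summary if summary else "nvidia-smi did not report a working driver")
```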

Install cuDNN

“cuDNN is a powerful library for Machine Learning. It has been developed to help developers like yourself to accelerate the next generation of world changing applications.”

NVIDIA.com

To download the cuDNN file for your operating system and Linux distribution, you have to visit the NVIDIA download page.

Downloading cuDNN

To download the library, you have to create an account with NVIDIA. It is a compulsory step.

NVIDIA membership for Downloading cuDNN

Fill in the necessary fields.

NVIDIA membership for Downloading cuDNN

Once you finish registration, a window with some optional settings appears. You can skip them and proceed to the next step.

NVIDIA membership for Downloading cuDNN

A short survey by NVIDIA is the next step. Although it asks about your experience as a developer, you can fill it in with any of the options just to reach the download page.

Download survey for cuDNN

A page with several download options now appears, and you have to choose according to your specifications. I selected the following Debian file for my workstation.

Selecting the OS for cuDNN download

Download the file (around 300 MB in my case). To install the library, change into the directory where the file was downloaded (by default the Downloads folder of your computer) and run the command below, using the actual filename in place of the asterisks.

$ sudo dpkg -i ******.deb
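
If you are unsure of the exact filename, a short script can look it up and print the matching install command for you. This is only a sketch: find_deb_files is a hypothetical helper, and the pattern *cudnn*.deb and the ~/Downloads folder are my assumptions about how the downloaded file is named and where it was saved. The script prints the command and does not run it.

```python
from pathlib import Path


def find_deb_files(folder, pattern="*cudnn*.deb"):
    """Return the .deb files in `folder` whose names match `pattern`, sorted by name."""
    folder = Path(folder).expanduser()
    if not folder.is_dir():
        return []
    return sorted(folder.glob(pattern))


if __name__ == "__main__":
    for deb in find_deb_files("~/Downloads"):
        # Print the command only; run it yourself after checking the filename.
        print(f"sudo dpkg -i {deb}")
```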

You can follow the installation process from this page. With this, the cuDNN installation is complete.

Installation of TensorFlow

The next step is the installation of TensorFlow, which is very simple. Just execute the command below to install it with pip. Since TensorFlow 2.1, the same tensorflow package covers both CPU and GPU use, so no separate GPU package is needed.


# Installing TensorFlow using pip3 command for Python3
$ sudo pip3 install tensorflow
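
To check that TensorFlow imports and can see the GPU, something like the following can be run with python3. It is a minimal sketch: tensorflow_gpu_report is a hypothetical helper name, and it assumes TensorFlow 2.1 or later, where tf.config.list_physical_devices is available. If TensorFlow is not installed, the function says so instead of failing.

```python
import importlib.util


def tensorflow_gpu_report():
    """Describe the installed TensorFlow and how many GPUs it can see."""
    if importlib.util.find_spec("tensorflow") is None:
        return "tensorflow is not installed"
    import tensorflow as tf  # imported lazily so the check above can run first

    gpus = tf.config.list_physical_devices("GPU")
    return f"tensorflow {tf.__version__}, GPUs visible: {len(gpus)}"


if __name__ == "__main__":
    print(tensorflow_gpu_report())
```

A count of zero GPUs usually means the CUDA or cuDNN step needs another look, or that the installed versions do not match what your TensorFlow release expects.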

Installing Keras

This is the final step of setting up your deep learning workstation, after which you are good to go. You can run the simple command below.

$ sudo pip3 install keras

Alternatively, you can install it from GitHub. The benefit of installing Keras from GitHub is that you also get a lot of example code. You can run those example scripts to test them on your machine, and they are a very good source of learning.

$ git clone https://github.com/fchollet/keras
$ cd keras
$ sudo python3 setup.py install

Optional installation of Theano

Installation of Theano is optional as we have already installed TensorFlow. However, installing Theano can prove advantageous while building Keras code and switching between TensorFlow and Theano. Execute the code below to finish installing Theano:

$ sudo pip3 install theano
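
Switching between the two backends is done through the Keras configuration rather than in your model code. The sketch below assumes a multi-backend Keras release (2.3 or earlier), which reads a "backend" entry from ~/.keras/keras.json; set_keras_backend is a hypothetical helper, and setting the KERAS_BACKEND environment variable is an alternative that leaves the file untouched.

```python
import json
from pathlib import Path


def set_keras_backend(backend, config_path="~/.keras/keras.json"):
    """Write `backend` into the Keras config file, keeping any other settings."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    config["backend"] = backend
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return config


# Example (modifies your real config file, so it is left commented out):
# set_keras_backend("theano")
# set_keras_backend("tensorflow")
```

Keras reads the setting when it is imported, so restart the Python session after changing it.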

Congratulations! You have finished all the installations and completed the setup of your deep learning workstation. You are now ready to run your first deep learning neural network code.

I hope this article proves helpful in setting up your deep learning workstation. It is indeed a lengthy article, but it covers the technical details you may need if you run into difficulty during the process. A little knowledge about every component you install also helps you make further changes to the setup later.

Let me know how you find this article by commenting below. Please mention any information I missed or any doubt you have regarding the process, and I will try my best to provide an answer.

Why is Ubuntu the best for deep learning frameworks?

Ubuntu for deep learning

Why use Ubuntu for deep learning? This is the question this article tries to answer. After reading this article you will not have any doubt regarding which platform you should use for your deep learning experiments.

I was quite happy with my Windows 10 and Colab/Jupyter notebook combination for all of my Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) programming, until I decided to start some serious work with deep learning neural network models.

Is it really important?

“[M]achines of this character can behave in a very complicated manner when the number of units is large”

Alan Turing (1948), “Intelligent Machinery”, page 6

As soon as I started building my first model, the limitations of my working environment became obvious. I was reading threads on deep learning in forums like Quora and Reddit, and in one reply someone mentioned that Ubuntu is a better choice for serious deep learning work.

It struck me that continuing with Windows was probably not a wise choice for advanced deep learning and AI applications. But it was just a hunch at that time, and I needed strong, logical reasons before making up my mind to switch from the only OS I had ever used.

So I started scouring the internet. I read almost every blog and forum thread I could find to make sure that switching platforms was really worth my time, because getting acquainted with a completely new OS takes time, and time is money for me.

If you are also in the learning phase and serious about deep learning, this article will help you make an informed decision about which platform to use. Switching your working environment at a later stage of learning always wastes valuable time and causes a lot of rework.

I have already done the heavy work for you and present a detailed description of the topic here, so that you get all your questions answered in one place.

So let's start with an introduction to Ubuntu. This is not an article on Ubuntu itself, and you can find many good ones elsewhere, but before looking at its special features for deep learning, here is a very brief overview.

What is Ubuntu?

Ubuntu is one of the most popular Linux distributions. It is developed by Canonical, the company founded by Mark Shuttleworth. It is also one of the best-known open-source projects, which means the features and applications it offers are free. And it is an undeniable fact that being free automatically puts an application miles ahead in popularity.

“What commercialism has brought into Linux has been the incentive to make a good distribution that is easy to use and that has all the packaging issues worked out.”

Linus Torvalds, Principal developer of the Linux kernel

Ubuntu ships a new release twice a year, while Long Term Support (LTS) releases arrive every two years and keep receiving security patches. It comes in three main editions: Core, Desktop, and Server.

The Core edition is mainly for those working on IoT devices and robotics. The Desktop edition is for common users doing day-to-day office tasks as well as programming. The Server edition is for client-server architectures and is generally meant for industry use.

Why is Ubuntu preferred for deep learning?

The Ubuntu version I installed recently is 20.04, the latest version of this distro at the time of writing. It is much improved over its predecessor. In particular, the additional support it provides for AI, ML, and DL programmers is just stupendous.

The MicroK8s feature

“Given its smaller footprint, MicroK8s is ideal for IoT devices- you can even use it on a Raspberry Pi device!”

Kubernetes.io, Technical blogs
MicroK8s integration in Ubuntu

The user interface has improved a lot, and the installation process has become very easy (it was always smooth, though). Ubuntu 20.04 now comes with support for ZFS (a file system with high availability and data integrity) and an integrated module called MicroK8s, so AI and DL developers no longer have to install it separately.

MicroK8s lets AI application modules be set up and deployed very quickly. It comes preloaded with all necessary dependencies and with features such as automatic updates and security patches. With this version of Ubuntu you therefore need to spend much less time configuring the environment.

Kubeflow

Kubeflow is another deep learning edge of Ubuntu 20.04 and comes as an add-on to MicroK8s. It was developed by Google in collaboration with Canonical, especially for machine learning applications, and it provides built-in GPU acceleration for deep learning R&D.

What is Kubeflow?

Kubeflow is deployed on Kubernetes and does away with the barriers to creating production-ready stacks. It gives developers enhanced AI and ML capabilities along with edge computing features. Researchers and developers involved in cutting-edge research get a secure production environment with strict confinement and complete isolation.

Kubeflow architecture
Source: Kubeflow blog by Thea Lamkin

The security provided by the Kubeflow and Kubernetes integration is unparalleled. Many AI/ML/DL development add-ons such as Jaeger, Istio, CoreDNS, Prometheus, and Knative come integrated with it and can be deployed with a single command.

The programming edge of Ubuntu

When it comes to programming activities, Ubuntu is undoubtedly the leader. Not only for AI, ML or DL programming but any kind of programming task and application development task is best performed when the operating system is Ubuntu.

It has the best libraries, with a vast number of examples and tutorials readily available for users. The support for the open-source software used with Ubuntu is massive, so any issue you face can be solved quickly. Updates are also regular, irrespective of which version you are using.

The enhanced Graphics Processing Unit

A powerful GPU is an important component for serious ML/DL programming, and Ubuntu has an edge here in keeping up with changes in the AI environment. NVIDIA, a respected name in the GPU manufacturing industry, has put a great deal of effort into making CUDA work to its full capacity on Ubuntu.

Ubuntu 20.04 also gives its users the option of using external graphics cards through Thunderbolt adapters. Cards can be added through dedicated PCI slots too.

So it is no surprise that deep learning frameworks and libraries such as Keras, TensorFlow, OpenCV, and PyTorch all favour Ubuntu over other operating systems. World leaders in advanced AI/ML/DL research and development, such as the autonomous car sector, CERN with its LHC, and famous brands like Samsung, NVIDIA, and Uber, use Ubuntu for their research activities.

Advanced features and support for hardware

Ubuntu's hardware support is also exceptional. Ubuntu provides organization-specific hardware certification, which means high compatibility is assured. Certified hardware has tight integration with the BIOS and factory-level quality assurance.

To achieve this level of quality, Canonical deals directly with the hardware manufacturers. It develops partnerships with major manufacturers in order to provide an operating system with preloaded and pretested features.

The support team is, as usual, exceptional and ready for any kind of troubleshooting at any time. With all these assurances, developers can fully concentrate on their R&D.

Finally the Software

Canonical's Ubuntu has its own open-source software collection. All of the software is compatible at both the board level and the component level. Every version, old or new, contains the same package of software. This feature has several advantages.

The large Linux user base runs many different versions of Ubuntu and can switch between them seamlessly. This is possible only because the same software packages are available across all the versions. Developers can easily test their applications locally before launching them on the web for global users.

This collection of open-source software makes it possible to create AI models quickly. Creating and debugging software on IoT hardware before deployment is also fast and easy.

The snapcraft tool

Snapcraft, the app store for Linux

Snap is another major feature of Ubuntu that makes it a clear winner as a programming OS. It is a system for packaging and distributing containerized applications. Automatic updates in Ubuntu are safe to install and execute largely because of this snap feature.

Snapcraft is a command-line tool that creates snaps, and it makes packaging applications very easy. The user feedback collected through the snapcraft tool is of immense importance to developers. This feedback provides the necessary insights about the software and helps improve it further.

For example, a study by Canonical revealed that most Ubuntu users never update their software. Based on this feedback, Canonical started to provide automatic updates. As a result it does not need to provide support for older versions, because the complete user base moves to the latest version at the same time.

Massive online support base

Being an open-source platform, Ubuntu has a massive online support base and documentation repository. Any user can ask questions at any time through services like Slack and Skype. The Ubuntu support group is also very vibrant, and there you can expect a reply from the development team itself.

Popular question-and-answer sites like Quora and Reddit also have threads on Ubuntu-related queries, and I personally found many of my questions already answered there. Even if you have a unique problem that has not been answered before, you can post it on any of these platforms. It is highly likely that within a few hours you will get some really helpful suggestions, either from an ordinary user or from the Ubuntu support and development team itself.

Final words

Now that you have finished reading this article, you should have a clear idea of why to pick Ubuntu as your machine learning or deep learning programming platform. I have tried my best to put together all the information I gathered from many articles, online and offline.

I invested a lot of time researching this topic to be 100% sure before diving deep into advanced learning. It is, no doubt, an important decision. I have had bitter experiences before, where I put a lot of effort into learning a particular application and then one day, because of some limitation, had to backtrack and change that platform or application.

Starting fresh from scratch meant a lot of rework and wasted time, which could have been avoided had I done thorough research at the very beginning. So I learned my lesson and did the research first this time. I hope it will help you make an informed decision too.

Please let me know if you find the article useful by commenting below. Any queries, doubts, and suggestions are welcome. I will try to improve the post further based on your comments.