Artificial Neural Network with Python using Keras library

Artificial Neural Network

Artificial Neural Network (ANN), as its name suggests, mimics the neural network of our brain; that is why it is called artificial. The human brain has a highly complicated network of nerve cells that carry sensations to their designated sections of the brain. The nerve cells, or neurons, form a network and transfer the sensation from one to another. Similarly, in an ANN a number of inputs pass through several layers of neuron-like units and ultimately produce an estimation.

Schematic diagram of an Artificial Neural Network

Perceptron: the simplest Artificial Neural Network

When an ANN consists of only a single neuron, it is called a perceptron. A perceptron takes one or more inputs and produces a single output, much like a neuron in our brain with its dendrites and axon.

Depending on your problem, there can be more than one neuron and even several layers of neurons. In that situation, it is called a multi-layer perceptron. In the above figure, we can see that there are two hidden layers. Generally we use an ANN with 2-3 hidden layers, but theoretically there is no limit.
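Just to make the idea concrete, here is a rough sketch of a single perceptron in plain numpy. The input values, weights and bias below are made-up numbers used purely for illustration; they are not part of the Keras workflow that follows later.

# A minimal perceptron sketch in plain numpy (illustrative values only)
import numpy as np

inputs = np.array([0.5, 0.3, 0.2])    # hypothetical input values
weights = np.array([0.4, 0.7, -0.2])  # hypothetical learned weights
bias = 0.1

weighted_sum = np.dot(inputs, weights) + bias
output = 1 if weighted_sum > 0 else 0  # step activation: fire only above the threshold
print(weighted_sum, output)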

Layers of an Artificial Neural Network

In the above figure you can see that the complete network consists of several layers. Before you start with the application of ANN, understanding these layers is essential. So, here is a brief idea about the layers an ANN has.

Input layer

The independent variables, which have real values, are the components of the input layer. There can be more than one input variable, and they can be discrete or continuous. They may need standardization before being fed into the ANN if they have very different scales.

Hidden layer

The layers between the input and the output are called hidden layers. Here the inputs get associated with weights, and the weighted sum of all these values is calculated.

The information passed from one layer of neurons acts as input for the next layer. The inputs propagate through the neural network, the activation function and the cost function, and finally yield the output.

Activation function

The weighted sum is then passed through an activation function, which has a very important role in an ANN. This function controls the threshold for the output. Similar to a biological neuron, which fires only when an impulse exceeds a particular threshold, the ANN gives a particular output only when the weighted sum crosses a threshold value.

The output

This is the output of the ANN. The activation function yields this output from the weighted sum of the inputs.
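To tie the weighted sum, the activation function and the output together, here is a tiny hand-written forward pass through one hidden layer. All the numbers are arbitrary; it only mirrors, in miniature, what the Keras model later in this article does internally.

# Toy forward pass: input -> hidden layer (relu) -> output (sigmoid); all values are arbitrary
import numpy as np

x = np.array([0.2, 0.8])                        # hypothetical inputs
W1 = np.array([[0.5, -0.3], [0.1, 0.9]])        # hypothetical hidden-layer weights (2 inputs -> 2 neurons)
b1 = np.array([0.0, 0.1])                       # hidden-layer biases
W2 = np.array([0.7, -0.4])                      # hypothetical output-layer weights
b2 = 0.05                                       # output bias

hidden = np.maximum(0, x @ W1 + b1)             # weighted sums passed through the relu activation
output = 1 / (1 + np.exp(-(hidden @ W2 + b2)))  # sigmoid squashes the result into a 0-1 range
print(hidden, output)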

ANN: a deep learning process

ANN is a deep learning technique, and deep learning is the burning topic of data science. Deep learning is basically a subfield of machine learning. You may be familiar with the machine learning process; if not, you can refer to this article for a quick working knowledge of it. Deep learning has in recent times found its application in almost all ambitious projects. From basic pattern recognition and voice recognition to face recognition, self-driving cars and high-end projects in robotics and artificial intelligence, deep learning is revolutionizing modern applied science.

Read about supervised machine learning here

ANN is a very efficient and popular technique for pattern recognition. But the process involves complex computations and several iterations. The advent of high-end computing devices and machine learning frameworks has made our task much easier than ever. Users and researchers can now focus on their research problem without taking the pain of implementing a complex ANN algorithm themselves.

As time passes, easier-to-use modules in various languages are developed that encapsulate the complexity of such computation processes. Keras is such a framework in Python; it has made deep learning and artificial intelligence a common man's interest and is built on top of popular frameworks like TensorFlow and Theano.

Here is an exhaustive article on python and how to use it

We are going to use this high-level API, Keras, to apply ANN here.

Application of ANN using Keras library

Importing the libraries

The first step is to import all the libraries we are going to use. The basic libraries for any data science project are pandas, numpy, matplotlib etc. The purpose of these libraries was discussed before in the article on simple linear regression with Python.

# first neural network with keras tutorial
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

About the data

The example dataset I have used here for demonstration purposes has been downloaded from kaggle.com. The data, collected by the National Institute of Diabetes and Digestive and Kidney Diseases, contains vital parameters of diabetes patients of Pima Indian heritage.

Here is a glimpse of the first ten rows of the data set:

Diabetes data set for ANN

The data set has several physiological parameters of diabetes patients as its independent variables. The dependent variable indicates whether the patient is suffering from diabetes or not: the dependent column contains a binary variable, where 1 indicates the person is suffering from diabetes and 0 indicates he is not.

dataset = pd.read_csv('diabetes.csv')
# Printing data details
dataset.info()                  # a quick view of column types and non-null counts
print(dataset.head())           # first few rows of the data
print(dataset.tail())           # last few rows of the data
print(dataset.sample(10))       # a random sample of 10 rows from the data
print(dataset.describe())       # summary statistics of the data
print(dataset.isnull().sum())   # check for any null values in the data
Checking if the dataset has any null values

Creating variables

As we can see, the data frame contains nine variables in nine columns. The first eight columns contain the independent variables, which are physiological variables correlated with diabetes symptoms. The ninth column shows whether the patient is diabetic or not. So, here the independent variables are stored in x and the dependent diabetes outcome is stored in y.

x=dataset.iloc[:,:-1].values
y=dataset.iloc[:,-1].values
print(x)
print(y)

Preprocessing the data

This is standard practice before starting the analysis of any data set, especially if the data set has variables with different scales. In this data too we have variables on completely different scales: some of them are fractions whereas others are big whole numbers.

To do away with such differences between the variables, data standardization is very effective. The preprocessing module of the sklearn package has a class called StandardScaler which does the work for us.

# Standardizing the data
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x = sc.fit_transform(x)

Create a heat map

Before we proceed to the analysis, we should have a thorough idea about the variables under study and their interrelationships. A very handy way to get a quick overview of the variables is to create a heat map.

The following code will make a heat map. The seaborn package has the required function to do this.

# Creating heat map for correlation study
import seaborn as sns
import matplotlib.pyplot as plt

corr = dataset.corr()
sns.heatmap(corr,
            xticklabels=corr.columns.values,
            yticklabels=corr.columns.values)
plt.show()
Heat map for correlation study among the variables

The heat map is a very good visualization technique to easily apprehend the relationships between variables. The colour shades indicate correlation here: the lighter shades depict a high correlation, and as the shades get darker the correlation decreases.

The diagonal elements of a heat map are always one, as they are the correlation of each variable with itself. As expected, we can find some variables here with higher correlations which were not possible to identify from the raw data. For example, pairs like Pregnancies and Age or Insulin and Glucose, as well as SkinThickness, show higher correlations.
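If you prefer numbers to colours, the same correlation matrix can be inspected directly. Assuming the dependent column is named 'Outcome', as it is in the Kaggle version of this dataset, the following prints how strongly each variable correlates with the diagnosis:

# Inspect the correlations numerically (assumes the target column is named 'Outcome')
print(corr['Outcome'].sort_values(ascending=False))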

Splitting the dataset in training and test data

For testing purposes, we need to separate a part of the complete dataset which will not be used for model building. The rule of thumb is to use 80% of the data for modelling and keep the rest aside. It will work as an independent dataset, and we will test the fitted model's performance on it.

# Splitting the data for training and testing
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test=train_test_split(x,y, test_size=0.20, random_state=0)

Here this data splitting task has been performed with the help of the model_selection module of the sklearn library. This module has an inbuilt function called train_test_split which automatically divides the dataset into two parts. The argument test_size controls the proportion of the test data; here the test size is 0.2, so the test dataset will contain 20% of the complete data.

Modelling the data

So we have completed all the prerequisite steps before modelling the data. Here the response variable is a binary variable taking 0 and 1 as values. A multilayer perceptron ANN is well suited to model such data. In this type of ANN, each layer is fully connected to the next one and works as the input for the immediately following layer of neurons.

For a multilayer perceptron, the Keras Sequential model is the easiest way to start. To use the Sequential model we call model = Sequential(). The activation function used in the hidden layers is relu, the most common choice when implementing a neural network with Keras.

# define the keras model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

Compiling the model

As the model is defined, we will now compile it with the adam optimizer and the loss function called binary_crossentropy. While the training process runs through its iterations, we can track the model's accuracy through the ['accuracy'] value passed to the metrics argument.

# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

While compiling the model, the two arguments loss and optimizer play an important role. The loss function generally depends on the particular problem you are addressing with the ANN. For example, if you have a regression problem then the loss function you will be using is Mean Squared Error (MSE).

In this case, as we are dealing with a binary response variable, the loss function here is binary_crossentropy. If the response variable consists of more than two classes, then the loss function should be categorical_crossentropy.

In a similar way, the optimization algorithm used here is adam. There are several others, like RMSprop and Stochastic Gradient Descent (SGD), and their selection has an impact on how the model's learning rate and momentum are tuned.
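Purely as a sketch, this is how the compile step might look for the other situations mentioned above. The layer sizes are placeholders; the point is that the output layer has to match the loss, a single linear unit for regression and one softmax unit per class for a multi-class problem.

# Sketch only: compile settings for other problem types (layer sizes are placeholders)
from keras.models import Sequential
from keras.layers import Dense

# Regression: a single linear output unit with mean squared error loss
reg_model = Sequential([Dense(12, input_dim=8, activation='relu'),
                        Dense(1)])
reg_model.compile(loss='mse', optimizer='adam', metrics=['mae'])

# Multi-class classification: one softmax unit per class with categorical cross-entropy
multi_model = Sequential([Dense(12, input_dim=8, activation='relu'),
                          Dense(3, activation='softmax')])
multi_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])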

Fitting the model

Fitting the model again has two crucial parameters. Initializing them with optimum values to a great extent determines the model's efficiency and performance. Here epochs decides how many passes will be made through the training set.

And batch_size, as the name suggests, is the number of input samples passed through the ANN at a time. It increases the efficiency of the model, as the model does not have to process the whole input at once.

# fit the keras model on the training set, tracking validation accuracy and loss on the test set
train = model.fit(x_train, y_train, epochs=100, batch_size=10,
                  validation_data=(x_test, y_test))

Here I have set batch_size to 10, so 10 samples will enter at a time, and the total number of epochs will be 100. I have also passed the test set as validation data so that the validation accuracy and loss can be tracked at every epoch (these are plotted later). See the output screenshot below, where the first 10 epochs are captured along with the model's accuracy at every epoch.

Evaluating the model

Once the model is trained, we can check its accuracy. For this, Keras has the model.evaluate() function, which here gives an accuracy value of 68.24% on the training data. But keep in mind that this accuracy can vary and may change each time the ANN is run.

# evaluate the keras model
_,accuracy = model.evaluate(x_train, y_train)
print('Accuracy: %.2f' % (accuracy*100))
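Note that the call above measures accuracy on the training data itself. The same function can be pointed at the held-out test set to get a fairer idea of how the model behaves on unseen data; the exact figure will again vary from run to run.

# evaluate on the held-out test set as well
_, test_accuracy = model.evaluate(x_test, y_test)
print('Test accuracy: %.2f' % (test_accuracy*100))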

Prediction using the model

Now the model is ready for making predictions. The values of x_test are provided as the ANN inputs.

# make probability predictions with the model
predictions = model.predict(x_test)
# round the probabilities to 0/1 class labels
rounded = [round(x[0]) for x in predictions]
print(rounded[:10])
print(y_test[:10])

I have printed here both the predicted y_test results and the original y_test values (first 10 values only), and it is clear that the prediction is correct for all of them.

Comparing the predicted values and the original values of the test set (first 10 values only)
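Checking the first ten rows is only a quick sanity test. To score the whole test set at once, the predicted probabilities can be thresholded at 0.5 and compared with y_test, for example with scikit-learn's accuracy_score:

# Convert probabilities to class labels and score the whole test set
from sklearn.metrics import accuracy_score

predicted_classes = (predictions > 0.5).astype(int).flatten()
print('Test set accuracy:', accuracy_score(y_test, predicted_classes))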

Visualizing the model's performance

# Visualizing the training process: accuracy and loss per epoch
import matplotlib.pyplot as plt

# Accuracy curves
plt.plot(train.history['accuracy'])
plt.plot(train.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Loss curves
plt.plot(train.history['loss'])
plt.plot(train.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

Conclusion

So we have just completed our first deep learning model to solve a real-world problem. This was a very simple problem with a small data size, just for demonstration purposes. But the basic principle of fitting an ANN remains the same everywhere, irrespective of data complexity and size. What is important is that you know how it works.

Future scope

We have obtained an ANN accuracy of 68.24% here, which leaves a lot of scope for improvement. So we need to put in further effort to improve the model. You can start by tweaking the number of layers the network has, the optimization and loss functions used in the model definition, and also the epochs and batch_size. Changing these parameters of the model may result in higher accuracy.

For example, in this particular case, if we increase the number of epochs from 100 to 200, the accuracy increases to 77%! That is quite a jump in model performance. Likewise, simple changes in other parameters can also be very helpful.
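That change is only a small edit to the earlier code. One way to try it is to rebuild the same model and train it for 200 epochs; your exact accuracy will still vary between runs.

# Re-train the same network with 200 epochs (results will vary between runs)
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
train = model.fit(x_train, y_train, epochs=200, batch_size=10,
                  validation_data=(x_test, y_test))
_, accuracy = model.evaluate(x_train, y_train)
print('Accuracy: %.2f' % (accuracy*100))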

If there is scope, using more sample data for training is also an effective way of increasing the model's prediction efficiency. So, once you have a defined model in your hands, there is always ample scope to improve it.

I hope this article helps you take a big step forward into the vast, dynamic and very interesting world of deep learning and AI.

References: