How to develop a deep learning model for handwritten digit recognition?

This article describes how to develop a basic deep learning neural network model for handwritten digit recognition. I will take a very basic example to demonstrate the potential of a deep learning model. Though this example you will get some elementary idea about the following key points:

  • How deep learning performs the task of image recognition,
  • The process of building a neural network,
  • Basic concepts of neural network layers
  • Testing and validation its performance and
  • Creating some diagnostics for its evaluation.

Even if you have no prior exposure to programming or statistical/mathematical concepts, you will not face any problem understanding the article.

Deep learning model has its most prominent use in feature recognition. Especially in image recognition, voice recognition and lots of other fields. In our daily lives, we frequently see use of biometric identification of individuals, pattern recognition in smartphones, fingerprint scanner, voice-assisted search option in digital devices, chatbots answering your questions without any human intervention, weather prediction tools etc. etc.

So, it is a very pertinent question to ask “how can we develop a deep learning model which can perform a very simple image recognition task?”. So let’s explore it here.

Develop a deep learning model

All of the applications mentioned above make use of deep learning. It learns from the historic labelled data. The model uses this labelled data to learn the pattern and then predict a result for a new feature as accurately as possible. So, it is a learning process and it’s deep because of the layers in it. We will see here a deep learning model learns the pattern to match handwritten digits.

I will discuss every section with its purpose. And hope that you will find it interesting how deep learning works. I have chosen this example because of its popularity. You will find many book chapters and blogs on it too.

I have mentioned all such sources I referred to write the python code to develop deep learning model for handwritten digit recognition at the end of this article in “reference”. You can refer them to further enrich your knowledge. All of them are a very good source of information.

As you finish reading the article you will gain the basic knowledge of a neural network, how it works and the basic components of it. I will be using a popular Modified National Institute of Standards and Technology data set or in short MNIST data. 

The MNIST data

It is an image data set of handwritten digits between 0 to 9. The greyscale images are with the resolution of 28×28 pixels; 60000 images for training and 10000 images for testing. This is a very common data set used to test any machine learning model.

In this article, we will build the model writing python code. More importantly, discuss the particulars and components of a neural network model. So that in the process of building the model you got a clear understanding of each step and develop confidence for its further application.

So let’s start coding:

Calling the basic libraries to develop the deep learning model

These are the very basic python libraries required for different tasks.

# Importing required basic libraries
from numpy import mean
from matplotlib import pyplot
from sklearn.model_selection import KFold
from numpy import std
from keras import models
from keras import layers

Loading the data

The MNIST data is a default data set in the Keras library. So, you only need to load it with the following code. Four NumPy arrays are there to store the data. You can check the array types to confirm.

#Loading the data set (MNIST data is pre included in Keras)
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The following line of codes will produce a sample set of images from the MNIST data set.

# Example of first few images of MNIST data set
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# plot raw pixel data
	pyplot.imshow(train_images[i], cmap=pyplot.get_cmap('gray'))
# show the figure

See the grey-scale images below to get an idea of the handwritten digit images.

First few handwritten images from MNIST data set
First few handwritten images from MNIST data set

The training and testing data set

The following code describes the training data set. It consists 60000 sample data from the whole MNIST data set. The images are of 28×28 pixel size. And the particular sample rows comprising the training set.

#The training data set
print("Training data shape: ",train_images.shape)
print("Size of training data:",len(train_labels))
print("Train labels:",train_labels)
Description of training data
Description of training data

In the same way the following code presents the description of testing data set. Which has 10000 sample data.

#The testing data set
print("Testing data shape: ",test_images.shape)
print("Size of test data:",len(test_labels))
print("Test labels:",test_labels)
Description of test data
Description of test data

Building the model

In this deep learning neural network model notice the specification of layers. These layers are the main component in any neural net model. The neural network performs data distillation process through these layers. The layers act as sieves to refine the output of the neural network.

The first layer receives the raw inputs and passes them to the next layer for higher-order feature recognition. The next layer does the same with more refinement to identify the more complex feature. Unlike machine learning, deep learning is capable of identifying which feature to identify in each step.

In this process gradually the model output matches closely with the desired output. The feature extraction and recognition process continue with several iterations until the model’s performance is satisfactory.

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

Dense layers

The neural net here consists of dense layers. Which means the layers are fully connected. There can be many layers depending on the complexity of the identifiable feature. In this case, only two such layers have been used.

The first layer has an input size 512. The input shape is 28*28 which is actually the pixel size of each grey-scale image. The second layer, on the other hand, is a 10-way softmax layer. Which means that the layer has an array of 10 probability score totalling equal to 1. Each of these probability corresponds to the probability of each handwritten image belonging to any of the 10 digits from 0 to 9.

Data preprocessing

A neural network expects data values with an interval [0,1]. So, the data here needs some preprocessing before being used by the deep learning model. It can be done by dividing each value by the maximum value of the variable. But to do that we need to convert the data type from unsigned integers to floats.

The transformation used here actually converts the data type from Unit8 to float32. In this process the data value range changes from [0,255] to [0,1].

# Reshaping the image data to bring it in the interval [0,1]
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

Compilation of the model

The compilation of the model comprises some critical terms like “loss”, “metrics”, “optimizer”. They have special function while compiling the model. Here the loss function used is categorical crossentropy. Its a function to estimate the error model produces and calculate the loss score. This function is best suited for multi-class classification like in this case with 10 digit classes.

Depending on this loss score, the optimizer function optimizes the weight of the layers. Its kind of parameter adjustment of the model. This process goes on until the model achieves an acceptable level of estimation.

#Compilation step

The metrics the model uses here is “accuracy”. It suggests the goodness of fit of the model. Quite obviously, lesser the value of accuracy, better is the model.

# Encoding the labels into categories
from keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

Model fitting

Now its time to fit the model with the training data. For fitting the model the number of epochs is 5 and batch size is 128. Which means the process repeats itself 5 times with 128 samples in each cycle.

# Fitting the model, train_labels, epochs=5, batch_size=128)

Now in the output, we can see two metrics loss and accuracy. These two parameters tell us how the model is performing on the training data. From the last epoch, we can see that the final accuracy and loss the model achieves are 0.9891 and 0.0363 respectively. Which is quite impressive.

Model fitting
Model fitting

Testing the model

We achieved a training accuracy for the model as 0.9891. Let’s see how the model performs with the testing data.

# Testing the model with testing data
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test_acc:', test_acc)

The test accuracy is 0.9783. Though it’s quite good although there is a drop in the accuracy from training. When the model’s performance is better in training and the accuracy is less while testing, it may be an indication of the model’s overfitting.

Accuracy of the test
Accuracy of the test

Evaluating the model

Now we know how to develop a deep learning model but how can we evaluate it? This step is as important as building the model.

We are here using the MNIST data as a practice example for the application of the network model. So, although the MNIST data is an old data set and effectively solved, we will like to evaluate our model’s performance with this data. In order to do that we will apply the K-fold cross-validation process.

In this evaluation process, the data set is divided into k-groups. Then the model fit process is repeated k times for each of the groups. Thus the name k-fold cross-validation. Except for the data in the particular group, the rest of the data is training data and the group data is used to test the model.

To perform this evaluation task we will develop three separate functions. One function for performing the k-fold cross-validation, one is to make some plots showing the learning pattern of all the validation steps and another function for displaying the accuracy scores.

Finally all these three functions will be called in the evaluation function to run each of them. The neural network model developed above has been used in this evaluation process.

Writing the function for k-fold cross validation

This is the first function for performing the validation process. We take 5 fold validation here as lesser than this number will not be enough to validate and again too big a number will take long time to complete, so it is a kind of trade of between the two.

The kfold module from scikit-learn library will automatically randomize the sample values in different groups and split them to form test and validate data.

# k-fold cross-validation
def evaluate_nn(X, Y, n_folds=5):
	scores, histories = list(), list()
	kfold = KFold(n_folds, shuffle=True, random_state=1)
	# data solitting
	for train_ix, test_ix in kfold.split(X):
		# train and test
		train_images, train_labels, test_images, test_labels = X[train_ix], Y[train_ix], X[test_ix], Y[test_ix]
		# fit model
		history =, train_labels, epochs=5, batch_size=128, validation_data=(test_images, test_labels), verbose=0)
		# model evaluation
		_, acc = network.evaluate(test_images, test_labels, verbose=0)
		print('> %.3f' % (acc * 100.0))
		# scores
	return scores, histories

Creating plots for learning behaviour

Here we create functions to plot the learning pattern through the evaluation steps. Two seperate plots will be there. One for the loss and the other for accuracy.

def summary_result(histories):
  for i in range(len(histories)):
    # creating plot for crossentropy loss
    pyplot.subplot(2, 2, 1)
    pyplot.title('categorical_crossentropy Loss')
    pyplot.plot(histories[i].history['loss'], color='red', label='train')
    pyplot.plot(histories[i].history['val_loss'], color='green', label='test')
    # creating plot for model accuracy
    pyplot.subplot(2, 2, 2)
    pyplot.title('Model accuacy')
    pyplot.plot(histories[i].history['accuracy'], color='red', label='train')
    pyplot.plot(histories[i].history['val_accuracy'], color='green', label='test')

Printing accuracy scores

It is for printing the accuracy scores. We will also calculate the summary statistics like mean and standard deviation of the accuracy scores. And finally will plot a Box-Whisker plot to show the distribution.

def acc_scores(scores):
	# printing accuracy
 and its summary
	print('Accuracy: mean=%.3f std=%.3f, n=%d' % (mean(scores)*100, std(scores)*100, len(scores)))
	# Box-Whisker plot

Finally the evaluation function

This is the final evaluation module where we will use all the above functions to get the overall evaluation results.

def evaluation_result():
	# model evaluation
	scores, histories = evaluate_nn(train_images, train_labels)
	# train and test curves
	# model accuracy scores
 # run the function

Here 5-fold cross-validation has been used. As the whole data set contains 60000 sample data so each of these k groups will have 12000 sample data.

Evaluation results

See below the accuracy for all the 5-fold evaluation steps. For the first step the accuracy is 98% and for all the rest four occasions the accuracy scores are above 99%. Which is quite satisfactory.

Accuracy score for k-fold cross validation
Accuracy score for k-fold cross validation

We have also produced some diagnostic curves to display the learning efficiencies in all five folds. The following plots show the learning behaviour by plotting the loss and accuracy of the training processes in each fold. The red line represents the loss/accuracy of the model with the training data set whereas the green line represents the test result.

You can see that except for one case the test and training lines have converged.

Plots to show learning behaviour through cross validation steps
Plots to show learning behaviour through cross validation steps

The below Box-Whisker plot represents the distribution of the accuracy mean across all folds for ease of understanding.

Distribution of the accuracy score over all the  evaluation steps
Distribution of the accuracy score over all the evaluation steps

The mean and standard deviation of all the accuracy score are as below:

Mean and standard deviation of accuracy
Mean and standard deviation of accuracy


So, now you know how to develop a very basic deep learning model for image recognition. Here you have just started with deep learning models and many things may be new to you especially if you don’t have any programming background. But not to worry with more practice things will improve.

You can reproduce the result using the code from here. But it is completely okay if you don’t understand all the steps of the coding part. Take it as a fun activity of python programming as well as deep learning. 

A very necessary step for learning neural network or in general any new thing to get yourself emerge in that field and arouse deep interest in it.  I hope that this blog will serve this purpose. This is not to overwhelm you with the complexity of the technique rather demonstrate you about its potential in the simplest and the most practical way.

So, what are you waiting for? Start writing your first deep learning program taking the help of this article. Make small changes and see how the output changes.

Don’t get frustrated if it gives an error (which will be very common while doing your first programming). Try to explore the reason for the error. Take help of the StackOverflow site to know about the error. You will get an answer there for almost all of your queries. And this is the most effective way to learn it. 

Please, don’t forget to comment below if you find the article useful. It will help me to plan more such articles.


  • Chollet, F., 2018. Deep Learning mit Python und Keras: Das Praxis-Handbuch vom Entwickler der Keras-Bibliothek. MITP-Verlags GmbH & Co. KG.

Leave a Comment