This article is all about the basic data structure of deep learning: tensors. All inputs, outputs and transformations in deep learning are represented through tensors. Depending on the complexity of the data, tensors of different dimensions play the role of the data container.
So, it goes without saying that to improve your deep learning skills you must be confident in your knowledge of tensors. You should be fluent with their different properties and mathematical treatment. This article will introduce you to tensors. As you finish this article, you will be thorough with the following topics:
- What are tensors
- Properties of tensors like dimension, rank, shape etc.
- Use of tensors in deep learning
- Real-life examples of tensor application
The importance of tensors can be understood from the fact that Google has built a complete machine learning library, TensorFlow, around them. So, in this article, I will try to make clear the basic idea of tensors, the different types of tensors, and their applications, with executable Python code.
I will also try to keep it as simple as possible. The mathematical parts will be presented with the help of Python scripts, as that is much easier to follow for those with little or no mathematical background. Some basic knowledge of matrices will certainly be beneficial for quick learning.
So let’s start the article with the most obvious question:
What is a Tensor?
A tensor is nothing but a container of data. It works the same way a matrix does in NumPy. In tensor terms, a matrix is a two-dimensional (2-D) tensor. Similarly, a vector is a one-dimensional tensor, whereas a scalar is a zero-dimensional tensor.
When we deal with an image, it has three dimensions: height, width and depth. So a 3-D tensor is required to store an image. Likewise, when there is a collection of images, another dimension (the number of images) gets added, so we need a container with four dimensions. A 4-D tensor serves the purpose. To store videos, 5-D tensors are used.
Generally in neural networks we need tensors of up to four dimensions, but they can have any number of dimensions depending on the complexity of the data. Tensors can be thought of as a generalized form of NumPy matrices with any arbitrary number of dimensions.
Scalars (0-D tensors)
These are tensors with zero dimensions. Single values of data types like float32 or float64 are scalar data. Scalars have rank zero as they have zero axes. NumPy’s ndim attribute can display the number of axes of any array. See the following code applied to a scalar data structure.
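As a quick sketch (the value here is illustrative), a scalar stored as a 0-D NumPy array reports zero axes:

```python
# A scalar (0-D tensor): a single value with zero axes
import numpy as np

x = np.array(12.5)   # a float stored as a 0-D NumPy array
print(x)             # 12.5
print(x.ndim)        # 0 -> a scalar has rank zero
```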
You can try these simple code snippets and check the results. If you are just getting familiar with Python, it can be a good start.
Vectors (1-D tensors)
These are one-dimensional (1-D) tensors, so the rank is one. It is often confusing to differentiate between an n-dimensional vector and an n-dimensional tensor. For example, consider the following vector:
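Here is a minimal sketch of such a vector in NumPy (the values are illustrative):

```python
# A vector (1-D tensor) with six elements along a single axis
import numpy as np

v = np.array([3, 7, 1, 9, 4, 6])  # six elements, one axis
print(v.ndim)   # 1 -> one axis, so rank one
```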
It is a six-dimensional vector with one axis, not a 6-D tensor. A 6-D tensor has six axes, with any number of dimensions along each axis.
Matrices (2-D tensors)
These are 2-D tensors with two axes. A matrix has rows and columns, hence two axes, and its rank is two. Again we can check this with the ndim attribute. Let’s take a NumPy matrix of shape (3,4), which means the matrix has 3 rows and 4 columns.
So, let’s check its rank in the same way as we did for the scalar and the vector:
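A minimal sketch (the values are illustrative):

```python
# A 3x4 matrix (2-D tensor): 3 rows, 4 columns
import numpy as np

m = np.array([[2, 5, 6, 9],
              [3, 9, 0, 1],
              [2, 8, 9, 1]])
print(m.ndim)    # 2 -> two axes, so rank two
print(m.shape)   # (3, 4)
```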
While writing the code, be extra careful with the matrix input. The opening and closing brackets are a common source of errors.
Tensors with higher dimensions
As I mentioned at the beginning, the tensors we commonly use have up to four dimensions, but for video data the dimensions can go up to five. Data structures of up to two dimensions are easy to understand, but beyond that they become a little difficult to visualize.
In this section we will discuss some high-dimensional tensors and the way they store data. So, let’s start with 3-D tensors. Consider the following tensor and try to identify its three dimensions.
You can see it is actually a data structure containing three matrices, each with 3 rows and 4 columns. See the image to understand the shape of this tensor. Let’s create a variable to store the data and check its rank with the ndim attribute.
```python
# High dimensional tensors
import numpy as np

x = np.array([[[2, 5, 6, 9],
               [3, 9, 0, 1],
               [2, 8, 9, 1]],
              [[12, 5, 16, 9],
               [4, 6, 0, 1],
               [2, 5, 9, 8]],
              [[1, 0, 6, 2],
               [8, 10, 0, 5],
               [13, 3, 6, 1]]])
print(x)
print(x.ndim)
```
See the output below to understand the structure of a 3-D tensor. It is a collection of matrices; thus, unlike a single matrix with two axes, a 3-D tensor has three axes.
In the same way we got a 3-D tensor, if several such 3-D tensors are grouped together, another dimension gets created, making the result a 4-D tensor. See the image for a hypothetical 4-D tensor; here you can see three cubes clubbed together. Such 4-D tensors are very useful for storing images for image recognition in deep learning.
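As a sketch, a batch of images is commonly stored as a 4-D tensor with shape (samples, height, width, channels); the batch size and image size below are illustrative:

```python
# A hypothetical batch of 32 RGB images, each 128x128 pixels
import numpy as np

batch = np.zeros((32, 128, 128, 3), dtype=np.uint8)
print(batch.ndim)    # 4 -> four axes
print(batch.shape)   # (32, 128, 128, 3)
```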
In the same fashion we can have tensors of even higher dimensions. Though tensors of up to 4 dimensions are more common, 5-D tensors are sometimes used to store videos. Theoretically there is no limitation on the number of dimensions; any number n of dimensions can be used to store data in an organized manner.
5-D tensors
This is the type of tensor we need when the data has yet another dimension. Video data is an ideal example of where 5-D tensors are used.
If we take the example of a 5-minute video at 1080 HD resolution, what will be the dimensions of its data structure? Let’s calculate in a simple way. The frame size is 1080 x 1920 pixels. The duration of the video in seconds is 5 x 60 = 300 seconds.
Now if the video is sampled at 10 frames/second, the total number of frames is 300 x 10 = 3000. Suppose the colour depth of the video is 3 (one channel each for red, green and blue). So for this video the tensor has 4 dimensions, with shape (3000, 1080, 1920, 3).
So a single video clip is a 4-D tensor. Now if we want to store multiple videos, say 10 video clips at 1080 HD resolution, we need a 5-D tensor. The shape of this 5-D tensor will be (10, 3000, 1080, 1920, 3).
This is an enormous amount of video content. If we used such huge data directly in deep learning, the training process would be never-ending. So this kind of data needs size reduction and several preprocessing steps before being used as input to a neural network.
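To see why, here is a quick back-of-the-envelope calculation of the element count of the 5-D tensor above (assuming 1 byte per value, as for 8-bit colour):

```python
# Rough size of the hypothetical 5-D video tensor:
# 10 clips x 3000 frames x 1080 x 1920 pixels x 3 channels
elements = 10 * 3000 * 1080 * 1920 * 3
print(elements)              # total number of values stored
print(elements / 1e9, "GB")  # ~186.6 GB at 1 byte per value
```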
Shape of tensors
The shape of a tensor is a tuple of integers indicating the length of each of its axes.
A vector has only one axis, so its shape is a single-element tuple. The vector we used here as an example has 6 elements, so its shape is (6,).
The matrix (2-D tensor) we discussed above has shape (3,4), as it consists of 3 rows and 4 columns.
Likewise, in the case of a 3-D tensor, the shape tuple contains the lengths of all three axes. The example we took here has shape (3,3,4). See the image below to visualize it.
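This can be verified in code, reusing the 3-D tensor from the earlier example:

```python
# Shape of the 3-D tensor from the earlier example: three 3x4 matrices
import numpy as np

x = np.array([[[2, 5, 6, 9], [3, 9, 0, 1], [2, 8, 9, 1]],
              [[12, 5, 16, 9], [4, 6, 0, 1], [2, 5, 9, 8]],
              [[1, 0, 6, 2], [8, 10, 0, 5], [13, 3, 6, 1]]])
print(x.shape)   # (3, 3, 4)
```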
Again, the 4-D tensor we took as an example above has shape (3,3,3,4), as it groups three separate (3,3,4) cubes together. The image below presents a higher-dimensional figure to help you understand its dimensions and shape.
Real life examples of tensors as data container
So, by now I think the basics of tensors are clear to you. You know how a tensor, as a data structure, stores data. We took small examples of commonly used data structures as tensors.
But in general, the tensors used for real-life problem solving are much more complex. Deep learning for image recognition often deals with thousands of images stored in a database. So which data structure should we use to handle such complex data? This is where tensors come to the rescue.
The MNIST data set with handwritten digits
Let’s take a real-life example of such an image database. We will use the same MNIST data we used for handwritten digit recognition in an earlier blog post. It is an image database storing 60,000 training images of handwritten digits, and it is effectively stored in a 3-D tensor with shape (sample_size, height, width).
Let’s load the data set. It is a built-in data set in the Keras library.
```python
# Loading the MNIST data set
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
```
The Python code we applied earlier in this article can be applied here too, to check the number of axes of the training data set.
```python
# Checking the axes of train_images
print(train_images.ndim)
```
The above line of code will return the rank, which is 3. Next we will check the shape of the data structure storing the whole data set.
```python
# Checking the shape of the tensor
print(train_images.shape)
```
The shape it returns is (60000, 28, 28), which means the set contains 60,000 images, each 28×28 pixels in size.
Let’s check an image from the data set. As I mentioned, the data set contains handwritten images of digits and is a classic example data set for feature recognition. Although it is essentially a solved data set, many deep learning enthusiasts still use it to test the efficiency of their new models.
So here is the code for displaying the 10th digit from the data set. We will use the pyplot module from the matplotlib library.
```python
# Displaying the 10th image from the MNIST data set
import matplotlib.pyplot as plt

sample_image = train_images[9]  # index 9 -> the 10th image
plt.imshow(sample_image, cmap='gray')  # grayscale colour map
plt.show()
```
The output of the above code is the 10th image, a handwritten digit. See below if you can recognize it 🙂
Stock price data
In the Indian stock market the price of each stock changes every minute. A particular stock’s high, low and closing price for each minute of a trading day is very important data for traders.
See, for example, the candlestick chart of a particular stock on a particular trading day. The chart shows the stock price behaviour from 10:00 AM to 3:00 PM, i.e. 5 hours of the trading day, or 5 x 60 = 300 minutes.
So the dimensions of this data structure will be (300, 3), which makes it a 2-D tensor. This tensor stores a stock’s high, low and closing price for every minute of a particular day.
Now what if we want to store the stock’s prices for a whole week? Then the dimensions become (trading_week, minutes, stock_price); if that week has 5 trading days, the shape is (5, 300, 3). That makes it a 3-D tensor.
And if we want to store the prices of a number of stocks for a particular week, say 10 different stocks? Another dimension gets added, and it becomes a 4-D tensor with shape (trading_week, minutes, stock_price, stocks_number), i.e. (5, 300, 3, 10).
Now think of mutual funds, which are collections of stocks. If we consider someone’s mutual fund portfolio holding different mutual funds, then to store the high, low and closing prices of all the stocks in that portfolio for a whole trading week we need a 5-D tensor. The shape of such a tensor will be (trading_week, minutes, stock_price, stocks_number, mutual_funds).
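The shapes described above can be sketched with empty NumPy arrays (the number of mutual funds, 4 here, is a hypothetical figure):

```python
# Illustrative shapes for the stock-price tensors described above
import numpy as np

one_day   = np.zeros((300, 3))            # 300 minutes x (high, low, close)
one_week  = np.zeros((5, 300, 3))         # 5 trading days in the week
ten_stock = np.zeros((5, 300, 3, 10))     # 10 different stocks
portfolio = np.zeros((5, 300, 3, 10, 4))  # hypothetical: 4 mutual funds
print(portfolio.ndim)   # 5 -> a 5-D tensor
```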
So, tensors and their different properties should now be clear to you. I know some of these terms are new and may appear a little confusing at first. So in the table below I have summarised them once again for quick revision.
| Type of tensor | Uses | Rank/axes | Shape |
|---|---|---|---|
| 0-D tensor | Storing a single value | 0 | Single element |
| 1-D tensor | Storing vector data | 1 | (length of the array,) |
| 2-D tensor | Storing data in matrices | 2 | (rows, columns) / (samples, features) |
| 3-D tensor | Time series, single image | 3 | (width, height, colour depth) for an image / (samples, time lags, features) for a time series |
| 4-D tensor | Storing images | 4 | (samples, height, width, colour depth) |
| 5-D tensor | Storing videos | 5 | (samples, frames, height, width, channels) |
Please also refer to the articles mentioned in the references for further reading. They are very informative and will help you brush up your knowledge.
I hope you have found this article helpful. If you have any questions or doubts regarding the topic, please put them in the comments below; I would be glad to answer them.
Follow this blog for forthcoming articles, where I will discuss more advanced topics on tensors and deep learning in general.
Also, if you liked the post, please subscribe so that you get notifications whenever new articles are added.