The most common type of Machine Learning is Supervised Machine Learning. The nomenclature is due to the fact that the learning process being supervised by the result which is already known. The learning process goes through several iterations. The process continues until the difference between the actual and estimated result comes under an acceptable level.
“Computers are able to see, hear and learn. Welcome to the future.”~Dave Waters. Department of Earth Sciences, University of Oxford Associate Professor of Metamorphic Petrology (retired)
The data used in supervised machine learning are called “labelled data” because these data are already tagged with the right answer. Once the training part is complete and a robust model is achieved, some new inputs are provided. The task of the model now is to predict the label of this unforeseen inputs based on the labelled data used before.
In mathematical notation, it can be represented as the output variable Y which is a function of input variable X
During the training phase of supervised machine learning both X and Y remains unknown. The algorithm tries to find out the mapping function which can predict the Y most precisely.
Example of Supervised machine learning
You must have come across the term pattern recognition from any online or offline source. This is a kind off buzz word today and is in use to make our life more sophisticated and comfortable. Starting from a very simple application like your smartphone’s face recognition or handwriting recognition to advance use of cancer cell detection, this supervised learning is the essence of pattern recognition.
Its simple applications are already making our lives easier be it your smartphone’s face lock feature, handwriting recognition or your voice recognition. The auto-driving car concept also heavily depends on supervised learning concept. In every sector of the industry, you can find presence of this theory nowadays.
An application in agriculture
Now to understand how this system works we will take an example of its application in the agriculture field.
Prediction for the crop yield well before its harvesting is very essential for proper policy planning. It helps the government to fix its price, to provide better storage of the produce and farmers also able to plan its marketing channels if there is a precise prediction about how much production is expected.
Now crop yield is determined by several factors, some of them are physical parameters of the crop itself like crop height, number of tillers etc. weather parameters like rainfall, humidity, sunshine hours etc. other than these soil health factors like carbon balance, organic matters and several others play an important role and contribute to the ultimate yield.
Now if we have a sufficient amount of labelled data that is a set of data which has all these independent variables affecting the yield along with the corresponding yield, we can train the algorithm with this training dataset. So, it will be supervised learning. As if the learning process has been supervised by any teacher.
The learning process stops only when a robust model is achieved and the prediction is of an acceptable level.
A real-world problem solved by Supervised Machine learning
Here I am going to cite an example of supervised learning in modern research and how it is being used to address complex problems of the real world.
A Project work was taken up by a group of scientists to identify the endangered species of Mojave desert of California. The main objective of the study was to locate the two threatened species Mohave Ground Squirrel and desert tortoise of the area by analyzing images captured by smartphones.
The challenge faced by the biologists was to track and rescue these two endangered species as they were very tough to spot. Nature has given them such a capability to camouflage with the desert background and vegetation that it becomes almost impossible for the human eye to see them.
So here the scientists used computer vision and develop a machine learning algorithm to identify the pattern, distinguish it from the desert backdrop and classify them according to the characteristics.
Types of supervised machine learning
There are two main categories of supervised machine learning.
It is applicable when the variable in hand is a categorical variable and the objective is to classify it. If the algorithm classifies into two classes, it is called binary classification and if the number of classes is more than two, then it is called multiclass classification.
In the given figure, a binary classification has been demonstrated. Here a group of people has been classified according to their genders depending on a dataset consisting their height and weight.
The task is done in the same way as discussed before. First of all, the algorithm is trained with a dataset with an assigned category. Then based on this training the algorithm has categorized the values when provided with an input data.
Example of classification
A most common example of classification problem is identifying if a new mail is a spam or not spam, identifying loan defaulters also a problem of classification.
The algorithm is provided with a dataset of mails and a corresponding column indicating if it is a spam or not spam. Similarly, a list is first provided with the customers labelled with if they are a loan defaulter or not to train the algorithm. Then the supervised learning model is used to identify the type of customer from an independent input dataset.
There are a number of algorithms for classification. The most popular ones are
- Naive Baye’s theorem
- Linear classifier
- Support vector machine
- Random forest
- Decision tree
- K-Nearest neighbour
Regression is a statistical process which tries to find out the relationship between the dependent and independent variables. The major difference with classification is that in regression we deal with continuous variables.
If a regression equation is a linear one between the independent and dependent variables then it is a simple linear regression equation. If the regression equation of Y on X is linear, then it does not necessarily suggest that the regression equation of X on Y is also linear and vice-versa. The dependent variable a function of independent variables with respective constant parameters and an error term which is again a random variable. A regression model has the expression:
Y=f 0,1,2,…, n+ϵ
Where Y is the dependent variable, X1, X2+…Xn are independent variables, 0,1,2,…, n are the regression coefficients and is the error term and normally distributed with mean 0 and variance 2. This type of regression model is also known as a deterministic model.
Example of regression
An example of simple linear regression can be regressing the weight of a group of people on the basis of their height. Here Height and weight are the independent and dependent variable respectively. As a person height determines his weight, not the vice versa.
The blue line in the above figure is the regression line fitted with a supervised machine learning technique. This represents the best-fitted line obtained through a rigorous training process until a robust model with acceptable accuracy is achieved.
To perform regression a number of algorithms are used by researchers. The most frequently used ones are:
- Simple linear regression
- Multiple linear regression
- Logistic regression
- Polynomial regression etc.