Machine learning (ML) is one of the most groundbreaking fields of computer science. It gives today’s computers new capabilities: they can gather and process information in a way that mimics human learning.
Machine learning is related to the notion of artificial intelligence (AI). What is the difference? Artificial intelligence is the broader notion: it concerns the artificial performance of all kinds of tasks characteristic of human intelligence. These include not only understanding speech or recognizing objects and sounds, but also learning new things, planning one’s actions and solving problems. Current work in artificial intelligence focuses largely on the learning capabilities of machines, i.e. on machine learning. It can be said that machine learning is a way to achieve artificial intelligence.
Working with machine learning resembles the work of a scientist carrying out experiments, who needs a laboratory equipped with the right tools and with data, the research material, to perform their tasks. It is hard to imagine the work of a machine learning specialist without data visualization tools. Their main task, however, is to create and use learning algorithms. If the algorithms are correctly combined into learning models, success can be achieved. Training can take a long time: computers are sometimes left running for many nights so that the models can learn what they should.
We distinguish three approaches to machine learning. They differ in how, and on the basis of what data, the machines learn:
- supervised learning – the right answers are known
- reinforcement learning – the value of the reward for an answer is known
- unsupervised learning – the right answers are not known
For an experienced specialist, the choice of the right approach for the particular problem they are facing is usually obvious.
Supervised learning, or learning with a teacher
An example of supervised learning is forecasting. The word “supervised” means that the model is built from a known set of samples, the training data. This approach is called learning with a teacher because it resembles learning at school: the model (e.g. a neural network) is the “student”, which modifies its structure whenever its answers are not good enough. If continuous values are predicted, we call it regression; when particular classes (discrete values) are predicted, we are talking about classification.
Classification is the automatic determination of whether a sample belongs to a particular class, on the basis of information gathered beforehand. The training data are marked with labels which specify the class each sample belongs to. During learning, the model modifies its structure so as to produce correct answers. Once learning is finished, the model can determine which class new samples belong to. An example classification task, recognizing handwritten digits, is presented in the image below (Fig. 1).
Fig. 1. Example of a classification task: recognizing handwritten digits
Classification can be used, for example, to tell spam apart from useful e-mail messages. Another application comes from telemedicine: automatically notifying the emergency services when an algorithm, analyzing data from sensors worn by a patient, classifies their state of health as life-threatening. Classification can also distinguish more than two classes, e.g. the ten digits (0-9) in handwriting recognition.
A classification task can be carried out in various ways, e.g. by using artificial neural networks, deep networks (deep learning), or support vector machines (SVM).
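To make the idea concrete, here is a minimal sketch of a classifier using k-nearest neighbours, one of the simplest supervised methods. The 2D points and their “A”/“B” labels are invented toy data, standing in for real features such as e-mail statistics:

```python
from collections import Counter
import math

def knn_classify(train, labels, point, k=3):
    """Classify `point` by majority vote among its k nearest training samples."""
    dists = sorted((math.dist(x, point), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Toy labeled training data: two well-separated groups of 2D feature vectors.
train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_classify(train, labels, (0.15, 0.1)))  # prints "A"
```

A new sample simply inherits the label of the majority of its nearest labeled neighbours; no explicit model structure is learned, the training data itself is the model.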
The purpose of regression is to predict continuous values rather than discrete ones (as in classification). Regression is a data approximation method and another example of supervised learning. A classic regression algorithm builds a model so that the mean squared error over the known values is minimized. When we speak about calculating regression, we usually mean this model-building step; the model can be, e.g., a fitted function (Fig. 2 presents an example of “connecting the dots”). Regression can also be carried out with a neural network. Such a model can then predict values for unknown data.
Fig. 2. Example of linear regression: fitting the red straight line.
Regression applies, for example, to stock prices or air humidity. Another example of its use is optimizing the power generation parameters of a wind power plant: here, regression calculates the most effective turbine settings for the current weather conditions.
Regression can be performed with neural networks, by fitting the parameters of a mathematical function, or with the k-nearest neighbours (KNN) algorithm.
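As a minimal sketch of the classic approach (fitting a model so that the mean squared error is minimized), the closed-form least-squares fit of a straight line y = ax + b takes only a few lines. The sample points below are made up for illustration, scattered around the line y = 2x + 1:

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = a*x + b, minimizing mean squared error."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Noisy toy samples around the "true" relationship y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # close to the true slope 2 and intercept 1
```

The fitted line can then predict y for x values that were never observed, which is exactly the "predicting values for unknown data" described above.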
Another machine learning method is reinforcement learning. In this approach, a system, or agent, is created which learns by interacting with its environment. It usually does so by assessing the state of the environment, and often also receives a reward signal. Because of this mode of operation, reinforcement learning is sometimes considered a type of supervised learning. The agent performs a series of actions with the aim of maximizing the reward signal it will receive. Reinforcement learning can be used to teach robots or to plan game strategies. One example is training an agent to play chess: it chooses the moves of the chess pieces, and the reward signal corresponds to winning or losing the game. Another example is optimizing network traffic management in order to maximize throughput.
Reinforcement learning is usually carried out with neural networks created for that purpose, including deep neural networks (deep reinforcement learning).
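A minimal sketch of the state–action–reward loop, using tabular Q-learning on a made-up toy environment: a five-state corridor where the agent starts at one end and receives a reward of +1 on reaching the other. The environment, constants and reward scheme are all assumptions for illustration, not a recipe for the chess or network examples above:

```python
import random

# Toy environment: states 0..4, start at 0, reward +1 on reaching state 4.
# Actions: 0 = step left, 1 = step right.
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action] value table
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1           # learning rate, discount, exploration
rng = random.Random(0)

def step(s, a):
    """Environment dynamics: move left/right, clipped to the corridor."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == GOAL else 0.0)

def greedy(s):
    """Best-known action, breaking ties randomly so the untrained agent explores."""
    best = max(Q[s])
    return rng.choice([a for a in (0, 1) if Q[s][a] == best])

for _ in range(300):                         # training episodes
    s = 0
    while s != GOAL:
        a = rng.randrange(2) if rng.random() < EPS else greedy(s)
        s2, r = step(s, a)
        # Q-learning update: nudge Q toward reward + discounted best future value
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

policy = [greedy(s) for s in range(GOAL)]
print(policy)   # the learned policy should always step right, toward the reward
```

The agent is never told the right action; it only sees the reward signal, and the table of action values it builds up is what "maximizing the reward" means in practice.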
Unsupervised learning differs from the previous approaches in that it is learning without knowing the right answer. Neither a reward function nor correct labels are available. The incoming data are not tagged, and often even their structure is unknown. In this case, the model itself determines how to treat the incoming data. Among unsupervised methods we can distinguish clustering and dimension reduction.
Clustering is the automatic division of a dataset into groups. Contrary to classification, the groups are not known in advance; the algorithm itself is responsible for finding them. Usually the algorithm takes the number of groups as a parameter and searches for features which differentiate the data. Another approach is density-based clustering: data points which lie close to each other are assigned to one group, and the algorithm itself finds the appropriate number of groups. An example result of such clustering is shown below (Fig. 3). Clustering can be used not only to detect objects in a picture, but also to detect unusual behaviour indicating, e.g., that a credit card has been intercepted by an intruder or that a system is being broken into, which makes it possible to stop the attack before it succeeds.
A popular clustering algorithm is k-means, which groups samples around k centroids. (The similarly named k-nearest neighbours, KNN, is a supervised method used for classification and regression.)
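A minimal sketch of clustering with k-means on invented 2D toy data: the algorithm is given only the number of groups, k, and finds the groups itself, with no labels involved:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: alternate assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                            + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

# Two obvious, unlabeled groups; the algorithm recovers them on its own.
points = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.1, 4.8), (4.9, 5.2)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))
```

Note the contrast with the supervised case: no labels enter the algorithm, yet the two groups of three points each are separated correctly.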
Some datasets are described by a particularly large number of features. Dimension reduction is a set of techniques for removing features which provide no additional information. Too many input features merely generate noise, and eliminating some of them can help. This usually means removing features which depend on one another, because each of a pair of dependent features carries essentially the same information as both together.
Such an analysis is often a necessary first step, in order to reduce the amount of data to store, transfer, or analyze. An example application of this unsupervised method is image recognition: without dimension reduction, an image is described by as many features as it has pixels, a number which moreover depends on the image resolution. The problem of excessive information also appears when analyzing signals from wearable sensors or from machines in a factory. Dimension reduction is likewise useful when searching a DNA sequence for the genes carrying information about a disease among the many that do not.
Principal component analysis (PCA) is often used for dimension reduction.
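A minimal sketch of the idea behind PCA, restricted to two features for simplicity: the eigenvalues of the covariance matrix measure how much variance each principal direction carries, and when the second feature is (almost) a copy of the first, nearly all the variance fits in a single dimension. The toy data are invented for illustration:

```python
def cov_eigenvalues(data):
    """Eigenvalues of the 2x2 covariance matrix of 2D data, largest first."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    centered = [(x - mx, y - my) for x, y in data]
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # closed-form eigenvalues of [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam1 = (tr + (tr * tr - 4 * det) ** 0.5) / 2
    return lam1, tr - lam1

# Second feature is y ≈ 2x, i.e. redundant up to tiny noise.
data = [(x, 2 * x + 0.01 * ((-1) ** x)) for x in range(10)]
lam1, lam2 = cov_eigenvalues(data)
print(lam2 / lam1 < 0.01)  # almost all variance lies along one direction
```

Dropping the second principal component here loses almost nothing, which is exactly the "dependent features carry the same information" point made above.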
Machine learning can be carried out in a number of ways, and the choice of the right one is dictated by the task at hand. The general approach (supervised, unsupervised, or reinforcement learning) depends on how much information about the data we have and what we would like to get from it. Usually several combinations of algorithms are tested in search of the approach that proves most efficient at solving the given problem.