
Masterclass on Supervised vs Unsupervised Machine Learning with Dr Susan Mulcahy

Machine learning has become central to many modern-day business and consumer applications. Notably, it provides us with the building blocks to develop smart systems and Artificial Intelligence-driven solutions.

Popular media streaming platforms like YouTube and Spotify are among the best examples of machine learning at work. They learn from our streaming patterns and behavior and recommend music and videos that we are more likely to enjoy.

These recommendations come from predictive systems, or engines, that are made possible by machine learning.

To explore the topic further, we recently hosted a masterclass with Dr. Susan Mulcahy on supervised vs unsupervised machine learning for the learner cohort of our Integrated Program in Data Science, Machine Learning, and Artificial Intelligence.

Dr. Susan Mulcahy is a highly respected researcher and faculty member at Imperial College London. She is also the Director of its Data Sparks program – a data science program involving real-world projects for postgraduate students. This program is hosted at Imperial College London’s Data Science Research Center.

Dr. Mulcahy has been the Senior Education Fellow of the Data Science Institute (DSI) at Imperial College London and is a well-known data analytics lecturer. She received her data-driven PhD from Imperial College London’s Bioengineering Department.

Types of Machine Learning Techniques

Dr. Mulcahy explained that there are four kinds of machine learning available to us:

  • Supervised machine learning – As the name indicates, supervised learning relies on a supervisor acting as a teacher. We teach or train the machine using well-labeled data, meaning the training examples are already tagged with the correct answer. It involves ongoing human assistance and configuration.
  • Semi-supervised machine learning – In this type of learning, the algorithm is trained on a combination of labeled and unlabeled data. Typically, this combination contains a small amount of labeled data and a much larger amount of unlabeled data.
  • Unsupervised machine learning – Here, the machine’s task is to group unsorted information according to similarities, patterns, and differences without any prior training on labeled data. There are no labels or answer keys, so the machine has to find structure in the data on its own. This learning methodology requires the machine to start from scratch, as there are no reference points, end values, or example datasets.
  • Reinforcement learning – This involves providing the end values, parameters, and possible paths, after which it is up to the machine to find the optimal combination. Reinforcement learning depends on the machine’s ability to improve itself through trial and error.
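
The trial-and-error idea behind reinforcement learning can be illustrated with a minimal sketch. It is not taken from the masterclass: the three-action environment, its payoff values, and the names `HIDDEN_PAYOFFS`, `pull`, and `epsilon_greedy` are hypothetical, and epsilon-greedy is only one simple strategy picked for illustration.

```python
import random

# Hypothetical environment: three actions with fixed payoffs that are
# hidden from the agent. The values are made up for illustration.
HIDDEN_PAYOFFS = [0.2, 0.5, 0.9]


def pull(action):
    """Return the reward for an action (the only feedback the agent gets)."""
    return HIDDEN_PAYOFFS[action]


def epsilon_greedy(n_actions, steps=200, epsilon=0.1, seed=0):
    """Estimate the value of each action by trial and error."""
    rng = random.Random(seed)
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action was tried

    for step in range(steps):
        if step < n_actions:
            action = step                      # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: pick at random
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])

        reward = pull(action)
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = epsilon_greedy(len(HIDDEN_PAYOFFS))
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("best action found:", best_action)
```

The agent is never told which action is best. It only sees rewards for the actions it tries, and it occasionally explores so that it does not settle too early on a poor choice.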

“Supervised machine learning requires labeled data. In this case, it’s all of the images of vegetables, with the names of the vegetables on them. Unsupervised machine learning does not use labeled data; it does not use local data. It is only wanting to group similar items. It doesn’t matter what they’re called.”

Machine learning has found increased adoption in the past decade. Many manual systems and workflows have been replaced by automated systems powered by machine learning principles and techniques. 

The standard machine learning techniques used today are:

  • Classification – Classification is the process of categorizing a given data set into classes. It can be performed on both structured and unstructured data.
  • Clustering – Clustering, or cluster analysis, is a machine learning technique that groups an unlabeled dataset. It divides the data points into clusters so that points in the same cluster are similar to one another and have little or no similarity to points in other clusters.
  • Regression – Regression is a technique for investigating the relationship between independent variables or features and a dependent variable or outcome. 
  • Dimensionality Reduction – Dimensionality reduction is the task of reducing the number of features in a dataset. 
  • Forecasting – ML forecasting algorithms often use techniques that involve more complex features and predictive methods, but the objective of ML forecasting methods is the same as that of traditional methods – to improve the accuracy of forecasts while minimizing a loss function. 
  • Artificial Neural Networks – Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. Their name and structure are inspired by the human brain, mimicking the way that biological neurons signal to one another.
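
To make the regression and loss-function ideas above concrete, here is a minimal sketch that fits a straight line with ordinary least squares and measures the fit with mean squared error. The data points and the function names `fit_line` and `mean_squared_error` are made up for illustration, and a single-feature line is an assumption chosen to keep the example short.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(xs, ys, slope, intercept):
    """Loss: average squared gap between predictions and observed outcomes."""
    errors = ((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))
    return sum(errors) / len(xs)


# Made-up data: x is the independent variable (feature), y is the outcome.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
loss = mean_squared_error(xs, ys, slope, intercept)
forecast = slope * 6.0 + intercept  # prediction for an unseen x value

print(f"slope={slope:.2f} intercept={intercept:.2f} loss={loss:.4f}")
print(f"forecast for x=6: {forecast:.2f}")
```

Among all straight lines, the least-squares line is the one with the smallest mean squared error on the training data, which is what "minimizing a loss function" means in this simple setting.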

Production, manufacturing, healthcare, defence, agriculture, aeronautics, and other industries are now using machine learning for daily tasks. Many of these tasks that depend on machine learning are central to the routine functioning of an entire business or worksite. 

The approach should differ considerably between unstructured data and structured data (data that fits into rows and columns). In practice, we might even be working with semi-structured data.

Let us look at what Dr. Susan Mulcahy says about supervised and unsupervised machine learning.

Supervised vs Unsupervised Machine Learning

Here are the differences between supervised and unsupervised machine learning:

| Supervised Machine Learning | Unsupervised Machine Learning |
| --- | --- |
| Supervised learning involves results that are already expected. | Unsupervised learning involves the machine determining what is important or different in the dataset. |
| This learning methodology is more suitable for smaller datasets if there is already an example dataset. | This learning methodology suits large volumes of data without any reference points. |
| Supervised learning only requires programming languages such as Python or R and their IDEs (Integrated Development Environments). | Unsupervised learning requires robust tools capable of processing and working with large amounts of unstructured data. |
| Algorithms are trained using labeled data. | Algorithms are used against data that is not labeled. |
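
The contrast between the two approaches can be sketched in a few lines, echoing the vegetable example from the masterclass. Everything here is an illustrative assumption: the measurements are invented, and 1-nearest-neighbour and k-means are simply two common, simple choices for the supervised and unsupervised side.

```python
import math

# Hypothetical measurements: (length in cm, width in cm) of six vegetables.
samples = [
    (18.0, 3.0), (20.0, 3.5), (17.0, 2.8),  # long and thin
    (7.0, 6.0), (8.0, 6.5), (6.5, 5.5),     # short and round
]
# Names are available only in the supervised setting.
names = ["carrot", "carrot", "carrot", "potato", "potato", "potato"]


def classify(point, train_points, train_names):
    """Supervised: return the name of the nearest labeled example (1-NN)."""
    nearest = min(range(len(train_points)),
                  key=lambda i: math.dist(point, train_points[i]))
    return train_names[nearest]


def k_means(points, start_centroids, iterations=10):
    """Unsupervised: group points around centroids; names are never used."""
    centroids = list(start_centroids)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return assignment


predicted = classify((19.0, 3.2), samples, names)
clusters = k_means(samples, start_centroids=[samples[0], samples[-1]])
print(predicted)  # a vegetable name, learned from the labels
print(clusters)   # group numbers only; the groups have no names
```

The classifier can answer "which vegetable is this?" only because it was given names. The clustering step finds the same two groups but can say nothing about what they are called.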

When choosing the right approach for machine learning, we must first determine the kind of data we are working with. For example, unsupervised machine learning can sometimes lead to inaccurate results without human intervention.

Some data in large datasets is also corrupted or incorrectly tagged, which makes it hard for machines to learn. In the case of supervised learning, however, there are many situations where we cannot prepare a properly labeled dataset, and this negatively affects the ML model.

These factors are essential for deciding the machine learning approach to take:

  • Whether the data is labeled or unlabeled.
  • The kind of algorithm that is most suitable for the project.
  • How transparent the data is.
  • Whether learning will be continuous or limited to the duration of the initial training process.
  • The size of the data and whether the project is big data-based.
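
The first factor, whether the data is labeled, maps directly onto the learning types described earlier. As a rough sketch only, the hypothetical helper below (the name `suggest_approach` and the use of `None` for a missing label are assumptions) shows that mapping. It deliberately ignores the other factors in the list.

```python
def suggest_approach(labels):
    """Suggest a learning type from how much of the data is labeled.

    `labels` holds one entry per example; None marks an unlabeled example.
    """
    if not labels:
        raise ValueError("need at least one example")
    labeled = sum(label is not None for label in labels)
    if labeled == len(labels):
        return "supervised"       # every example has an answer
    if labeled == 0:
        return "unsupervised"     # no answers at all
    return "semi-supervised"      # a mix of both


print(suggest_approach(["spam", "not spam", "spam"]))
print(suggest_approach([None, None, None]))
print(suggest_approach(["spam", None, None, None]))
```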

“Data usually is recording something or a piece of information, but the real question is whether it’s useful information or not, so that gets us to another level.”

Supervised learning models are great for spam detection and traffic forecasting systems, while unsupervised learning models are suitable for self-learning engines and recommendation systems. 

If you want to learn to deploy machine learning in your organisation effectively and efficiently, join Hero Vired’s Integrated Program in Data Science, Machine Learning, and Artificial Intelligence and attend exciting masterclasses with leading experts like Dr. Susan Mulcahy.
