Unsupervised Machine Learning: Clustering using K-Means Algorithm

K-Means Algorithm

Unsupervised learning looks for previously undetected patterns or insights in a dataset that has no pre-existing labels (unstructured data, for example), with minimal human supervision.
K-means clustering is a type of unsupervised learning.
The goal of the K-means algorithm is to find groups in the data, with the number of groups represented by the variable K.
The algorithm works iteratively to assign each data point to one of K groups based on the features provided, so that data points are clustered by feature similarity. The results of the K-means algorithm are:

The centroids of the K clusters, which can be used to label new data
Each data point is assigned to a single cluster
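Here is a minimal sketch of how the centroids can label new data. It assumes NumPy and Euclidean distance, and the function name `label_new_points` and the centroid values are hypothetical. Each new point receives the label of its nearest centroid:

```python
import numpy as np

def label_new_points(points, centroids):
    """Assign each point to the cluster whose centroid is nearest
    (squared Euclidean distance; hypothetical helper for illustration)."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Pairwise squared distances, shape (n_points, k)
    sq_dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # Index of the closest centroid for every point
    return sq_dists.argmin(axis=1)

# Hypothetical centroids from an already-fitted model with K = 2
centroids = [[0.0, 0.0], [10.0, 10.0]]
new_points = [[1.0, 0.5], [9.0, 11.0], [2.0, 1.0]]
print(label_new_points(new_points, centroids))  # [0 1 0]
```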

K-Means algorithm step summary:

1. Specify the number of clusters, K, that the algorithm should generate.
2. Randomly select K data points as the initial centroids, then assign every data point to the cluster of its nearest centroid.
3. Compute the centroid of each cluster.
4. Keep iterating the following until the centroids are optimal, i.e. until the assignment of data points to clusters no longer changes:

4.1. Compute the squared distances between the data points and the centroids.
4.2. Assign each data point to the cluster whose centroid is closest to it.
4.3. Recompute the centroid of each cluster by taking the average of all data points in that cluster.
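The steps above can be sketched in Python with NumPy. This is a minimal illustration rather than a production implementation. The name `kmeans`, the `max_iter` cap and the `seed` parameter are assumptions. Initialization is plain random selection of K data points, and distances are squared Euclidean.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means sketch (hypothetical helper) following the steps above."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Steps 1-2: pick K distinct data points at random as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # 4.1: squared distance from every point to every centroid, shape (n, k)
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # 4.2: assign each point to its nearest centroid
        new_labels = sq_dists.argmin(axis=1)
        # Stop once the assignments no longer change
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # 3 / 4.3: recompute each centroid as the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Usage on a hypothetical toy dataset with two obvious groups
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
centroids, labels = kmeans(X, k=2)
print(centroids)
print(labels)
```

Because the stopping test compares consecutive assignments, the loop ends exactly when the condition in step 4 holds: no data point changes cluster. Library implementations such as scikit-learn's `KMeans` add refinements on top of this loop, including k-means++ initialization and multiple random restarts.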

Choosing the right K

The Elbow Method

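WCSS (Within-Cluster Sum of Squares) is the metric used to determine the right K: it is the sum of the squared distances between each data point and the centroid of its cluster. WCSS generally decreases as K grows (it reaches zero when every point is its own cluster), so the aim is to find the number of clusters beyond which WCSS no longer decreases significantly.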

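For K = 3, the WCSS formula is:

$$\mathrm{WCSS} = \sum_{P_i \in \text{Cluster 1}} \operatorname{distance}(P_i, C_1)^2 + \sum_{P_i \in \text{Cluster 2}} \operatorname{distance}(P_i, C_2)^2 + \sum_{P_i \in \text{Cluster 3}} \operatorname{distance}(P_i, C_3)^2$$

where each summation of distance(P_i, C_j)^2 is the sum of the squared distances of the points P_i in cluster j from that cluster's centroid C_j.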
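WCSS can be computed directly from its definition. Here is a minimal NumPy sketch, assuming squared Euclidean distance and that `labels[i]` holds the cluster index of point i. The function name `wcss` and the toy data are hypothetical.

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within-cluster sum of squares: for every cluster j, add up the squared
    Euclidean distances of its points from centroid j."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centroids = np.asarray(centroids, dtype=float)
    total = 0.0
    for j, c in enumerate(centroids):
        members = X[labels == j]           # points P_i in cluster j
        total += ((members - c) ** 2).sum()  # sum of distance(P_i, C_j)^2
    return float(total)

# Hypothetical K = 3 example
X = [[0, 0], [2, 0], [10, 0], [10, 2], [20, 0]]
labels = [0, 0, 1, 1, 2]
centroids = [[1, 0], [10, 1], [20, 0]]
print(wcss(X, labels, centroids))  # 4.0
```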

In the picture below, WCSS no longer decreases significantly after 3 clusters, and the curve forms an elbow shape around K = 3. In this particular case, K = 3 is therefore the best choice according to the elbow method.
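An elbow curve like this one can be produced by running K-means for a range of K values and recording the WCSS of each fit. The sketch below makes several assumptions. It uses NumPy and a hypothetical toy dataset of three well-separated groups. It also uses a compact Lloyd-style K-means with several random restarts, keeping the lowest WCSS; the name `kmeans_wcss` and its parameters are illustrative. In practice, scikit-learn's `KMeans` exposes the same quantity as its `inertia_` attribute.

```python
import numpy as np

def kmeans_wcss(X, k, n_init=10, max_iter=100, seed=0):
    """Run a minimal K-means n_init times from random initial centroids
    and return the lowest WCSS found (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(n_init):
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(max_iter):
            sq = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = sq.argmin(axis=1)
            # New centroid = mean of assigned points (old one if cluster is empty)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        # WCSS of this run: squared distance of every point to its nearest centroid
        sq = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        best = min(best, sq.min(axis=1).sum())
    return float(best)

# Hypothetical toy data: three well-separated groups of five points each
offsets = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float)
centers = np.array([[0, 0], [10, 0], [0, 10]], dtype=float)
X = np.vstack([c + offsets for c in centers])

results = {k: kmeans_wcss(X, k) for k in range(1, 7)}
for k, w in results.items():
    print(k, round(w, 2))
```

On data like this, WCSS should drop steeply up to K = 3 and only marginally afterwards, which is the elbow described above. Plotting `results` (for example with matplotlib) gives the familiar elbow chart.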
