My Personal Blog

Machine Learning and beyond


Unsupervised Machine Learning: Clustering using K-Means Algorithm

Posted on June 21, 2020 (updated August 14, 2025) by pluto gasanova

K-Means Algorithm

Unsupervised learning looks for previously undetected patterns or insights, with minimal supervision, in a dataset that has no pre-existing labels, such as unstructured data.
K-means clustering is a type of unsupervised learning.
The goal of the K-means algorithm is to find groups in the data, with the number of groups represented by the variable K.
The algorithm works iteratively to assign each data point to one of K groups based on the features provided; data points are clustered by feature similarity. The results of the K-means algorithm are:

  • The centroids of the K clusters, which can be used to label new data
  • Each data point is assigned to a single cluster
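To illustrate the first point, here is a minimal sketch (assuming NumPy and Euclidean distance; the function name and sample centroids are illustrative, not from the original post) of how fitted centroids can label new data:

```python
import numpy as np

def assign_to_nearest(points, centroids):
    """Label each point with the index of its nearest centroid."""
    # Squared Euclidean distance from every point to every centroid,
    # using broadcasting: shape (n_points, n_centroids)
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
new_points = np.array([[1.0, 1.0], [9.0, 11.0]])
print(assign_to_nearest(new_points, centroids))  # [0 1]
```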

K-Means algorithm step summary:

  1. Specify the number of clusters, K, to be generated by the algorithm
  2. Randomly select K data points and assign every data point to one of the K clusters
  3. Compute each cluster's centroid
  4. Keep iterating the following until the centroids are optimal, i.e. the assignment of data points to clusters no longer changes:
  • 4.1. Compute the sum of squared distances between data points and centroids.
  • 4.2. Assign each data point to the cluster whose centroid is closest.
  • 4.3. Recompute each cluster's centroid as the average of all data points in that cluster.
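The steps above can be sketched in a few lines of NumPy (a minimal illustration, not production code; initializing centroids from K random data points is one common choice):

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal K-means following the steps above."""
    rng = np.random.default_rng(seed)
    # Step 2: pick K distinct random data points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Steps 4.1/4.2: assign each point to its nearest centroid
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 4.3: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid, sketch-level handling)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```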

Choosing the right K

The Elbow Method

WCSS (Within-Cluster Sum of Squares) is the metric used to determine the right K. The aim is to pick the number of clusters beyond which WCSS no longer decreases significantly.

The WCSS formula when K=3 can be written as:

WCSS = Σ distance(p, c1)² + Σ distance(p, c2)² + Σ distance(p, c3)²

where each summation of distance(p, c)² runs over the points p in a cluster and measures their squared distance from that cluster's centroid c.
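The formula above translates directly into code. A small sketch (assuming NumPy and squared Euclidean distance; the example points and centroids are made up for illustration):

```python
import numpy as np

def wcss(points, labels, centroids):
    """Within-Cluster Sum of Squares: squared distance of each
    point to the centroid of the cluster it belongs to."""
    diffs = points - centroids[labels]   # vector from each point to its centroid
    return float((diffs ** 2).sum())

points = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0]])
labels = np.array([0, 0, 1])                 # first two points in cluster 0
centroids = np.array([[0.0, 1.0], [4.0, 0.0]])
print(wcss(points, labels, centroids))  # 2.0
```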

When WCSS is plotted against the number of clusters, there is no significant decrease after 3 clusters, and an elbow shape forms around K=3. In this particular case, K=3 is the best choice based on the elbow method.
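A way to produce the numbers behind such an elbow plot (a sketch assuming scikit-learn, whose KMeans stores the WCSS in its `inertia_` attribute, and synthetic blob data rather than the post's dataset):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 3 well-separated blobs (an illustrative assumption)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Fit K-means for a range of K and record the WCSS for each
wcss_values = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    wcss_values.append(km.inertia_)

for k, w in zip(range(1, 8), wcss_values):
    print(f"K={k}: WCSS={w:.1f}")
```

Plotting `wcss_values` against K (e.g. with matplotlib) shows the sharp drop flattening out at the true number of clusters.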

Application
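As an end-to-end application sketch (assuming scikit-learn and synthetic blob data; the original post's dataset is not reproduced here), fitting K-means with K=3 and using the fitted centroids to label unseen points:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with 3 clusters (an illustrative assumption)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.7, random_state=0)

# Fit K-means with the K suggested by the elbow method
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# The fitted centroids can be used to label new data
print(model.cluster_centers_.shape)        # (3, 2)
print(model.predict([[0.0, 0.0], [5.0, 5.0]]))
```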
