Image Compression Using the K-Means Clustering Algorithm
Most of us are used to working with structured data that fits neatly into fixed rows and columns, as in relational databases and spreadsheets.
However, more than 90% of the data generated today is considered unstructured, and this share will continue to rise with the growth of the Internet of Things. Examples of unstructured data include social media content, satellite imagery, surveillance imagery, webpages, blogs, video files, audio files, text files, and call center transcripts and recordings.
Unsupervised learning looks for previously undetected patterns and insights, with minimal supervision, in a dataset that has no pre-existing labels, as is typical of unstructured data.
Given the sheer volume of unstructured data in our lives, there is wide scope for applying unsupervised learning to it.
Clustering is one of the main unsupervised learning techniques, alongside association.
One interesting application of clustering is in color compression within images.
An image is stored as a three-dimensional array of shape (height, width, 3), where the last axis holds the red, green, and blue (RGB) values of each pixel as integers from 0 to 255.
One way we can view this set of pixels is as a cloud of points in a three-dimensional color space. We will reshape the data to [n_samples x n_features], where n_samples is height x width and n_features is 3, and rescale the colors so that they lie between 0 and 1.
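The reshape and rescale step can be sketched with NumPy as follows. This is a minimal illustration, not the original code: the small random `image` array is a hypothetical stand-in for a real photo, and the variable names are assumptions.

```python
import numpy as np

# Hypothetical stand-in for a loaded photo: random 8-bit RGB values
# in an array of shape (height, width, 3).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

height, width, channels = image.shape

# Rescale the 0-255 integers to floats in [0, 1], then flatten the
# pixel grid so that each row is one pixel and each column one channel.
pixels = image.astype(np.float64) / 255.0
pixels = pixels.reshape(height * width, channels)

print(pixels.shape)  # (24, 3)
```

Each row of `pixels` is now one point in the three-dimensional color space, which is the layout a clustering routine expects.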
Now, let’s reduce the roughly 16 million possible colors (256 x 256 x 256) to just 30 colors, using k-means clustering across the pixel space.
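To make the idea concrete, here is a small NumPy-only sketch of plain k-means (Lloyd's algorithm) applied to pixel colors. It is an illustration under stated assumptions rather than the code used for the image in this article: the function name `kmeans_colors`, its parameters, and the random test image are all hypothetical, and for a full-size photo a library implementation such as scikit-learn's `KMeans` or `MiniBatchKMeans` would likely be more practical.

```python
import numpy as np

def kmeans_colors(pixels, n_colors, n_iter=20, seed=0):
    """Plain Lloyd's k-means on an (n_samples, 3) array of colors."""
    rng = np.random.default_rng(seed)
    # Initialize the centers from pixels at distinct random positions.
    idx = rng.choice(len(pixels), size=n_colors, replace=False)
    centers = pixels[idx].copy()

    def assign(centers):
        # Squared distance from every pixel to every center,
        # shape (n_samples, n_colors); pick the nearest center.
        dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        for k in range(n_colors):
            members = pixels[labels == k]
            if len(members) > 0:  # an empty cluster keeps its old center
                centers[k] = members.mean(axis=0)

    # Final assignment so the labels match the returned centers.
    return centers, assign(centers)

# Hypothetical test image, already scaled to [0, 1].
rng = np.random.default_rng(42)
image = rng.random((40, 60, 3))
pixels = image.reshape(-1, 3)

centers, labels = kmeans_colors(pixels, n_colors=30)

# Replace every pixel with the color of its cluster center and
# restore the original (height, width, 3) shape.
compressed = centers[labels].reshape(image.shape)
```

The compressed image contains at most 30 distinct colors, so each pixel can be stored as a small index into a 30-entry palette instead of three full color values.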
Some detail is certainly lost in the image on the right side, but the overall image is still easily recognizable. While this is an interesting application of k-means, there are certainly better ways to compress the information in images. Still, the example shows the power of thinking outside the box with unsupervised methods like k-means.