
My Personal Blog

Machine Learning and beyond


Unsupervised Machine Learning: Social Network Analysis with Graph Analytics

Posted on June 14, 2020 (updated August 14, 2025) by pluto gasanova

Unsupervised Learning: 

Social Network Analysis with Graph Analytics

"A picture speaks a thousand words" is one of the most commonly used phrases, but a graph can say even more. A visual representation of data in the form of a graph helps us gain actionable insights and make better data-driven decisions.
To understand what graphs are and why they are used, we first need a concept known as Graph Theory.

A Graph G = (V, E) consists of a finite set of vertices or nodes (V) and a set of edges (E), each of which connects a pair of nodes.

In the example graph, the set of vertices is V = {0, 1, 2, 3, 4, 5} and the set of edges is E = {(0,1), (0,2), (1,2), (2,3), (2,5), (3,4), (3,0), (4,0)}.
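As a minimal sketch, the example graph can be stored in plain Python as an adjacency mapping. The edges are assumed to be undirected, and the helper name `build_adjacency` is only illustrative.

```python
# Vertex and edge sets taken from the example in the text.
V = [0, 1, 2, 3, 4, 5]
E = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 5), (3, 4), (3, 0), (4, 0)]


def build_adjacency(vertices, edges):
    """Return an undirected adjacency mapping: node -> set of neighbours."""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)  # edge u-w seen from u
        adj[w].add(u)  # and the same edge seen from w
    return adj


adjacency = build_adjacency(V, E)
for node in V:
    print(node, sorted(adjacency[node]))
```

Each node maps to the set of nodes it shares an edge with, which is usually enough for the small measures discussed in this post.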


Weighted Graph

A weighted graph is a graph whose edges carry weights.

Each node or edge can hold different attributes. For example, nodes can be assigned types such as crossings or dead-ends, and edges might have a numerical attribute such as a speed limit. Numerical edge attributes are called weights.
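The road example can be sketched with two dictionaries, one for node attributes and one for edge weights. All names and values below are made up for illustration.

```python
# Hypothetical road network: node attributes describe the junction type,
# edge weights are speed limits in km/h (all values are invented).
node_type = {"A": "crossing", "B": "crossing", "C": "dead-end"}
weighted_edges = {("A", "B"): 50, ("B", "C"): 30}


def edge_weight(u, v, weights):
    """Look up the weight of an undirected edge, whichever way it was stored."""
    if (u, v) in weights:
        return weights[(u, v)]
    return weights[(v, u)]


def path_min_weight(path, weights):
    """Smallest edge weight along a path given as a list of nodes."""
    return min(edge_weight(u, v, weights) for u, v in zip(path, path[1:]))


print(edge_weight("B", "A", weighted_edges))
print(path_min_weight(["A", "B", "C"], weighted_edges))
```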


Network Influencer

Network distance measures are useful for understanding how information will spread through a network. In this section, we will learn how to find the most important nodes (individuals) in the network. The measures used for this are called Centrality Measures.


Degree Centrality

Degree Centrality is the number of edges connected to a node. In a directed graph, the in-degree of a node counts the edges coming into it and the out-degree counts the edges leaving it.
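A small sketch of these counts on a made-up directed edge list follows. The function names are illustrative only.

```python
from collections import Counter

# Hypothetical directed edges, written as (source, target).
directed_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]


def in_out_degree(edges):
    """Count incoming and outgoing edges for every node."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1  # the edge leaves src
        in_deg[dst] += 1   # the edge arrives at dst
    return in_deg, out_deg


def degree(edges):
    """Total number of edges touching each node, ignoring direction."""
    total = Counter()
    for src, dst in edges:
        total[src] += 1
        total[dst] += 1
    return total


in_deg, out_deg = in_out_degree(directed_edges)
print(dict(in_deg), dict(out_deg), dict(degree(directed_edges)))
```

Here node `c` receives three edges and sends none, so by in-degree it would be the most "followed" node in this toy example.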

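Closeness Centrality

Closeness centrality for a node is based on the average length of the shortest paths from the node to all other nodes. It is usually defined as the reciprocal of that average, so nodes that are close to everyone else score higher.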
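The sketch below computes closeness with a breadth-first search. It assumes an unweighted, undirected, connected graph, uses the reciprocal-of-average convention, and reuses the six-node example graph from earlier in the post.

```python
from collections import deque

# The six-node example graph as an adjacency mapping (assumed undirected).
adj = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1, 3, 5},
    3: {0, 2, 4},
    4: {0, 3},
    5: {2},
}


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, node):
    """Reciprocal of the average shortest-path length to all other nodes."""
    dist = shortest_path_lengths(adj, node)
    others = [d for n, d in dist.items() if n != node]
    return len(others) / sum(others)


for n in adj:
    print(n, round(closeness(adj, n), 3))
```

Node 0 reaches four nodes in one hop and node 5 in two, so its average distance is 6/5 and its closeness is 5/6.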

Betweenness Centrality

Betweenness Centrality for a node measures how often the node lies on the shortest paths between pairs of other nodes:

$$C_B(v) = \sum_{x \neq v \neq y} \frac{\epsilon_{xy}(v)}{\epsilon_{xy}}$$

where $\epsilon_{xy}$ is the total number of shortest paths between nodes x and y, while $\epsilon_{xy}(v)$ is the number of those paths that pass through v.
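A brute-force sketch of this definition is shown below. It assumes an unweighted, undirected graph, counts each unordered pair once, and applies no normalisation. It is meant for small graphs only; larger graphs normally call for a faster method such as Brandes' algorithm.

```python
from collections import deque
from itertools import combinations

# The six-node example graph as an adjacency mapping (assumed undirected).
adj = {
    0: {1, 2, 3, 4},
    1: {0, 2},
    2: {0, 1, 3, 5},
    3: {0, 2, 4},
    4: {0, 3},
    5: {2},
}


def bfs_counts(adj, source):
    """Return (dist, count): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:          # first time w is reached
                dist[w] = dist[u] + 1
                count[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u is a predecessor of w on a shortest path
                count[w] += count[u]
    return dist, count


def betweenness(adj, v):
    """Sum over pairs (x, y) of the fraction of shortest x-y paths through v."""
    info = {s: bfs_counts(adj, s) for s in adj}
    dist_v, count_v = info[v]
    total = 0.0
    for x, y in combinations([n for n in adj if n != v], 2):
        dist_x, count_x = info[x]
        if y not in dist_x or v not in dist_x or y not in dist_v:
            continue  # x and y (or v) are not connected
        if dist_x[v] + dist_v[y] == dist_x[y]:
            # shortest paths through v = (paths x to v) * (paths v to y)
            total += count_x[v] * count_v[y] / count_x[y]
    return total


for n in adj:
    print(n, betweenness(adj, n))
```

In the example graph, node 2 scores highest because every shortest path to node 5 has to pass through it.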

Clustering Coefficient

The Clustering Coefficient is a measure of the degree to which nodes in a graph tend to cluster together.

                      


The local clustering coefficient of the blue node is the number of connections among its neighbours divided by the number of all possible connections among them.
In the figure, the blue node has three neighbours, which can have at most 3 connections among them. In the first part of the figure, all three possible connections are realised (thick black segments), giving a local clustering coefficient of 1. In the middle part, only one connection is realised (thick black line) and 2 connections are missing (dotted red lines), giving a local clustering coefficient of 1/3. In the last part, none of the possible connections among the neighbours of the blue node are realised, giving a local clustering coefficient of 0.
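The three cases from the figure can be reproduced with a short sketch. The node names (`blue`, `a`, `b`, `c`) and the helper functions are illustrative, and the graph is assumed to be undirected.

```python
from itertools import combinations


def make_graph(edges):
    """Build an undirected adjacency mapping from a list of edges."""
    adj = {}
    for u, w in edges:
        adj.setdefault(u, set()).add(w)
        adj.setdefault(w, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Realised links among the node's neighbours divided by possible links."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair to connect
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    possible = k * (k - 1) // 2
    return links / possible


spokes = [("blue", "a"), ("blue", "b"), ("blue", "c")]
all_linked = make_graph(spokes + [("a", "b"), ("b", "c"), ("a", "c")])
one_linked = make_graph(spokes + [("a", "b")])
none_linked = make_graph(spokes)

for g in (all_linked, one_linked, none_linked):
    print(local_clustering(g, "blue"))
```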


Graph Analytics with Python


Graph Analytics with Python Use Case: Facebook Data

Social media influencers are individuals who have a loyal following of users and achieve a high level of engagement on their content, such as images, blogs, posts and videos. These influencers are usually viewed as experts in their domains, have strong powers of persuasion, and can easily sway others.
Discovering influencers on social media is becoming increasingly important. It is useful for tasks like viral marketing, product promotion, behavior adoption and even analyzing epidemic spreading.


For a small brand, finding a social media influencer with thousands of loyal followers to promote their products is much more economical and fruitful than spending their advertising budget on billboards or TV ads.


Popular social media users are easy to find. However, we also can’t ignore the fact that there are many social media users with an audience of about 1,000 to 100,000 who have achieved recognition in their respective fields. Even though their individual followings are not big, collectively they can influence the behavior and decision making of a large number of people.


Graphs are also used in social networks like Facebook. For example, in Facebook, each person is represented by a vertex (or node). Each node is a structure that contains information such as person id, name, gender and locale.


In this use case, we will identify some influencers in a dataset downloaded from Stanford University.


This dataset consists of ‘circles’ (or ‘friends lists’) from Facebook. The data was collected from survey participants using a Facebook app. The dataset includes node features (profiles), circles, and ego networks.

Dataset statistics
Nodes: 4039
Edges: 88234
Nodes in largest WCC: 4039 (1.000)
Edges in largest WCC: 88234 (1.000)
Nodes in largest SCC: 4039 (1.000)
Edges in largest SCC: 88234 (1.000)
Average clustering coefficient: 0.6055
Number of triangles: 1612010
Fraction of closed triangles: 0.2647
Diameter (longest shortest path): 8
90-percentile effective diameter: 4.7


Displaying the Facebook Influencers as a Graph

Each user is represented by a dot, and the size of the dot reflects influencer status: the bigger the dot, the more strongly the user is classified as an influencer.
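How dot sizes might be derived is sketched below, under several assumptions: the edges arrive as text lines holding two whitespace-separated integer ids, influence is scored by normalised degree, and the size range is arbitrary. The function names and the in-memory `sample` are hypothetical, and the post's own plot may have used a different centrality measure.

```python
from collections import Counter


def parse_edge_list(lines):
    """Turn lines like '0 1' into (0, 1) pairs, skipping comments and bad lines."""
    edges = []
    for line in lines:
        parts = line.split()
        if line.startswith("#") or len(parts) != 2:
            continue
        edges.append((int(parts[0]), int(parts[1])))
    return edges


def degree_scores(edges):
    """Degree of each node divided by (number of nodes - 1)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scale = max(len(degree) - 1, 1)
    return {node: d / scale for node, d in degree.items()}


def dot_sizes(scores, min_size=20, max_size=400):
    """Map scores linearly so that the top-scoring node gets max_size."""
    top = max(scores.values())
    return {n: min_size + (max_size - min_size) * s / top for n, s in scores.items()}


# Tiny in-memory stand-in for an edge-list file (not real Facebook data).
sample = ["# toy data", "0 1", "0 2", "0 3", "1 2", "3 4"]
scores = degree_scores(parse_edge_list(sample))
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
sizes = dot_sizes(scores)
print(ranked[:3])
print(sizes)
```

The resulting `sizes` mapping could then be passed to whatever plotting tool draws the network, with one size per node.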
