Unsupervised Machine Learning: Social Network Analysis with Graph Analytics

"A picture is worth a thousand words" is one of the most commonly used phrases, but a graph can say even more. A visual representation of data in the form of a graph helps us gain actionable insights and make better data-driven decisions.
To understand what graphs are and why they are used, we first need a few concepts from graph theory.

A graph G = (V, E) consists of a finite set V of vertices (or nodes) and a set E of edges, each of which connects a pair of nodes.

In the graph above, the vertex set is V = {0, 1, 2, 3, 4, 5} and the edge set is E = {01, 02, 12, 23, 25, 34, 30, 40}, where a label such as 01 denotes the edge connecting node 0 and node 1.
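A common way to store such a graph in code is an adjacency list that maps each node to the set of its neighbours. The sketch below builds one for the example graph, reading each label such as 01 as the undirected pair (0, 1); the names `EDGES` and `build_adjacency` are our own.

```python
from collections import defaultdict

# Edge set of the example graph; "01" is written as the pair (0, 1).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 5), (3, 4), (3, 0), (4, 0)]

def build_adjacency(edges):
    """Return an undirected adjacency list {node: set of neighbours}."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)  # an undirected edge is recorded in both directions
        adj[v].add(u)
    return dict(adj)

graph = build_adjacency(EDGES)
print(sorted(graph))     # the vertex set V
print(sorted(graph[2]))  # neighbours of node 2
```

Each undirected edge appears twice in the adjacency list, once under each endpoint, so the neighbour-set sizes add up to 2|E|.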

Weighted Graph

A weighted graph is a graph whose edges carry numerical weights.

Each node or edge can hold different attributes. In a road network, for example, nodes can be assigned types such as crossings or dead-ends, and edges might carry a numerical attribute such as a speed limit. Numerical edge attributes are called weights.
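To make node attributes and edge weights concrete, here is a small sketch of a hypothetical road network: node types are stored in one dictionary, and edge weights (illustrative travel times in minutes) in another. It then computes weighted shortest distances with Dijkstra's algorithm. That algorithm is our choice of technique for showing what weights are used for, not something the text prescribes. It assumes non-negative weights.

```python
import heapq

# Hypothetical road network (illustrative values only).
node_type = {"A": "crossing", "B": "crossing", "C": "crossing", "D": "dead-end"}
weighted_edges = {("A", "B"): 4, ("A", "C"): 1, ("C", "B"): 2, ("B", "D"): 5}

def weighted_adjacency(edges):
    """Undirected adjacency list {node: [(neighbour, weight), ...]}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

def dijkstra(adj, source):
    """Smallest total edge weight from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter route to u was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

dist = dijkstra(weighted_adjacency(weighted_edges), "A")
dead_ends = [n for n, t in node_type.items() if t == "dead-end"]
print(dist, dead_ends)
```

The direct A-B road (weight 4) loses to the detour A-C-B (1 + 2 = 3), which is exactly the kind of result that ignoring weights would get wrong.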

Network Influencer

Network distance measures, such as shortest-path length, help us understand how information spreads through a network. In this section, we will learn how to find the most important nodes (individuals) in the network. The measures used for this are called centrality measures.

Degree Centrality

Degree centrality of a node is the number of edges connected to it, often normalized by n − 1, the largest degree possible in a graph with n nodes. In a directed graph, the in-degree of a node counts the edges coming into it and the out-degree counts the edges leaving it.
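A minimal sketch of both ideas on the example graph follows. It assumes every node appears in at least one edge. The in/out-degree part treats each pair (u, v) as an arrow from u to v purely for illustration. NetworkX's `degree_centrality` applies the same n − 1 normalization, if you prefer a library call.

```python
from collections import Counter

# Edges of the example graph, as (u, v) pairs.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 5), (3, 4), (3, 0), (4, 0)]

def degree_centrality(edges):
    """Degree of each node divided by n - 1, the largest possible degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)  # assumes no isolated nodes
    return {node: d / (n - 1) for node, d in degree.items()}

def in_out_degree(directed_edges):
    """Treat each (u, v) as an arc u -> v; count arrivals and departures."""
    in_deg, out_deg = Counter(), Counter()
    for u, v in directed_edges:
        out_deg[u] += 1  # the edge leaves u
        in_deg[v] += 1   # the edge enters v
    return in_deg, out_deg

centrality = degree_centrality(EDGES)
in_deg, out_deg = in_out_degree(EDGES)
print(centrality)
```

Nodes 0 and 2 each touch 4 of the 5 other nodes, so both score 4/5 = 0.8, while node 5 scores 1/5 = 0.2.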

Closeness Centrality

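Closeness centrality of a node is the reciprocal of the average shortest-path length from that node to all other nodes, so a node that can reach everyone in a few steps scores high:

$$C(u) = \frac{n - 1}{\sum_{v \neq u} d(u, v)}$$

where d(u, v) is the shortest-path distance between u and v and n is the number of nodes.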
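For an unweighted graph, the distances d(u, v) can be found with a breadth-first search (BFS) from each node. The sketch below implements the formula directly on the example graph. It assumes the graph is connected: NetworkX's `closeness_centrality` gives the same values on a connected graph but rescales them when some nodes are unreachable.

```python
from collections import deque

EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 5), (3, 4), (3, 0), (4, 0)]

def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness_centrality(adj):
    """(n - 1) / sum of distances; assumes a connected graph."""
    n = len(adj)
    scores = {}
    for u in adj:
        total = sum(bfs_distances(adj, u).values())
        scores[u] = (n - 1) / total if total > 0 else 0.0
    return scores

scores = closeness_centrality(adjacency(EDGES))
print(scores)
```

Node 0 reaches four nodes in one hop and node 5 in two, so its distances sum to 6 and its closeness is 5/6. The leaf node 5 has distances summing to 10 and scores 0.5.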

Betweenness Centrality

Betweenness centrality of a node v measures how often v lies on the shortest paths between two other nodes:

$$C_B(v) = \sum_{x \neq v \neq y} \frac{\sigma_{xy}(v)}{\sigma_{xy}}$$

where $\sigma_{xy}$ is the total number of shortest paths between nodes x and y, and $\sigma_{xy}(v)$ is the number of those paths that pass through v.
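The formula can be computed directly for a small graph. A BFS from each node counts the shortest paths to every other node. A shortest x–y path passes through v exactly when d(x, v) + d(v, y) = d(x, y), and in that case $\sigma_{xy}(v) = \sigma_{xv}\,\sigma_{vy}$. This sketch sums over unordered pairs of an undirected graph, and it is a straightforward O(n³) rendering of the formula. It is not Brandes' faster algorithm, which NetworkX's `betweenness_centrality` uses (pass `normalized=False` to compare raw sums).

```python
from collections import deque
from itertools import combinations

EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 5), (3, 4), (3, 0), (4, 0)]

def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u is a predecessor of v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness_centrality(adj):
    info = {s: bfs_counts(adj, s) for s in adj}
    scores = {v: 0.0 for v in adj}
    for x, y in combinations(adj, 2):  # unordered pairs {x, y}
        dist_x, sigma_x = info[x]
        if y not in dist_x:
            continue  # x and y are disconnected
        for v in adj:
            if v in (x, y):
                continue
            dist_v, sigma_v = info[v]
            if v in dist_x and dist_x[v] + dist_v[y] == dist_x[y]:
                scores[v] += sigma_x[v] * sigma_v[y] / sigma_x[y]
    return scores

scores = betweenness_centrality(adjacency(EDGES))
print(scores)
```

Node 2 is the only gateway to node 5, and it also carries one of the two shortest 1–3 paths, so it scores 4.5, the highest in the example graph.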

Clustering Coefficient

The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together.

The local clustering coefficient of the blue node is the proportion of connections among its neighbours compared with the number of all possible connections. For a node v with $k_v$ neighbours and $T(v)$ links among them, this is $C(v) = \frac{2\,T(v)}{k_v (k_v - 1)}$.
In the figure, the blue node has three neighbours, which can have at most 3 connections among them. In the first panel, all three possible connections are realised (thick black segments), giving a local clustering coefficient of 1. In the middle panel, only one connection is realised (thick black line) and 2 connections are missing (dotted red lines), giving a local clustering coefficient of 1/3. In the last panel, none of the possible connections among the blue node's neighbours are realised, giving a local clustering coefficient of 0.
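The three panels can be reproduced with a short sketch that counts linked neighbour pairs. The node names ("blue", "a", "b", "c") are hypothetical labels for the figure's nodes. Following NetworkX's `clustering`, a node with fewer than two neighbours is assigned 0.

```python
from itertools import combinations

def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are directly connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # no neighbour pairs exist; reported as 0
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

spokes = [("blue", "a"), ("blue", "b"), ("blue", "c")]
full = adjacency(spokes + [("a", "b"), ("b", "c"), ("a", "c")])  # first panel
one = adjacency(spokes + [("a", "b")])                           # middle panel
none = adjacency(spokes)                                         # last panel
print([local_clustering(g, "blue") for g in (full, one, none)])
```

Averaging this value over all nodes gives the average clustering coefficient reported for the dataset below.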

Graph Analytics with Python

Graph Analytics with Python Use Case: Facebook Data

Social media influencers are individuals who have a loyal following and achieve a high level of engagement on their content, such as images, blogs, posts, and videos. They are usually viewed as experts in their domains and have considerable power to persuade others.
Discovering influencers on social media is becoming increasingly important. It is useful for tasks such as viral marketing, product promotion, studying behavior adoption, and even analyzing how epidemics spread.

For a small brand, having a social media influencer with thousands of loyal followers promote its products can be more economical and effective than spending its advertising budget on billboards or TV ads.

It is easy to find a popular non-celebrity social media user. However, we cannot ignore the many social media users with audiences of about 1,000 to 100,000 who have earned recognition in their respective fields. Although each of them has a modest following, together they can influence the behavior and decisions of a large number of people.

Graphs are also used to model social networks such as Facebook, where each person is represented by a vertex (or node). Each node is a record holding information such as the person's ID, name, gender, and locale.

We will identify some influencers in the dataset used for this use case, which was downloaded from Stanford University's SNAP (Stanford Network Analysis Project) dataset collection.

This dataset consists of 'circles' (or 'friends lists') from Facebook. The data was collected from survey participants using a Facebook app. It includes node features (profiles), circles, and ego networks.
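The network can be read into the same adjacency-list form used above. The sketch assumes the edges come as a whitespace-separated "u v" list with one edge per line, which is how SNAP distributes its combined Facebook graph. The file name `facebook_combined.txt.gz` and the helper `load_facebook` are assumptions, so adjust them to match your download. NetworkX's `read_edgelist` is a library alternative.

```python
import gzip
from collections import defaultdict

def load_edge_list(lines):
    """Undirected adjacency list from 'u v' lines; skips blanks and '#' comments."""
    adj = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        u, v = map(int, line.split()[:2])
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def load_facebook(path="facebook_combined.txt.gz"):
    # Hypothetical local path to the downloaded, gzip-compressed edge list.
    with gzip.open(path, "rt") as fh:
        return load_edge_list(fh)

sample = ["0 1", "0 2", "# a comment line", "1 2", ""]
g = load_edge_list(sample)
print(len(g), sum(len(n) for n in g.values()) // 2)  # nodes, edges
```

After loading the real file, the node and edge counts can be checked against the dataset statistics.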

Dataset statistics
Statistic	Value
Nodes	4039
Edges	88234
Nodes in largest weakly connected component (WCC)	4039 (1.000)
Edges in largest WCC	88234 (1.000)
Nodes in largest strongly connected component (SCC)	4039 (1.000)
Edges in largest SCC	88234 (1.000)
Average clustering coefficient	0.6055
Number of triangles	1612010
Fraction of closed triangles	0.2647
Diameter (longest shortest path)	8
90th-percentile effective diameter	4.7
Values in parentheses give the fraction of all nodes or edges contained in the component.
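Several of these statistics can be computed from an adjacency list with a few lines of code. The sketch below covers node and edge counts, triangles, average clustering, and diameter. It leaves out the effective diameter and the fraction of closed triangles because their exact definitions are not given here. It is demonstrated on the small example graph. Pure Python is slow on the full network, especially the all-pairs BFS for the diameter, so NetworkX's `triangles`, `average_clustering`, and `diameter` may be more practical there.

```python
from collections import deque
from itertools import combinations

def eccentricity(adj, source):
    """Largest hop distance from source to any reachable node (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def graph_stats(adj):
    """Basic statistics for an undirected adjacency list {node: set of neighbours}."""
    n = len(adj)
    # Linked neighbour pairs of v = number of triangles through v.
    tri = {v: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
           for v, nbrs in adj.items()}
    local_cc = [tri[v] / (len(adj[v]) * (len(adj[v]) - 1) / 2) if len(adj[v]) > 1 else 0.0
                for v in adj]
    return {
        "nodes": n,
        "edges": sum(len(nbrs) for nbrs in adj.values()) // 2,
        "triangles": sum(tri.values()) // 3,  # each triangle is seen at its 3 corners
        "average_clustering": sum(local_cc) / n,
        "diameter": max(eccentricity(adj, s) for s in adj),  # assumes a connected graph
    }

example = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1, 3, 5}, 3: {0, 2, 4}, 4: {0, 3}, 5: {2}}
stats = graph_stats(example)
print(stats)
```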

Displaying Facebook Influencers as a Graph

Each user is represented by a dot, and the size of the dot reflects the user's influencer status: the bigger the dot, the more influential the user.
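To size the dots, each node's centrality score has to be mapped to a marker size. The text does not say which centrality measure or size range the plot uses, so this sketch makes its own choices. It uses a linear mapping onto a hypothetical 20–400 range and takes the betweenness scores of the small example graph as input. The resulting list could be passed to a drawing routine, for instance through the `node_size` argument of NetworkX's drawing functions.

```python
def dot_sizes(scores, min_size=20.0, max_size=400.0):
    """Linearly map centrality scores onto a (hypothetical) marker-size range."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {node: min_size + (s - lo) / span * (max_size - min_size)
            for node, s in scores.items()}

def top_influencers(scores, k=3):
    """The k nodes with the highest scores, highest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Betweenness scores of the small example graph (node -> score).
scores = {0: 2.5, 1: 0.0, 2: 4.5, 3: 1.0, 4: 0.0, 5: 0.0}
sizes = dot_sizes(scores)
print(top_influencers(scores), sizes)
```

A linear scale keeps the ordering easy to read. If a few hubs dominate, as is typical of social networks, a logarithmic or square-root scale may show the smaller influencers more clearly.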