Tuesday 9 February 2016

DIVIDING NETWORKS INTO CLUSTERS


  1.  TWO ANGLES: GRAPH PARTITIONING AND COMMUNITY DETECTION.


Graph Partitioning is the problem of dividing the vertices of a network into a given number of non-overlapping groups of given sizes such that the number of edges between groups is minimized. The important point here is that the number and sizes of the groups are fixed. Sometimes the sizes are only fixed roughly—within a certain range, for instance—but they are fixed nonetheless.

Community detection problems differ from graph partitioning in that the number and
size of the groups into which the network is divided are not specified by the experimenter. Instead they are determined by the network itself: the goal of community detection is to find the natural fault lines along which a network separates. The sizes of the groups are not merely unspecified but might in principle vary widely from one group to another. A given network might divide into a few large groups, many small ones, or a mixture of all different sizes.


We will not delve into the Graph Partitioning Algorithms, however the references[1] are attached at the end for readers' benefit.


   2. COMMUNITY DETECTION TECHNIQUES

In this blog, I will be discussing about Hierarchical Clustering Algorithms, that is 
             - Agglomerative Approach (Bottom-Up Approach)
             - Divisive Approach (Top-Down Approach)

Following Picture perfectly differentiates between the two methods.



  • Agglomerative Method

It starts with each node/point being a single cluster and eventually all nodes/points belong to the same cluster.


Similarity Measures w[i][j] for Nodes: Algorithm Ingredients

Let A be the adjacency matrix of the network, i.e. 
                                       
                                           A[i][j] = { 1 if (i, j) ∈ E
                                                          { 0, otherwise.

Different Similarity Measures are as follows:







Similarity Measures for the set of Nodes are described below:

  • Basic Algorithm

                       1. Assign each node to its own cluster 
                       2. Find the cluster pair with highest similarity and join them together into a cluster 
                       3. Compute new similarities between new joined cluster and others 
                       4. Go to step 2 until all nodes form a single cluster








  • Divisive Method

It start with all nodes/points belong to the same cluster and eventually each node forms a cluster on its own.

Edge between-ness:  The between-ness of an edge is the number of shortest-paths in the network that pass through that edge It uses the idea that “bridges” between communities must have high edge between-ness


The Girvan-Newman algorithm (Pseudo-code) 

                      1. Compute betweenness for all edges in the network.
                      2. Remove the edge with highest betweenness.
                      3. Go to step 1 until no edges left.

Now, the quality of the results of these algorithms can be calculated using the measure called 
Modularity.
And we can work upon that feature to improve/modify our previous algorithms. We will discuss those in the follow-up post of this particular subtopic.


References:

1. Networks: An Introduction - Mark Newman.
2. Community detection in graphs Santo Fortunato∗ Complex Networks and Systems Lagrange          Laboratory, ISI Foundation, Viale S. Severo 65, 10133, Torino, I-ITALY


 

1 comment:

  1. That was a good one! A good follow up on the class lecture as well

    ReplyDelete