- TWO ANGLES: GRAPH PARTITIONING AND COMMUNITY DETECTION.
Graph Partitioning is the problem of dividing the vertices of a network into a given number of non-overlapping groups of given sizes such that the number of edges between groups is minimized. The important point here is that the number and sizes of the groups are fixed. Sometimes the sizes are only fixed roughly—within a certain range, for instance—but they are fixed nonetheless.
Community detection problems differ from graph partitioning in that the number and
size of the groups into which the network is divided are not specified by the experimenter. Instead they are determined by the network itself: the goal of community detection is to find the natural fault lines along which a network separates. The sizes of the groups are not merely unspecified but might in principle vary widely from one group to another. A given network might divide into a few large groups, many small ones, or a mixture of all different sizes.
We will not delve into the Graph Partitioning Algorithms, however the references[1] are attached at the end for readers' benefit.
2. COMMUNITY DETECTION TECHNIQUES
In this blog, I will be discussing about Hierarchical Clustering Algorithms, that is
- Agglomerative Approach (Bottom-Up Approach)
- Divisive Approach (Top-Down Approach)
Following Picture perfectly differentiates between the two methods.
- Agglomerative Method
It starts with each node/point being a single cluster
and eventually all nodes/points belong to the same cluster.
Similarity Measures w[i][j] for Nodes: Algorithm Ingredients
Let A be the adjacency matrix of the network, i.e.
A[i][j] = { 1 if
(i, j) ∈ E
{ 0, otherwise.
Different Similarity Measures are as follows:
Similarity Measures for the set of Nodes are described below:
- Basic Algorithm
1. Assign each node to its own cluster
2. Find the cluster pair with highest similarity and join them
together into a cluster
3. Compute new similarities between new joined cluster and
others
4. Go to step 2 until all nodes form a single cluster
- Divisive Method
It start with all nodes/points belong to the same cluster and
eventually each node forms a cluster on its own.
Edge between-ness: The between-ness of an edge is the number of shortest-paths in the
network that pass through that edge
It uses the idea that “bridges” between communities must have
high edge between-ness
The Girvan-Newman algorithm (Pseudo-code)
1. Compute betweenness for all edges in the network.
2. Remove the edge with highest betweenness.
3. Go to step 1 until no edges left.
Now, the quality of the results of these algorithms can be calculated using the measure called
Modularity.
And we can work upon that feature to improve/modify our previous algorithms. We will discuss those in the follow-up post of this particular subtopic.
And we can work upon that feature to improve/modify our previous algorithms. We will discuss those in the follow-up post of this particular subtopic.
References:
1. Networks: An Introduction - Mark Newman.
2. Community detection in graphs
Santo Fortunato∗
Complex Networks and Systems Lagrange Laboratory, ISI Foundation, Viale S. Severo 65, 10133, Torino,
I-ITALY
That was a good one! A good follow up on the class lecture as well
ReplyDelete