Monday 7 March 2016

Multi Ego-Centered Community


Community can be defined as a set of nodes which are more related to each other than the other nodes in the network. A group of users, in a social network, belong to one community if they share some common interests or have similar behavior or characteristics. But, it is possible that a user may belong to more than one communities. This leads to overlapping communities. Computing these overlapping communities becomes difficult.

A method to identify communities is to take into account the local community of a node. This is known as an ego-centered community. In this, one focuses on finding out the groups related to a particular node. Here, again a problem arises when a node belongs to multiple communities. For such instances, we consider a community as just a set of nodes which are highly connected to each other and do not take into account the outgoing edges for a node. The optimization of quality function for such ego-centered networks is possible only for small communities and becomes difficult for larger ones.

Another approach is the use of multi ego-centered community, which identifies the communities centered at a particular set of nodes. Given a set of nodes, it tries to identify the other nodes of that community. This kind of technique is very helpful for identifying nodes related to a specific topic in large scale online networks, such as Facebook, Twitter, Wikipedia, etc.

Measure of similarity for nodes:
We have come across a number of similarity measures till date, but they generally have one of the following problems:
i) too restrictive
ii) parametric
iii) slow for large graphs

The node similarity measure used in ego-centered community detection is known as carryover opinion metric. It is a metric which takes the entire depth of the graph into consideration and is non-parametric.

Algorithm for calculating the node similarity through carryover opinion is as follows:
i) Start with a source (the node of interest)
ii) The opinion of source node is initialized as 1 and 0 for all the other nodes
iii) Select one of the neighbors of source node and calculate the average of their opinions
iv) Reset the opinion value of source node to 1
v)  Repeat the steps ii) to v) till all the nodes are exhausted

This process ensures that the opinion value of the source node remains 1 throughout and the nodes closer to the source converge faster. The more the speed, the more is the similarity of the node in consideration with the source node. The following conclusions can be drawn from the above process:
     a) After sufficiently large number of iterations, the rank of the nodes according to their opinion does not change.
     b) Also, the difference between the opinions of two nodes decreases proportionally with the difference between any other two nodes

These conclusions can be mathematically expressed as follows:
Consider any four nodes a, b, c, d. The opinion of these nodes at any iteration t be Ota ,  Otb,  Otc,  Otd respectively. So,
                                


Where, Ca,b,c,d is a constant depending on a, b, c, d.

So, the opinion is re-scaled every time (at each iteration) , the lowest value being 0 and the range being 0 to 1. Since, the opinion value converges to a fixed point after a number of iterations, the final value obtained after convergence is known as the carryover opinion. The proximity of nodes with the source node is captured with the help of the re-scaling step.

The following figure is an example of the similarity or proximity calculation, which shows the carryover pinion score on two overlapping graphs with 110 nodes.

                       Fig. Result of carryover opinion (a proximity score) on two overlapping graphs


Multi ego-centered community detection:

The idea behind multi ego-centered community detection is that if there are two nodes n1 and n2, such that n1 belongs to one community and n2 belongs to another community and there exists one other node n3 which belongs to both of these communities, then n3 should be similar to both n1 and n2. The similarity is calculated through the measure described above, the carryover opinion metric.

The procedure is as follows:
i) Calculate the similarity of all nodes with node n1 as well as n2.
ii) For set {n1, n2}, the similarity of all nodes is the geometric mean of the similarities of all nodes with n1 and with n2.

The procedure can be extended for more number of nodes being part of the initial set.

Thus, the metric carryover opinion is an efficient and parameter-free measure used for calculating similarities between two nodes and is also helpful to detect a unique community by just having a small set of nodes. The complexity for this metric is O(te), where e is the number of edges and t is relatively small.

Reference:
[1] Danisch, Maximilien, Jean-Loup Guillaume, and Bénédicte Le Grand. "Unfolding ego-centered community structures with “a similarity approach”." Complex Networks IV. Springer Berlin Heidelberg, 2013. 145-153.

  

No comments:

Post a Comment