Community can be defined as a set of nodes which are more related to each other than the other nodes in the network. A group of users, in a social network, belong to one community if they share some common interests or have similar behavior or characteristics. But, it is possible that a user may belong to more than one communities. This leads to overlapping communities. Computing these overlapping communities becomes difficult.
A method to identify communities is to take into account the
local community of a node. This is known as an ego-centered community. In this, one focuses on finding out the
groups related to a particular node. Here, again a problem arises when a node
belongs to multiple communities. For such instances, we consider a community as
just a set of nodes which are highly connected to each other and do not take
into account the outgoing edges for a node. The optimization of quality
function for such ego-centered networks is possible only for small communities
and becomes difficult for larger ones.
Another approach is the use of multi ego-centered community, which identifies the communities
centered at a particular set of nodes. Given a set of nodes, it tries to
identify the other nodes of that community. This kind of technique is very
helpful for identifying nodes related to a specific topic in large scale online
networks, such as Facebook, Twitter, Wikipedia, etc.
Measure of similarity
for nodes:
We have come across a number of similarity measures till
date, but they generally have one of the following problems:
i) too
restrictive
ii) parametric
iii) slow
for large graphs
The node similarity measure used in ego-centered community
detection is known as carryover opinion
metric. It is a metric which takes the entire depth of the graph into
consideration and is non-parametric.
Algorithm for calculating the node similarity through
carryover opinion is as follows:
i) Start
with a source (the node of interest)
ii)
The
opinion of source node is initialized as 1 and 0 for all the other nodes
iii) Select
one of the neighbors of source node and calculate the average of their opinions
iv) Reset
the opinion value of source node to 1
v) Repeat
the steps ii) to v) till all the nodes are exhausted
This process ensures that the opinion value of the source
node remains 1 throughout and the nodes closer to the source converge faster.
The more the speed, the more is the similarity of the node in consideration
with the source node. The following conclusions can be drawn from the above
process:
a) After
sufficiently large number of iterations, the rank of the nodes according to
their opinion does not change.
b) Also,
the difference between the opinions of two nodes decreases proportionally with
the difference between any other two nodes
These conclusions can be mathematically expressed as follows:
Consider any four nodes a, b, c, d. The opinion of these
nodes at any iteration t be Ota , Otb, Otc, Otd respectively. So,
Where, Ca,b,c,d is a constant depending on a, b, c, d.
So, the opinion is re-scaled every time (at each iteration) , the
lowest value being 0 and the range being 0 to 1. Since, the opinion value
converges to a fixed point after a number of iterations, the final value
obtained after convergence is known as the carryover opinion. The proximity of
nodes with the source node is captured with the help of the re-scaling step.
The following figure is an example of the similarity or proximity calculation, which shows the carryover pinion score on two overlapping graphs with 110 nodes.
Fig. Result of carryover opinion (a proximity score) on two overlapping graphs
Multi
ego-centered community detection:
The idea behind multi ego-centered community detection is that
if there are two nodes n1 and n2, such that n1
belongs to one community and n2 belongs to another community and
there exists one other node n3 which belongs to both of these
communities, then n3 should be similar to both n1 and n2.
The similarity is calculated through the measure described above, the carryover
opinion metric.
The procedure is as follows:
i) Calculate
the similarity of all nodes with node n1 as well as n2.
ii) For
set {n1, n2}, the similarity of all nodes is the
geometric mean of the similarities of all nodes with n1 and with n2.
The procedure can be extended for more number of nodes being
part of the initial set.
Thus, the metric carryover opinion is an efficient and parameter-free
measure used for calculating similarities between two nodes and is also helpful
to detect a unique community by just having a small set of nodes. The
complexity for this metric is O(te), where e is the number of edges and t is relatively
small.
Reference:
[1] Danisch, Maximilien,
Jean-Loup Guillaume, and Bénédicte Le Grand. "Unfolding ego-centered
community structures with “a similarity approach”." Complex Networks IV. Springer Berlin Heidelberg, 2013. 145-153.
No comments:
Post a Comment