Resolution Limit in Community Detection
Modularity optimization seems to be a very effective method to detect communities, both in real and in artificially
generated networks. However, modularity itself has not yet been thoroughly investigated, and only a few general properties
are known.It is thus a priori impossible to tell whether a module (large
or small), detected through modularity optimization, is indeed a single
module
or a cluster of smaller modules. This raises
doubts about the effectiveness of modularity optimization in community
detection,
and more generally about the applicability of
quality functions.
The modularity of a partition of a network can be written as
where the sum is over the m modules of the partition, ls
is the number of links inside module s, L is the total number of links in the network, and ds
is the total degree of the nodes in module s. The first term of the summand in equation is the fraction of links inside module s;
the second term, in contrast, represents the expected fraction of links
in that module, if links were located at random
in the network (under the only constraint
that the degree sequence coincides with the one of the original graph).
If, for a subgraph S of a network, the first term is much larger than the second, it means that there are many more links inside S than one would expect by random chance. This means that S is, indeed, a module. The comparison with the null model (represented by the randomized network) leads to the quantitative
definition of community embedded . We conclude that, in a modularity-based framework, a subgraph with ls
internal links and total degree ds
is a module if
We can express the number of links ls
out joining nodes of the module s to the rest of the network in terms of ls
, i.e. ls
out = als
with a ≥ 0. Therefore, ds
= 2ls
+ ls
out = (a + 2)ls
and the condition become
from which, rearranging terms, one obtains
If a = 0, the subgraph S is a disconnected part of the network and is a module if ls
< L, which is always true. If a is strictly positive, sets an upper limit to the number of internal links that S must have in order to be a module. This is counterintuitive, because it
means that the definition of community implied by
modularity depends on the size of the
whole network, instead of involving a “local” comparison between the
number of internal
and external links of the module. For a < 2 one has 2ls
> ls
out, which means that the total internal degree of the subgraph is larger than its external degree: ds
in > ds
out. The attributes “internal” and “external” mean that the degree is calculated considering only internal or external links,
respectively. In this case, the subgraph S would be a community.
For a < 2, the right-hand-side of inequality is in the interval [L/4, L]. A subgraph of size ls
such that a < 2 and ls
is less than a quantity in the interval [L/4, L] would then be a community both within the modularity framework . Sufficient conditions for which these constraints are always met are then.
According to this equation, a partition of a network into actual modules (i.e. subgraphs satisfying the condition ) would have a positive modularity, as all summands are positive. On the other hand, it is possible to partition a network such that Q is negative. The network itself, considered as a partition with a single module, has modularity zero: in this case, in fact,
l
1 = L, d
1 = 2L, and the only two terms of the unique module in Q cancel each other. Usually, a value of Q
larger than 0.3–0.4 is a clear indication that the subgraphs of the
corresponding partition are modules. However, the maximal
modularity differs from one network to
another and depends on the number of links of the network.
The fact that quality functions such as modularity can have an intrinsic
resolution limit calls for a new theoretical framework
that focuses on a local definition of
community, rather than on definitions relying on a global null model.
Quality functions
are still helpful, but their role should
probably be limited to the comparison of partitions with the same number
of modules.
FURTHER READING :
http://www.pnas.org/content/104/1/36.full
No comments:
Post a Comment