Monday 29 February 2016

Resolution Limit in Community Detection

Modularity optimization seems to be a very effective method to detect communities, both in real and in artificially generated networks. However, modularity itself has not yet been thoroughly investigated, and only a few general properties are known.It is thus a priori impossible to tell whether a module (large or small), detected through modularity optimization, is indeed a single module or a cluster of smaller modules. This raises doubts about the effectiveness of modularity optimization in community detection, and more generally about the applicability of quality functions. 

The modularity of a partition of a network can be written as  
where the sum is over the m modules of the partition, ls is the number of links inside module s, L is the total number of links in the network, and ds is the total degree of the nodes in module s. The first term of the summand in equation is the fraction of links inside module s; the second term, in contrast, represents the expected fraction of links in that module, if links were located at random in the network (under the only constraint that the degree sequence coincides with the one of the original graph). If, for a subgraph S of a network, the first term is much larger than the second, it means that there are many more links inside S than one would expect by random chance. This means that S is, indeed, a module. The comparison with the null model (represented by the randomized network) leads to the quantitative definition of community embedded . We conclude that, in a modularity-based framework, a subgraph Graphic with ls internal links and total degree ds is a module if 
 
We can express the number of links ls out joining nodes of the module s to the rest of the network in terms of ls , i.e. ls out = als with a ≥ 0. Therefore, ds = 2ls + ls out = (a + 2)ls and the condition become
from which, rearranging terms, one obtains 
If a = 0, the subgraph S is a disconnected part of the network and is a module if ls < L, which is always true. If a is strictly positive, sets an upper limit to the number of internal links that S must have in order to be a module. This is counterintuitive, because it means that the definition of community implied by modularity depends on the size of the whole network, instead of involving a “local” comparison between the number of internal and external links of the module. For a < 2 one has 2ls > ls out, which means that the total internal degree of the subgraph is larger than its external degree: ds in > ds out. The attributes “internal” and “external” mean that the degree is calculated considering only internal or external links, respectively. In this case, the subgraph S would be a community. For a < 2, the right-hand-side of inequality is in the interval [L/4, L]. A subgraph of size ls such that a < 2 and ls is less than a quantity in the interval [L/4, L] would then be a community both within the modularity framework . Sufficient conditions for which these constraints are always met are then.

 According to this equation, a partition of a network into actual modules (i.e. subgraphs satisfying the condition  ) would have a positive modularity, as all summands are positive. On the other hand, it is possible to partition a network such that Q is negative. The network itself, considered as a partition with a single module, has modularity zero: in this case, in fact, l 1 = L, d 1 = 2L, and the only two terms of the unique module in Q cancel each other. Usually, a value of Q larger than 0.3–0.4 is a clear indication that the subgraphs of the corresponding partition are modules. However, the maximal modularity differs from one network to another and depends on the number of links of the network. 

The fact that quality functions such as modularity can have an intrinsic resolution limit calls for a new theoretical framework that focuses on a local definition of community, rather than on definitions relying on a global null model. Quality functions are still helpful, but their role should probably be limited to the comparison of partitions with the same number of modules.

FURTHER READING
 http://www.pnas.org/content/104/1/36.full

No comments:

Post a Comment