Saturday 27 February 2016

Closeness centrality in networks with disconnected components


Closeness centrality in networks with disconnected components

A key node centrality measure in networks is closeness centrality (Freeman, 1978; Opsahl et al., 2010; Wasserman and Faust, 1994). It is defined as the inverse of farness, which in turn, is the sum of distances to all other nodes. As the distance between nodes in disconnected components of a network is infinite, this measure cannot be applied to networks with disconnected components (Opsahl et al., 2010; Wasserman and Faust, 1994). This post highlights a possible work-around, which allows the measure to be applied to these networks and at the same time maintain the original idea behind the measure.
Disconnected componentsThis network gives a concrete example of the closeness measure. The distance between node G and node H is infinite as a direct or indirect path does not exist between them (i.e., they belong to separate components). As long as at least one node is unreachable by the others, the sum of distances to all other nodes is infinite. As a consequence, researchers have limited the closeness measure to the largest component of nodes (i.e., measured intra-component). The distance matrix for the nodes in the sample network is:
NodesAll inclusiveIntra-component
ABCDEFGHIJKFarnessClosenessFarnessCloseness
A112233InfInfInfInfInf0120.08
B112123InfInfInfInfInf0100.10
C111222InfInfInfInfInf090.11
D221211InfInfInfInfInf090.11
E212213InfInfInfInfInf0110.09
F322112InfInfInfInfInf0110.09
G332132InfInfInfInfInf0140.07
HInfInfInfInfInfInfInf12InfInf030.33
IInfInfInfInfInfInfInf11InfInf020.50
JInfInfInfInfInfInfInf21InfInf030.33
KInfInfInfInfInfInfInfInfInfInfInf00Inf
Although the intra-component closeness scores are not infinite for all the nodes in the network, it would be inaccurate to use them as a closeness measure. This is due to the fact that the sum of distances would contain different number of paths (e.g., there are two distance from node H to other nodes in its component, while there are six distances from node G to other nodes in its component). In fact, nodes in smaller components would generally be seen as being closer to others than nodes in larger components. Thus, researchers has focused solely on the largest component. However, this leads to a number of methodological issues, including sample selection.
To develop this measure, I went back to the original equation:
\mbox{closeness}(i) = \sum_j \left[ d_{ij} \right]^{-1} = \frac{1}{\sum_j d_{ij}}
where i is the focal node, j is another node in the network, and d_{ij} is the shortest distance between these two nodes. In this equation, the distances are inversed after they have been summed, and when summing an infinite number, the outcome is infinite. To overcome this issue while staying consistent with the existing measure of closeness, I took advantage of the fact that the limit of a number divided by infinity is zero. Although infinity is not an exact number, the inverse of a very high number is very close to 0. In fact, 0 is returned if you enter 1/Inf in the statistical programme R. By taking advantage of this feature, it is possible to rewrite the closeness equation as the sum of inversed distances to all other nodes instead of the inversed of the sum of distances to all other nodes. The equation would then be:
\mbox{closeness}(i) = \sum_j \frac{1}{d_{ij}}
To exemplify this change, for the example network above, the inversed distances and closeness scores are:
NodesCloseness
ABCDEFGHIJKSumNormalized
A1.001.000.500.500.330.3300003.670.37
B1.001.000.501.000.500.3300004.330.43
C1.001.001.000.500.500.5000004.500.45
D0.500.501.000.501.001.0000004.500.45
E0.501.000.500.501.000.3300003.830.38
F0.330.500.501.001.000.5000003.830.38
G0.330.330.501.000.330.5000003.000.30
H00000001.000.5001.500.15
I00000001.001.00020.20
J00000000.501.0001.500.15
K000000000000

As can be seen from this table, a closeness score is attained for all nodes taking into consideration an equal number of distances for each node irrespective of the size of the nodes’ component. Moreover, nodes belonging to a larger component generally attains a higher score. This is deliberate as these nodes can reach a greater number of others than nodes in smaller components. The normalized scores are bound between 0 and 1. It is 0 if a node is an isolate, and 1 if a node is directly connected all others.
This measure can easily be extended to weighted networks by introducing Dijkstra’s (1959) algorithm.
What to try it with your data?
Below is the code to calculate the closeness measure on the sample network above.
1
2
3
4
5
6
7
8
9
10
11
12
# Load tnet
library(tnet)
# Load network
# Node K is assigned node id 8 instead of 10 as isolates at the end of id sequences are not recorded in edgelists
net <- cbind(
  i=c(1,1,2,2,2,3,3,3,4,4,4,5,5,6,6,7,9,10,10,11),
  j=c(2,3,1,3,5,1,2,4,3,6,7,2,6,4,5,4,10,9,11,10),
  w=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))

# Calculate measures
closeness_w(net, gconly=FALSE)



1 comment: