Wednesday, 2 March 2016

Centrality measure to find the Importance of Nodes in dynamic Networks

There has been a lot of work done in defining different metrics able to characterize the importance of nodes in networks. Among them, centrality measures have proved to be pertinent as they relate the position of a node in the structure to its ability to diffuse an information efficiently.There is no investigation regarding the fact that the network structure evolves and that node importance may evolve accordingly.
Most centrality measures are based on the study of paths in the network: a node will be important for instance if the paths from it to other nodes are short, or if it lies on shortest paths between many pairs of nodes. One motivation for this is that links can act as a dissemination medium for some phenomena occurring on the network. For instance individuals can exchange information when they communicate, or a message can be forwarded from computer to computer until it reaches its destination.It has been acknowledged for some time that networks are dynamic in nature: nodes and links come and go with time.In the case of centrality, some works have been concerned with efficiently updating the centrality values of the nodes when a change occurs in the network. In many cases however, the time scale at which the network evolves is the same as the one at which a dissemination phenomena may occur on the network. This is the case for instance when a disease propagates among individuals when they are in contact, or when an information is disseminated by email messages.

A small example of a dynamic network
Here the paths change during the network time span, and thus the importance of nodes varies. Consider the toy example of Figure. This small network composed of five nodes evolves during four distinct time steps. One can see that, intuitively, the importance of node b is much stronger at time t = 1 than at time t = 3. Indeed, at time t = 1 it forms a bridge between node a and nodes c, d and e, thanks to the links that exist at time t = 2.
Here we explore a natural extension of the closeness centrality to the case where paths may start at any time during the network’s time span. This temporal closeness characterizes the importance of any node at any given time. We study two datasets and observe that:
  • node importance does vary with time, and therefore capturing the global importance of a node with an aggregate value might be misleading
  • in some cases the dynamic of the network is such that it is meaningless to identify nodes more important than others; in other cases, temporal closeness may help identifying a node consistently important for the whole network time span
A dynamic network G = (V,E) consists of a set V of nodes 1 and a set E of timed links of the form (u, v, t) where u, v and t is a timestamp. Throughout the paper we consider networks as undirected, i.e. a link (u, v, t) is equivalent to a link (v, u, t).A temporal path in a dynamic network consists of:

  • for all i, i = 0..k − 1, ui+1 vi
  • for all i, i = 0..k 1ti ti+1.
  • t0 >= ts
We say that such a path is a path from  u0 to vk starting at time ts. Its duration is equal to tk ts. We will say that a path from u to v starting at time ts is a shortest path if it has the least duration among all paths from u to v starting at time ts. We define the distance from u to v at time t to be the duration of a shortest path from u to v starting at time t, and we denote it by dt(u, v). If there is no path from u to v starting at time t, we consider that dt(u, v) = infinity.For example, in the dynamic network of Figure, there are two temporal paths from e to b starting at time t = 2. The first one consists of the single link (e, b, 2), and its duration is 0; the second one consists of the links (e, d, 3) and (d, b, 4) and its duration is 2. The temporal distance from e to b at time 2 is therefore d2(e, b) = 0. Note that the temporal distance from e to b at time 1 is d1(e, b) = 1: the temporal shortest path has the same link sequence than the one starting at t = 2 (it is the single link (e, b, 2)), but the starting time is different. Since there is no temporal path from c to e starting at time t = 3, d3(c, e) = infinity.


Several variants have been introduced in the literature, most notably concerning the constraint ti < ti+1.Some variants weaken it to ti <= ti+1, while others strengthen it to ti <=ti+1 +(delta), where _ is a parameter representing the time needed to send a message along a link. The relevance of these variants depends on the context.Preliminary work shows that this has little influence on the results.
We recall that the closeness of a node u in a non-evolving network is defined as:
The average of the closeness of all nodes has been defined as network efficiency.
Though some extensions of the closeness have been defined for the case of dynamic networks, their goal is not to take fully into account the fact that temporal distances vary according to the paths’ starting time. We therefore here define the temporal closeness of a node u at time t as:
and we define the temporal efficiency Et(G) of network G as the average over all nodes of the temporal closeness at time t.
Building upon this notion of efficiency, we can study another metric that quantifies the impact a node has on a network. The notion of delta-centrality characterizes how much a given node (or group of nodes) impacts the efficiency. It is defined as the relative change of the network efficiency when the considered node (or group of node) is removed from the network. Following this, we study the extension of the delta centrality to the dynamic case. The temporal delta-centrality of a node v at time t is defined by:

where G\v is the network obtained from G by removing node v and all its adjacent links.

In order to study the behaviors of the metrics introduced above, we study two datasets that present different characteristics and come from two very different contexts:
  • Rollernet: this dataset was collected during a rollerblade tour in Paris in August 2006. The tour is a weekly event and gathers approximately 2500 participants. Among these, 62 were equipped with wireless sensors recording when they are at a communication distance from one another. The dataset therefore contains the proximity links between the persons carrying the sensors. The total dataset duration is approximately 2 hours and 45 minutes.
  • Enron: this dataset contains the 252 759 emails that 151 Enron employees exchanged during three years. It records information on the senders, receivers, and the moment they were sent. Note that by nature, the links are directed but for a fair comparison with the Rollernet dataset, we treated them as undirected in the present study.
A) Temporal efficiency and temporal closeness over time:
Above Figure presents the time evolution of the temporal efficiency for each dataset. We can see that the value fluctuates widely for both of them. Notice that for Enron, even though it fluctuates, the efficiency tends to increase with time 3. This is caused by the fact that, as shown previously, the number of pairs that are reachable from one another increases with time throughout the dataset duration. The final collapse stems from the fact that, towards the end of the dataset, less and less temporal paths exist.We expect that these fluctuations in the efficiency will impact the individual node’s closeness.
Observations can be summarized as follows:
  • Node importance varies with time: a given node may be very important at one time, and not so important at another time; therefore it is not relevant to consider only aggregate values that summarize the importance of nodes on the whole network time span.
  • Different datasets have different properties regarding node importance; for one of our datasets, the importance of all nodes fluctuates extremely rapidly between high and low values; it is meaningless in this case to state that one node is more important than another, except for a very limited time span; for our other dataset however, we find that some nodes are consistently important (or unimportant) for the whole network time span.





No comments:

Post a Comment