Wednesday, 2 March 2016

Marvel Universe looks almost like a real social network


Collaboration networks have long been an active area of research in social network analysis. You have probably heard of the movie actors network and of scientific collaboration networks, but have you ever heard of the Marvel Universe collaboration network? It is an artificial network of Marvel Comics characters, yet it mimics some of the behavior of real-life networks. In this network, the nodes correspond to Marvel Comics characters, and there is an edge between two characters if they have appeared together in the same Marvel comic book. These joint appearances let us form a collaboration network for the Marvel Universe.


The Marvel Universe network
The Marvel Universe network (MU) has popular Marvel characters as its nodes, and two characters are linked when they jointly appear in a significant way in the same comic book. The data come from the Marvel Chronology Project (MCP), which records over 96,000 appearances by more than 6,000 characters in about 13,000 comic books.


Analysis of the network
First, a bipartite graph (X, Y, E) is generated, where the nodes in X correspond to Marvel characters, the nodes in Y correspond to comic books, and E contains an edge from every character to each book in which it has appeared. A collaboration network, called the MU network, is then derived from this bipartite graph as its projection onto the set of characters.
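The projection step can be sketched in a few lines of Python. This is only an illustration under assumptions: the input format (a list of character/book pairs), the function name `project_on_characters`, and the toy data are all invented here and are not taken from the study.

```python
from collections import defaultdict
from itertools import combinations

def project_on_characters(appearances):
    """Project a character-book bipartite graph onto its characters.

    appearances: iterable of (character, book) pairs (hypothetical format).
    Returns a dict mapping each character to the set of its collaborators.
    """
    cast = defaultdict(set)      # book -> characters appearing in it
    partners = defaultdict(set)  # character -> collaborators
    for character, book in appearances:
        cast[book].add(character)
        partners[character]      # touch the key so isolated characters are kept
    for characters in cast.values():
        # Every pair of characters sharing a book becomes one link.
        for a, b in combinations(sorted(characters), 2):
            partners[a].add(b)
            partners[b].add(a)
    return dict(partners)

# Toy data, invented for illustration only.
toy = [
    ("Captain America", "Book 1"), ("Iron Man", "Book 1"),
    ("Iron Man", "Book 2"), ("Spider-Man", "Book 2"),
    ("Howard the Duck", "Book 3"),
]
network = project_on_characters(toy)
```

In the toy data, Iron Man ends up linked to both Captain America and Spider-Man, while a character who only appears alone has no partners.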


Basic data on appearances of characters in comic books
Number of characters: 6,486
Number of books: 12,942
Mean books per character: 14.9
Mean characters per book: 7.47
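As a quick sanity check on these figures (a sketch, not part of the original study): each appearance is one edge of the bipartite graph, so counting edges from the character side and from the book side should give roughly the same total, close to the "over 96,000 appearances" quoted earlier. The variable names are illustrative.

```python
characters = 6486
books = 12942
mean_books_per_character = 14.9
mean_characters_per_book = 7.47

# Each appearance is counted once from either side of the bipartite graph.
edges_from_characters = characters * mean_books_per_character
edges_from_books = books * mean_characters_per_book

relative_gap = abs(edges_from_characters - edges_from_books) / edges_from_characters
print(round(edges_from_characters), round(edges_from_books))
```

The two totals differ only because the published means are rounded.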


The bipartite graph
Let Pb(k) represent the probability that a comic book has k characters appearing in it, and Pc(k) the probability that a character appears in k comic books.
It was found that Pb(k) follows a power-law tail, shown in Fig. 1.



Fig.1. Probability Distribution Pb(k)


The distribution Pc(k) obtained here is not normally found in bipartite graphs associated with collaboration networks (see Fig. 2).




Fig.2. Probability Distribution Pc(k)
The exponent of Pc(k) is much smaller than the values reported for similar collaboration networks, which usually range from 2 to 3.

The null random model
The study starts with a random bipartite graph, called the MU-BR graph hereafter, with 6,486 character nodes and 12,942 book nodes, whose edges are created following the same distributions Pc(k) and Pb(k) of outgoing and ingoing edges found above. A projection of this random graph, called the MU-R graph, is then generated: its nodes correspond to characters, and two characters are linked when they are connected to the same book in the MU-BR graph.
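One common way to build a random bipartite graph with prescribed degrees is stub matching, as in the configuration model. The sketch below uses that technique as an assumption; this post does not say which procedure the authors actually used, and the function name and toy degree lists are hypothetical.

```python
import random

def random_bipartite(char_degrees, book_degrees, seed=None):
    """Randomly match character stubs to book stubs.

    char_degrees[i]: number of books character i appears in.
    book_degrees[j]: number of characters in book j.
    Both lists must have the same sum. Repeated (character, book) pairs
    collapse into one edge, so degrees are matched only approximately.
    """
    if sum(char_degrees) != sum(book_degrees):
        raise ValueError("degree sequences must have equal sums")
    rng = random.Random(seed)
    # One stub per appearance on each side of the graph.
    char_stubs = [i for i, d in enumerate(char_degrees) for _ in range(d)]
    book_stubs = [j for j, d in enumerate(book_degrees) for _ in range(d)]
    rng.shuffle(book_stubs)
    return set(zip(char_stubs, book_stubs))

# Toy degree sequences, invented for illustration only.
edges = random_bipartite([2, 1, 1], [2, 2], seed=0)
```

The resulting set of (character, book) pairs can then be projected onto the characters in the same way as the real data.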

Summary of results of the analysis of the MU network
Mean partners per character: 51.88
Size of giant component: 6,449 (99.42%)
Mean distance: 2.63
Maximum distance: 5
Clustering coefficient: 0.012

Distribution of partners:


Separation
Here the study relates the MU network to the mathematicians' collaboration network and asks whether it has someone like Erdős. The diameter of the MU network is 5, which implies that any two connectable characters in the Marvel Universe are joined by a chain of at most 4 intermediate collaborators. The study then looks for the Marvel character who, like Erdős, sits at the center of the giant component, that is, the one that minimizes the total distance to all other nodes in it. That character turns out to be Captain America, who is, on average, at distance 1.70 from every other character in the giant component.
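Finding such a center amounts to running a breadth-first search from every node of the component and keeping the node with the smallest total distance. The following is a minimal sketch of that idea on a toy graph; the function names and the graph are assumptions made for illustration, not the authors' code.

```python
from collections import deque

def distances_from(graph, source):
    """Breadth-first search: shortest distance from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def center_of_component(graph, component):
    """Return (node, mean distance to the others) minimizing total distance."""
    best_node, best_total = None, None
    for node in sorted(component):
        total = sum(distances_from(graph, node).values())
        if best_total is None or total < best_total:
            best_node, best_total = node, total
    return best_node, best_total / (len(component) - 1)

# Toy path graph A-B-C-D-E, invented for illustration only.
toy_graph = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
    "D": {"C", "E"}, "E": {"D"},
}
component = set(distances_from(toy_graph, "A"))
center, mean_distance = center_of_component(toy_graph, component)
```

On the toy path the center is the middle node, at mean distance 1.5 from the others. The component passed in must contain at least two nodes.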

Clustering coefficient
The collaboration networks studied so far, like most social networks, generally have large clustering coefficients. A network that combines a low average distance between connected nodes with a large clustering coefficient is called a small-world network. Unlike real-life social networks, the MU network has a small clustering coefficient. Its value is CMarvel = 0.012, while the clustering coefficient of a random network with 6,486 nodes and 168,267 links is Crandom = 0.008. CMarvel is therefore roughly 1.5 × Crandom, not several orders of magnitude larger, which sets it apart from real-life collaboration networks.
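The random-network figure can be reproduced with a little arithmetic, assuming the standard result that the expected clustering coefficient of a random network equals its link density 2m / (n(n - 1)). The variable names below are illustrative.

```python
n = 6486        # characters (nodes)
m = 168267      # links in the MU network
c_marvel = 0.012

# Expected clustering of a random network = probability that two nodes are linked.
c_random = 2 * m / (n * (n - 1))
ratio = c_marvel / c_random

print(round(c_random, 3), round(ratio, 1))
```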

Degree Distribution P(k)
The degree distribution is another statistical feature that helps tell random networks from non-random ones. In a random network with n nodes and m links, the degrees follow the binomial distribution

$$P(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$$

where p is given as

$$p = \frac{2m}{n(n-1)}$$
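To see how thin the tail of such a binomial distribution is, it can be evaluated for n = 6,486 and m = 168,267. The sketch below assumes the usual form P(k) = C(n-1, k) p^k (1-p)^(n-1-k) with p = 2m / (n(n-1)), and works with logarithms to avoid overflow in the binomial coefficient. The function name is hypothetical.

```python
import math

def binomial_degree_pmf(k, n, m):
    """P(k) for a random network with n nodes and m links (0 <= k <= n - 1)."""
    p = 2 * m / (n * (n - 1))
    # log C(n-1, k) = lgamma(n) - lgamma(k+1) - lgamma(n-k)
    log_pmf = (math.lgamma(n) - math.lgamma(k + 1) - math.lgamma(n - k)
               + k * math.log(p) + (n - 1 - k) * math.log(1 - p))
    return math.exp(log_pmf)

n, m = 6486, 168267
pmf = [binomial_degree_pmf(k, n, m) for k in range(n)]
total = sum(pmf)
mean_degree = sum(k * pk for k, pk in enumerate(pmf))
tail_at_500 = binomial_degree_pmf(500, n, m)
```

The mean degree comes out at 2m / n, about 51.89, in line with the mean number of partners per character, while the probability of a node with 500 partners is vanishingly small. A heavy-tailed network, by contrast, does contain such highly connected nodes.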

Collaboration networks, on the other hand, generally follow a heavy-tailed distribution, given either by a power law

$$P(k) \sim k^{-\tau}$$

for some positive constant τ, or by a power law with an exponential cutoff

$$P(k) \sim k^{-\tau} \, 10^{-k/c}$$

where τ and c are positive constants, and c is large.
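The effect of the cutoff can be illustrated numerically. The sketch below writes the two unnormalized forms as k^(-τ) and k^(-τ) · 10^(-k/c); the base 10 is a choice made here (another base only rescales c), and the parameter values are invented for illustration, not fitted to the MU data.

```python
def power_law(k, tau):
    """Unnormalized pure power law k^(-tau)."""
    return k ** -tau

def power_law_with_cutoff(k, tau, c):
    """Unnormalized power law with an exponential cutoff at scale c."""
    return k ** -tau * 10 ** (-k / c)

# Hypothetical parameters, for illustration only.
tau, c = 2.5, 1000
small_k_ratio = power_law_with_cutoff(10, tau, c) / power_law(10, tau)
large_k_ratio = power_law_with_cutoff(1000, tau, c) / power_law(1000, tau)
```

For k much smaller than c the two forms nearly coincide, while at k = c the cutoff has already suppressed the power law by a factor of 10.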

The data for the MU network fit a power-law distribution with an exponential cutoff, as can be seen in Fig. 3. A value of τ smaller than 2 means that the properties of the MU network are dominated by the few characters with a large number of collaborators, which is quite similar to many real-life networks. However, the value of τ here is much smaller than 2, which shows that the weight of characters like Captain America, Spider-Man, and other major super-heroes is much larger than that of the best-connected members of real-life networks such as scientific collaboration networks. This is to be expected, as there are no super-heroes in real life.

 Fig.3. Probability distribution P(k)

Conclusions
The article thus concludes that, although the MU network mimics real-life networks to some extent and differs clearly from a random network, it cannot completely hide its artificial origins.

For more details have a look at http://arxiv.org/pdf/cond-mat/0202174v1.pdf
