Friday 18 March 2016

Online detection of influential users in Twitter

Prologue:
Conversation dynamics are an interesting aspect of a complex, evolving social network. User influence plays an important role in how information and messages propagate, and it is especially interesting to study how a user becomes influential almost instantly in the context of a real-time event. On microblogging sites like Twitter, detecting this set of dynamically emerging influential users (or actors) for a particular topic makes it possible to recommend them to other users who are interested in or following the event. For example, Swara Bhaskar (an actress and a former JNU student) became popular at the time of the JNU incident, and her tweets received a lot of attention. People who were following the JNU event might have been interested in following her tweets.
There is an ongoing line of research on ranking Twitter users. One such algorithm is IARank (Information Amplification Rank), which is reported to outperform the conventional PageRank algorithm for ranking users online. The model tries to quantify the “information amplification” potential of a user. It is a simple but accurate model for ranking influential users in real time. The paper was published in SOCIALINFORMATICS ’12, IEEE Computer Society.

Feature detection:
To develop a ranking system, it is first necessary to define a set of measures or features, such as the number of tweets, the account creation date and the interactions between users. Three types of interaction are generally observed on Twitter: reply, retweet and mention. When one user includes another user’s screen name in a tweet, it is classified as a mention. A user may reply to another user’s tweet in which they were mentioned, adding some content to it. The third type of interaction is the retweet, where a user forwards a tweet received from other users to their own set of followers.

This model ranks users according to their information amplification ability, which is a measure of their influence. It defines its metrics based on three characteristics. The first is event activity, which captures how many times the user has participated in the event. It can be quantified as the number of times the user has retweeted, replied or posted something on that hashtag. The second is attention received, which can be quantified as the number of mentions of the user. The third is social connectivity, which denotes popularity in terms of followers and followees.
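
The first two characteristics can be tallied directly from a stream of tweets. The following is a minimal sketch, not the authors' implementation: the tweet dictionaries and their keys (`author`, `mentions`) are hypothetical stand-ins for whatever the collection pipeline provides, and the sample data is made up.

```python
from collections import Counter

def tally_characteristics(tweets):
    """Count event activity and attention received per user.

    Each tweet is a dict with hypothetical keys:
      "author":   screen name of the user who posted, replied or retweeted
      "mentions": list of screen names mentioned in the tweet
    """
    event_activity = Counter()  # tweets, replies and retweets on the hashtag
    attention = Counter()       # number of times the user was mentioned
    for tweet in tweets:
        event_activity[tweet["author"]] += 1
        for name in tweet.get("mentions", []):
            attention[name] += 1
    return event_activity, attention

# Made-up sample data, for illustration only.
sample = [
    {"author": "alice", "mentions": ["bob"]},
    {"author": "bob", "mentions": []},
    {"author": "carol", "mentions": ["bob", "alice"]},
    {"author": "alice", "mentions": []},
]
activity, attention = tally_characteristics(sample)
print(activity["alice"], attention["bob"])
```

Social connectivity is not derived from the tweets themselves; it would come from the follower and followee counts on each user's profile.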

Based on the characteristics above, two important features for quantifying user activity, structural advantage and buzz, are defined as:
eq1.png

A node with a structural advantage greater than 0.5 has more followers than followees, and hence can serve as an influence provider. Similarly, a buzz value greater than 0.5 shows that the user, or the user's posts, create more buzz than their overall activity.
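
As a rough illustration of the 0.5 threshold, the sketch below uses normalised ratios. These forms are assumptions chosen only because they match the behaviour described above (a value above 0.5 when followers exceed followees, or when mentions exceed the user's own activity); the exact definitions are the ones given in the equation, and the function names here are hypothetical.

```python
def structural_advantage(followers, followees):
    """Share of a user's connections that are followers (assumed form)."""
    total = followers + followees
    return followers / total if total else 0.0

def buzz(mentions, event_activity):
    """Share of mentions relative to mentions plus own activity (assumed form)."""
    total = mentions + event_activity
    return mentions / total if total else 0.0

# Made-up numbers, for illustration only.
print(structural_advantage(300, 100))  # 0.75: more followers than followees
print(buzz(5, 15))                     # 0.25: more activity than mentions
```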

Ranking based on Information Amplification:

There are two types of influence: one that accumulates across a topic, and another that is a momentary peak of attention.
Cumulative influence: A user gains cumulative influence by constantly receiving attention from ordinary users, which can be formulated as follows:
equation2.png
where n is the total number of edges connected to user i and W(user(i)) is the cumulative influence achieved by user i.

Instantaneous influence:
When an influential user j interacts with an ordinary user k, then k can also become influential. For example, j may have mentioned k in some tweets. Instantaneous influence is an extension of the cumulative model, and it incorporates an ageing factor that makes the influence decay with time.
eq3.png

where Winst(user(k), t) is the influence of user k at time instant t (the influence is recalculated only when a new tweet is posted on that topic), W(user(j)) is the cumulative influence of user j, and the remaining parameter is the damping factor for instantaneous influence.

Now from equations 1 and 2 we get the following:
eq5.png
The authors set both parameters in this equation to 1 for the experiment.

From equation 3 we get
eq4.png

Dataset:
Tweets from two events, the London Olympics and London Fashion Week, were collected to test the effectiveness of the proposed algorithm. The data sets contain tweets relevant to the hashtags for London Fashion Week Winter 2012 (#LFW) and the London Olympics 2012 (#London2012).

Comparison with the PageRank algorithm:
PageRank is a well-known algorithm; each iteration requires O(n) time, and it needs about 50 iterations on average to converge. Some PageRank-like algorithms, such as TwitterRank and TunkRank, are applied on a per-node basis, but they are not as fast when a larger number of users generates a larger number of tweets.
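
To make the cost concrete, here is a minimal power-iteration sketch of PageRank that also reports how many iterations it took to converge. It is a generic textbook version, not the code used in the paper; the damping value of 0.85, the tolerance and the tiny sample graph are illustrative assumptions.

```python
def pagerank(graph, damping=0.85, tol=1e-8, max_iter=200):
    """Power-iteration PageRank on a dict {node: [nodes it links to]}.

    Every node must appear as a key. Returns (ranks, iterations).
    """
    nodes = list(graph)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}
    for iteration in range(1, max_iter + 1):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(ranks[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * ranks[v] / len(graph[v])
                for target in graph[v]:
                    new_ranks[target] += share
        change = sum(abs(new_ranks[v] - ranks[v]) for v in nodes)
        ranks = new_ranks
        if change < tol:
            return ranks, iteration
    return ranks, max_iter

# Made-up interaction graph: an edge u -> v means u mentioned or retweeted v.
sample_graph = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["a"]}
ranks, iterations = pagerank(sample_graph)
print(sorted(ranks, key=ranks.get, reverse=True), iterations)
```

Every pass touches the whole graph, so each new tweet that changes the graph forces a full recomputation. That is the cost an online ranking scheme has to avoid.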

Convergence:
The authors measured the efficiency of these algorithms in real time, using the relation between the minimum time between tweets and the PageRank convergence time as a benchmark.

figure1.png

The figure above shows the relation between the PageRank convergence time and the time between tweets in the specified event. The horizontal axis is on a logarithmic scale, and the smallest number of users considered is 89. As the number of users increases, the PageRank convergence time increases while the inter-tweet time decreases. Hence the iterative PageRank procedure takes longer than the inter-tweet time, making it inefficient for online analysis.
               
Evaluation:
The effectiveness of IARank was measured by human judgement. Two polls were run for the fashion event covered by the experiment. The first poll measured the utility of the ranks to the surveyed users. The second poll asked the participants to create a ranking of Twitter usernames, which served as a “ground truth” for measuring quality.

Utility:
To compare the results, the participants were asked to classify the Twitter users as relevant or not. The table shows that IARank performed well on the top 5 results for relevant users.

results.png                   
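
A relevant/not-relevant classification of the top results is commonly summarised as precision at k. The sketch below shows that computation under stated assumptions: the user names and relevance labels are made up, and the function is a generic illustration rather than the scoring procedure used in the paper.

```python
def precision_at_k(ranked_users, relevant_users, k):
    """Fraction of the top-k ranked users that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for user in ranked_users[:k] if user in relevant_users)
    return hits / k

# Made-up ranking and poll judgements, for illustration only.
ranking = ["u1", "u2", "u3", "u4", "u5", "u6"]
judged_relevant = {"u1", "u3", "u4"}
print(precision_at_k(ranking, judged_relevant, 5))  # 0.6
```
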
Quality:   
result2.png

The figure above shows that the precision, excluding the anomalous user, is slightly better with IARank than with PageRank.

Conclusion:

This algorithm is well suited to online user rank analysis. Conventional PageRank is too slow relative to inter-tweet times, whereas this ranking scheme provides an accurate measure for ranking users in real time. Its performance also compares well with that of the PageRank algorithm.


References:
Cappelletti, R., Sastry, N.: IARank: ranking users on Twitter in near real-time, based on their information amplification potential. In: Proceedings of the 2012 International Conference on Social Informatics, pp. 70–77. SOCIALINFORMATICS ’12, IEEE Computer Society, Washington, DC, USA (2012)
       

