Introduction:
The main objective is to better understand what makes people spread information in tweets or microblogs through retweeting. We find that although users in the majority of cases do not retweet information on topics that they themselves Tweet about, or from people who are like them (hence anti-homophily), models which do take homophily into account fit the observed retweet behaviours much better than more general models which do not take this into account.
Methodology:
The streams, people, and links in these social media are typically treated as one large homogeneous mass. While such a high-level view of the world is of tremendous use for understanding large global behaviours, it is unfortunately not appropriate for fine-grained analysis of local behaviours. We here focus on generating a profile of “topics of interest” for a user based on the content they have posted in the past, and then use this profile, through different behaviour models, to gain insight into what makes people propagate information. Our contribution lies in building and using these user profiles.
This is done through automatic tagging of people and content into semantically meaningful categories, and then using these categories to develop context-specific behavioural models for information propagation. Our approach further relies on being able to match and disambiguate entities mentioned in content so that we can track what a person writes about over time. For example, rather than track that a person writes about “Obama”, “Bush”, and “Clinton”, we would like to learn that repeated mentions of “Bush” likely refer to the president of the United States, and that the topic really is Presidents and politics rather than these keywords. We do this by mapping found entities into an ontology, as we describe below, and then keeping track of which ontological concepts show up repeatedly in a user’s content.
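The profile-building steps above can be sketched as follows; the entity extractor and the entity-to-concept mapping below are illustrative stand-ins, not the actual tagging pipeline:

```python
# Sketch: count which ontological concepts recur across a user's tweets;
# concepts that repeat become the "topic of interest profile".
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "is", "about", "to", "at"}

# Hypothetical mapping from disambiguated entities to ontology concepts.
ENTITY_TO_CONCEPTS = {
    "Arsenal": ["English Football"],
    "Walcott": ["English Football"],
    "Becks": ["English Football", "World Cup"],
    "England": ["World Cup"],
}

def extract_entities(tweet):
    """Capitalised non-stopwords are treated as candidate named entities."""
    tokens = tweet.replace(",", " ").replace(".", " ").split()
    return [t for t in tokens if t[0].isupper() and t.lower() not in STOPWORDS]

def topic_profile(tweets, min_count=2):
    """Concepts that show up repeatedly form the topic-of-interest profile."""
    counts = Counter()
    for tweet in tweets:
        for entity in extract_entities(tweet):
            counts.update(ENTITY_TO_CONCEPTS.get(entity, []))
    return {c for c, n in counts.items() if n >= min_count}

tweets = [
    "Walcott plays for Arsenal",
    "Becks and England at the World Cup",
    "Arsenal won again",
]
print(sorted(topic_profile(tweets)))  # -> ['English Football', 'World Cup']
```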
These repeated concepts can then be used as that person’s “topic of interest profile”, which we can map against other content, specifically with respect to what that person decides to propagate. Our approach to discovering a Twitter user’s topic profile is based on the idea that the topics of interest can be identified by finding the entities about which a user Tweets, and then determining a common set of high-level categories that covers these entities. As a running example, consider the following real-world Tweet:
There are four entities of interest in this Tweet: Arsenal, which refers to the Arsenal Football Club of England; Walcott, which refers to Theo Walcott, a player for Arsenal; Becks, which refers to football superstar David Beckham; and England. A category that covers these entities within the Tweet might be “English Football.”
Therefore, to develop a topic profile for a user, we analyse all of their Tweets and determine the set of common high-level categories that covers them. This set of categories defines the topic profile. In our example, the profile may include “English Football,” “World Cup,” etc. Our approach is to treat capitalized non-stopwords as possible named entities. This ensures high recall (we retrieve many possible entities) while accommodating the difficulty of our data. If an entity is not found in Wikipedia, we do not include it in the profile. Wikipedia may return a set of candidates that match the entity. To deal with this disambiguation problem, we leverage the “local context” of the Tweet. Specifically, we treat the text of the Tweet (excluding the entity term to disambiguate) as the context for that entity. In the example Tweet, with “Arsenal” as the current entity to disambiguate, the local context is {winger, Walcott, Becks, ...}. Again, note that we exclude stopwords from the context.
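The local-context matching step can be sketched as follows; the candidate article tokens below stand in for real Wikipedia lookups, and the overlap score is a minimal illustration rather than the exact matching function used:

```python
# Sketch: pick the Wikipedia candidate whose article text overlaps
# most with the tweet's local context.
STOPWORDS = {"the", "a", "an", "and", "for", "with"}

def context(tweet_tokens, entity):
    """Tweet tokens minus the entity itself and minus stopwords."""
    return {t.lower() for t in tweet_tokens
            if t != entity and t.lower() not in STOPWORDS}

def disambiguate(entity, tweet_tokens, candidates):
    """candidates: {candidate_name: article_tokens}; returns best match."""
    ctx = context(tweet_tokens, entity)
    def overlap(name):
        article = {t.lower() for t in candidates[name]}
        return len(ctx & article)
    return max(candidates, key=overlap)

tweet = ["Arsenal", "winger", "Walcott", "Becks"]
candidates = {
    "Arsenal F.C.": ["football", "club", "winger", "Walcott", "London"],
    "Arsenal (weapons)": ["weapons", "military", "storage"],
}
print(disambiguate("Arsenal", tweet, candidates))  # -> Arsenal F.C.
```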
context. More formally, we define the Tweet’s local context, CT, for an entity, ET as:
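Reconstructing from the description above (the original formula does not appear here), the local context consists of the Tweet’s tokens minus the entity itself and minus the stopwords:

```latex
C_T = \{\, t \mid t \in T,\ t \neq E_T,\ t \notin \mathit{Stopwords} \,\}
```

where T is the set of tokens in the Tweet.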
The root categories are more specific than the other categories. To even out the counts, we weight categories by their depth in the tree and then rank each category, c, in the set of sub-trees according to the following ranking function:
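The ranking function itself is not reproduced here. One plausible form, with frequency discounted by depth so that categories further from the sub-tree roots count less, would be:

```latex
\mathrm{rank}(c) = \frac{\mathrm{freq}(c)}{d(c)}
```

where freq(c) is how often category c occurs across the sub-trees and d(c) is its depth; the exact weighting used in the original work may differ.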
Homophily Model:
We want to compute P(retweet(x)), where x is a Tweet previously seen (up to and including the most recent Tweet). This model is based on profiles of users: a user may be more likely to retweet another user if they share similar profiles. By observing what is retweeted, we can generate the underlying empirical distribution of Pps(x | simP(x, u)), where simP(x, u) is the similarity between a user’s profile and the profile of the user who sent the original Tweet. Our profile-based model is then defined as:
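Reconstructing from the description above (the definition itself does not appear here), the profile-based model conditions the retweet probability on profile similarity:

```latex
P(\mathrm{retweet}(x)) = P_{ps}\big(x \mid \mathrm{sim}_P(x, u)\big)
```

where sim_P(x, u) is the similarity between the profiles of the retweeting user and the user who sent the original Tweet.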