Thursday, November 3, 2011

Why is Tweet Topic Analysis challenging?

   Analyzing the topics of millions of tweets is a challenging task. Tweet Dynamics software currently determines tweet topics with 80% accuracy. However, several challenges still need to be addressed to improve tweet categorization. Some of them are outlined below:
  1. Context-sensitive categorization of tweets: Tweet Dynamics software currently categorizes the noun phrases in a tweet and determines the tweet's topics from the topics of those noun phrases. This can produce an incorrect category when the topic of a noun phrase does not match the context in which the phrase is used. For example, the tweet "I saw a shooting star in the sky yesterday" might be categorized under the topic music album, since Shooting Star is the name of a music album. This category is wrong, because in the context of the tweet a shooting star is a meteor, not a music album.
  2. Filtering out uninteresting tweets: Many tweets that users post are status updates that are not interesting to other users. For example, the tweet "Uptown2967 I'm thinking nap over lunch today" is a status update that might interest the user's friends but is of no interest to the majority of users. Tweet Dynamics software will currently categorize this tweet into the food category. One solution to this problem would be to categorize tweets from only the most influential Twitter users. This would help the categorization software focus on the most interesting tweets.
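
Both challenges can be illustrated with a small Python sketch. It is not the Tweet Dynamics implementation, which this post does not describe. The phrase table, the context cue words, and the use of follower count as a stand-in for "influential" are all assumptions made up for the example.

```python
# Hypothetical phrase-to-topic table (not the real Tweet Dynamics lexicon).
PHRASE_TOPICS = {
    "shooting star": "music album",
    "lunch": "food",
}

# Hypothetical cue words: a topic is kept only if one of them
# appears elsewhere in the tweet.
TOPIC_CUES = {
    "music album": {"album", "song", "listen", "band", "track"},
    "food": {"eat", "ate", "delicious", "recipe", "restaurant"},
}


def naive_topics(tweet):
    """Assign a topic whenever a known phrase appears, ignoring context."""
    text = tweet.lower()
    return [topic for phrase, topic in PHRASE_TOPICS.items() if phrase in text]


def context_aware_topics(tweet):
    """Keep a phrase's topic only if the rest of the tweet contains a cue word."""
    text = tweet.lower()
    words = set(text.replace(".", " ").replace(",", " ").split())
    topics = []
    for phrase, topic in PHRASE_TOPICS.items():
        if phrase not in text:
            continue
        # The words of the phrase itself do not count as context.
        context = words - set(phrase.split())
        if context & TOPIC_CUES[topic]:
            topics.append(topic)
    return topics


def from_influential_user(follower_count, min_followers=10000):
    """Assumed influence test: a simple follower-count threshold."""
    return follower_count >= min_followers


meteor_tweet = "I saw a shooting star in the sky yesterday"
album_tweet = "Cannot stop listening to the Shooting Star album"
print(naive_topics(meteor_tweet))
print(context_aware_topics(meteor_tweet))
print(context_aware_topics(album_tweet))
```

The naive lookup labels the meteor tweet as a music album, while the context check drops that label because no music-related word appears near the phrase. The follower threshold is only one possible way to pick influential users, and the cutoff of 10000 is arbitrary.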
