Your tweets can tell how much you earn, according to a new study which found that users with higher incomes express more fear and anger on Twitter.

Scientists from the University of Pennsylvania, Johns Hopkins University, University College London and Microsoft Research analysed more than 10 million messages on the site and created a statistical natural language processing algorithm that identifies and eventually predicts the words used by people from different social strata. Vasileios Lampos, a computer science researcher at University College London and co-author of the study, explains to Metro how online behaviour mirrors people’s real lives.

How did you start working on this research?

Publicly available user-generated data can provide very useful insights in several domains, from health to politics. To derive those insights we usually apply methods from statistical natural language processing. Using Twitter data, we have proposed models that can predict flu prevalence, voting intention and collective emotion, among other things.

What’s the main point of your report?

The main point is that we can use several publicly available characteristics of a social media user, from profiling attributes to the actual topics or words used in a discussion, to predict their level of income. We can then interpret this relationship further by understanding how these characteristics relate to perceived income levels.

What have you found out?

We observed some differences in the words and topics used by people in distinctly different income bands. This is an interesting proof of concept for our study, as it was expected that income, one of the most important factors, would be reflected in the behaviour of an online user.

What surprising facts has your study shown?

The most surprising finding was that users with higher income express more fear and anger. However, we need to confirm this more closely in the future.

So, usage of Twitter varies depending on income?

The study was conducted on content from public Twitter accounts. Within that content, we did find a discrepancy of this kind: users with higher perceived income talked more about topics related to politics and industry, while users with lower perceived income used swear words more frequently in their tweets.

Do retweets matter?

We used the number of retweets made and received as user characteristics in our modelling. We observed that higher-income users get retweeted more and also retweet a lot themselves. This suggests that high-income users use Twitter more for content dissemination. It is also affected by the larger number of followers that higher-income users have, which raises the likelihood of a tweet being retweeted.

How is your study significant?

The study may find obvious commercial applications, such as advertising. However, the most interesting use case will be in supporting various research efforts in the social sciences. Using tools such as the one we propose here, we can begin analysing vast numbers of online users through an important demographic parameter: their income level.

What’s next in your research?

There is a lot of room for improvement in the method. Of course, further interesting applications will emerge from this analysis, primarily of interest to the domain of computational social science.