Newsworthiness in content moderation: Using Media Cloud to look at patterns of attention
The moderation of content on social media platforms such as Facebook, Twitter, and Instagram has increasingly garnered public attention in recent years. It’s not difficult to see why: more of the information we receive comes from these sites, and they continue to influence major political events with large digital footprints, such as the 2016 and 2020 elections.
There are now big public policy discussions about regulating social media. Many people believe things need to change, as mis- and dis-information proliferate and extreme views and activity seem to gather momentum through these platforms.
Yet it is worth asking how the public sees these issues, and how exactly they are being represented through the news. There are many questions. What kinds of controversies and issues bubble up to public consciousness through news media? Are they policy-driven or event-driven? Are there differences we can detect between the public’s interests and those of journalists?
As content moderation becomes a more widely covered topic, analyzing how news media represent it helps us understand whether the important issues are being covered, and how. Media Cloud is a robust open-source tool for many kinds of data-driven analysis, and Storybench has continued to use it to look at a variety of interesting, and often problematic, patterns in media coverage.
The analysis here used Media Cloud to look at coverage of content moderation and created a grouping of relevant content moderation stories. The final Media Cloud topic spans from Jan. 1, 2014, to July 27, 2020, and contains just under 4,000 stories from 403 different sources. This work is part of the Ethics of Content Labeling Project at Northeastern University’s Ethics Institute.
Analysis of themes across these nearly 4,000 stories, derived by scoring each story against a list of the 600 most common descriptors from the New York Times annotated corpus, shows that “computers and internet” (think: tech angles) is the most common theme, which might be expected. Journalists often cover these issues as tech issues. But the data also suggest that “politics and government” is the next largest theme, appearing in over 30% of stories.
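The share of stories carrying a given theme is a simple count over per-story tags. The following is a minimal sketch of that arithmetic, not the Media Cloud implementation: the function name `theme_shares` and the sample tags are made up for illustration, and it assumes each story has already been assigned a list of theme labels.

```python
from collections import Counter


def theme_shares(story_themes):
    """Map each theme to the fraction of stories tagged with it.

    `story_themes` is a list with one entry per story; each entry is a
    list of theme labels (hypothetical input format).
    """
    total = len(story_themes)
    if total == 0:
        return {}
    counts = Counter()
    for themes in story_themes:
        # Count a theme at most once per story, even if it is listed twice.
        counts.update(set(themes))
    return {theme: n / total for theme, n in counts.most_common()}


# Made-up sample data, for illustration only.
sample = [
    ["computers and internet", "politics and government"],
    ["computers and internet"],
    ["computers and internet", "freedom of speech"],
    ["politics and government"],
]

shares = theme_shares(sample)
for theme, share in shares.items():
    print(f"{theme}: {share:.0%}")
```

Because a story can carry several themes, the shares are not expected to sum to 100%.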
Given the turbulence in US politics in recent years, the fact that content moderation news remains so focused in this area speaks to just how much of the conversation around moderation and labeling is taking place in a political context. Notably missing from the top themes is any mention of science or medicine, despite the fact that a large amount of moderation centers around removing medical misinformation, such as anti-vaccination or 5G conspiracies.
Even when the data are sliced and examined in monthly chunks, no medical themes emerge from February to July of 2020, when we might expect them because of the COVID-19 pandemic. This lack of coverage persists despite the fact that platforms such as Facebook introduced measures specifically to combat COVID-related misinformation.
This focus on political news coverage can also be seen in the Top People statistics provided by Media Cloud, where political figures constitute five of the top 10 people mentioned, and three of the top five. Donald Trump, unsurprisingly the person with the most mentions, appears in 28% of the collected articles. Other political figures mentioned often include Barack Obama, Hillary Clinton, and Nancy Pelosi, as well as conspiracy theorist Alex Jones.
Diverging priorities between the media and the public
There is a notable divergence between what news media cover and what the public may want to know about. The most prominent example of this in the data comes from comparing the top stories ranked by media inlinks, which count how many times an article has been linked to from other stories on the internet, with the top stories ranked by Facebook shares.
In the first table below are the stories with the most media inlinks, a data point that can provide suggestive evidence of what the news media is interested in. The second table shows the stories which individual users have shared on Facebook the most, giving an idea of the kind of story the general public is interested in.
Most Media Inlinks:
|Headline|Source|
|The Secret Lives of Facebook Moderators in America|Verge|
|Facebook’s Secret Censorship Rules Protect White Men from Hate Speech But Not Black Children|ProPublica|
|Revealed: Facebook’s Internal Rulebook on Sex, Terrorism, and Violence|The Guardian|
|YouTube Creator Blog: Protecting Our Extended Workforce and the Community|YouTube|
Most Facebook Shares:
|Headline|Source|
|Facebook Bans Deepfakes in Fight Against Online Manipulation|Washington Post|
|Facebook Bans White Nationalism and White Separatism|Vice|
|Facebook Says it is Deleting Accounts at the Direction of the U.S. and Israeli Government|Intercept|
|Facebook Removes 4 Pages from InfoWars and Alex Jones|CNN|
Side by side, the most striking thing is that the two lists do not share a single story. In fact, in the top 100 of each ranking, only 11 articles appear on both. This suggests a divergence in focus between individuals and the media. With this in mind, we can examine each list in turn.
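An overlap figure like this can be computed by ranking the same set of stories twice and intersecting the two top-N sets. The sketch below assumes each story is a record with an identifier, an inlink count, and a Facebook share count. The field names (`id`, `media_inlinks`, `facebook_shares`) and the sample rows are hypothetical and do not reflect the actual Media Cloud export format.

```python
def top_n(stories, key, n):
    """Return the ids of the n stories with the highest value for `key`."""
    ranked = sorted(stories, key=lambda story: story[key], reverse=True)
    return {story["id"] for story in ranked[:n]}


def ranking_overlap(stories, n=100):
    """Return ids that appear in both the top-n by inlinks and by shares."""
    by_inlinks = top_n(stories, "media_inlinks", n)
    by_shares = top_n(stories, "facebook_shares", n)
    return by_inlinks & by_shares


# Made-up rows, for illustration only.
stories = [
    {"id": "a", "media_inlinks": 50, "facebook_shares": 10},
    {"id": "b", "media_inlinks": 40, "facebook_shares": 9000},
    {"id": "c", "media_inlinks": 5, "facebook_shares": 12000},
    {"id": "d", "media_inlinks": 30, "facebook_shares": 20},
]

overlap = ranking_overlap(stories, n=2)
print(sorted(overlap))
```

A small overlap relative to N indicates that the two measures of attention are picking out largely different stories.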
The articles with the most inlinks were more focused on overarching issues, such as broad policy change and implementation. Another factor shared by several top inlinked articles is their use of ‘secret’ or ‘revealed,’ showing that the media tends to favor stories that paint these policies and issues in an enigmatic light. Lack of transparency in big technology has been an ongoing issue, and the media may be seeking to capitalize on this for shock value, in an attempt to draw readers in. However, the public has clearly chosen for itself what is of interest.
As we see in the Facebook shares list, the articles with the most interest are entirely about specific instances of moderation, most often concerning full removal of content or a ban of some kind. These stories contain inherent shock value, as removing or banning content constitutes the most extreme form of moderation, beyond any labeling, restrictions, or deprioritization.
This focus, whether it is on a certain type of content like deepfakes or a specific entity like InfoWars and Alex Jones, is entirely on individual site decisions, in contrast to the emphasis on site policies as a whole that we saw in the list sorted by media inlinks. Public interest currently centers on the impact of moderation policies, even though news media cover the creation of policies and address their potential effects before the outcry over implementation occurs. Perhaps if there were even more transparency from social media sites, aided by the media toning down the ‘secrecy’ in their headlines, there would be less outcry when standard policy enforcement occurs.
Coverage by social media platform
A final aspect of the data that is of interest is how often each social media site is mentioned, compared to the size of its user base. Throughout the data, as can be seen in the headlines in the tables above, Facebook is mentioned significantly more than any other site, but when we look at the related user statistics, it’s not difficult to see why.
Facebook is mentioned 71 times for every 1,000 articles and has 2.5 billion users. Twitter is mentioned less often, only 21 times for every 1,000 articles, but has just 321 million users. Facebook is therefore mentioned 3.38 times as often as Twitter, which seems reasonable given that its user base is roughly 7.8 times larger.
A different picture emerges when comparing Facebook and YouTube, which was mentioned in 17 of every 1,000 articles and has 2 billion users. Facebook’s user population is only 1.25 times larger than YouTube’s, yet Facebook is mentioned 4.18 times as often. This is a distinct imbalance in coverage, especially since YouTube moderates in similar ways: labeling content, providing context on funding for pages, and removing videos that violate its policies. Both sites host news, including large amounts of political content, and allow advertisements, and yet the moderation decisions Facebook makes are much more discussed than any event on YouTube.
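The comparisons above come down to two ratios per pair of platforms: mentions relative to mentions, and users relative to users. As a rough sketch, the arithmetic can be reproduced from the figures quoted in the text. The dictionary layout and the `compare` helper are illustrative choices, and user counts are expressed in millions.

```python
# Figures as quoted in the article: mentions per 1,000 articles
# and approximate user counts in millions.
platforms = {
    "Facebook": {"mentions_per_1000": 71, "users_millions": 2500},
    "Twitter": {"mentions_per_1000": 21, "users_millions": 321},
    "YouTube": {"mentions_per_1000": 17, "users_millions": 2000},
}


def compare(base, other):
    """Return (mention ratio, user ratio) of `base` relative to `other`."""
    b, o = platforms[base], platforms[other]
    mention_ratio = b["mentions_per_1000"] / o["mentions_per_1000"]
    user_ratio = b["users_millions"] / o["users_millions"]
    return mention_ratio, user_ratio


for other in ("Twitter", "YouTube"):
    mention_ratio, user_ratio = compare("Facebook", other)
    print(
        f"Facebook vs {other}: {mention_ratio:.2f}x the mentions, "
        f"{user_ratio:.2f}x the users"
    )
```

When the mention ratio is well above the user ratio, as in the YouTube comparison, coverage is out of proportion to audience size. When it is well below, as with Twitter, the less-covered platform is actually receiving more attention per user.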
The difference may lie in how users interact with the respective platforms. Facebook tends to be a much more active experience, with users prioritizing social connection, while YouTube is generally used more passively as an entertainment source. Another possibility is that YouTube has been moderating in much the same way for quite some time, while Facebook has made many recent changes to the way content is treated, so both the media and the public are more interested in the more novel events occurring on the latter platform.
Content moderation centering around politics is to be expected, but it does raise issues about what kinds of moderation are being left out of coverage, such as scientific misinformation in the COVID-19 pandemic. News media seem to be focused on platform policies and changes as a broader subject, but individual consumers are centered on instances of banning or removal.
This analysis is limited by the scope of the data, namely that it only includes articles from the top 500 US news sources, with no international perspective. Additionally, the time period of the data could be expanded. A subsequent investigation might benefit from broadening the scope of time and geography, as well as engaging in sentiment analysis of the articles for a more in-depth evaluation. Nevertheless, the findings in this analysis raise significant questions about whether the public and the press are focusing on the right issues, and how their interests align.