Transforming journalism with data and AI: Research on representation, equity and accountability in news from the 2024 Computation + Journalism Symposium
At the Computation + Journalism Symposium hosted by Northeastern University Oct. 25-27, journalists and data scientists from around the globe came together to discuss the ways technology can be used to tell compelling stories. A panel of data-driven research projects centered on the theme of “Audience and Representation” looked at new work on everything from how transgender people are represented in the news to tracing missing persons in Detroit.
⓵
The way journalists report on transgender people and issues sets the stage for public discourse, and there are problems with current coverage, according to Alyssa Smith, a fourth-year PhD candidate at the Network Science Institute at Northeastern University.
Smith’s research is titled “An Annotated Dataset of U.S. Transgender News For Determining Agenda-Setting & Information Flows.” Smith and her collaborators collected and analyzed a corpus of news articles about transgender people and trans issues from U.S. local and national media.
“Coverage is often exploitative, sometimes misgenders folks and generally uses a lot of harmful tropes,” Smith said.
Drawing on example headlines such as “Colorado parents sue over state law and school policy regarding transgender students” and “Colorado mom sues school that recruited sixth-graders for secret after-school gender and sexuality club,” she illustrated how current coverage of transgender students in the U.S. robs them of their agency.
These headlines often focus on parents’ concerns over “indoctrination” rather than the perspectives of their queer children, Smith said. “Tying LGBTQ+ identities to indoctrination feeds into harmful tropes in the news and in mainstream discourse about LGBTQ+ people.”
Smith and her team are creating a human-annotated trans news dataset to extract tropes, understand their framings, and examine how coverage gets dispersed across the country.
“We found that information flows from national outlets to local ones. But there’s a catch. It’s not a direct one step flow. Instead, we often see one or two local state level collections mediating the information flow on a particular topic,” she said. By “local state level collections,” Smith means smaller, state-level news outlets.
She and her team used Latent Dirichlet Allocation (LDA), a statistical model that sorts a collection of texts into topics, to spot broad themes. To catch subtler biases and tropes, they also used a framework called DELUGE, which stands for “Data Enriched Language for the Universal Grid Environment” and is used to streamline and automate processes involving large-scale data.
DELUGE uses “probabilistic inference techniques to produce human generated labels,” she said.
The framework leaves room for minimal human error while maintaining probabilistic accuracy. “We are able to use human annotators to pull out more subtle tropes and biases” that are not necessarily captured by other topic models, Smith said.
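For readers unfamiliar with the technique, here is a minimal sketch of what an LDA topic-modeling step can look like, written in Python with scikit-learn. It is only an illustration with placeholder headlines; it is not Smith’s pipeline, which has not yet been released.

```python
# Minimal LDA illustration with scikit-learn; placeholder headlines,
# not the annotated trans news dataset described above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

articles = [
    "school board debates policy on transgender students",
    "parents sue district over after-school club",
    "state legislature weighs bill on youth sports",
]

# Turn raw text into a document-term matrix of word counts.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(articles)

# Fit LDA to group the documents into a small number of broad topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the top words that characterize each topic.
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_words = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top_words)}")
```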
Smith and her team expect to finish the dataset in November and plan to release it publicly on the Social Media Archive (SOMAR) for journalism research in December.
⓶
Ever heard of a For You Page, or FYP? How can news organizations responsibly design an FYP for news? Eliza Mitova, a doctoral student at the University of Zurich, found discrepancies between how news organizations and their readers think about news personalization.
Mitova’s presentation, titled “A complex interplay: Exploring the (mis)match between audiences’ and news organizations’ assessments of news personalization,” analyzed where news organizations and readers diverge on news personalization and offered recommendations for the responsible design of news recommender systems (NRS).
Personalized media feeds, such as a “For You” stream on social media platforms or YouTube, are more common than ever. Mitova’s research poses the question: “Does news personalization have the potential to be incorporated into news outlets?”
“The journalistic supply side and audience demand side are linked and interwoven,” she said, explaining that there are no simple answers. If reporters focus only on what audiences want to see, news outlets will supply only what is demanded, and readers will be exposed to less information that challenges them.
“Audiences want something else from journalism than from social media,” Mitova said. She and her team looked into how news professionals and audiences’ attitudes misalign or agree.
Using the 4A matrix, which accounts for actors, actants, activities and audiences, they deduced that “organizations as a whole are very cautious about adopting personalization,” because news outlets have a responsibility to put out all the news, not just the news their audiences want to see. Actants are the technology involved, such as news recommenders and external algorithmic platforms.
That does not mean the future of news personalization is dead. Although audience attitudes toward personalization are generally positive, they vary across topics.
“People do not necessarily want news personalization when it comes to the home page, politics, breaking news, economics,” said Mitova.
Trust in media declines when audiences believe their news outlets employ personalization. People see both the benefits and the downsides of personalization, which is why audiences want transparency about how these systems are adopted.
“News organizations should continue investing in responsible personalization and AI use,” Mitova said. “These systems have implications for the maintenance of a diversity informed democratic society.”
⓷
More than one person is reported missing every day in Detroit. How can journalists use automation effectively to cover more people more equitably?
Kristi Tanner, Anjanette Delgado and Stephen Harding of the Detroit Free Press presented their research “Missing in Detroit: Creating a Framework for Equitable Reporting and Minimizing Harm.”
Newsroom coverage – and the lack thereof – has immense power to change community attitudes.
The Detroit Free Press faced a problem: as of Oct. 18, Detroit had 2,637 reported missing persons, and there was simply too much data to comb through by hand. The team sought to expand its capacity to report on missing persons by turning to automation.
“The goal was to extract data from PDF attachments attached to emails [the police department] would send us. We decided to use Microsoft Power Automate,” Harding said.
Tanner, Delgado and Harding used Microsoft Power Automate, a cloud-based service that lets users synchronize and work with data across different platforms, to trigger on specific keywords in the emails and attachments, but they got mixed results.
They then trained their own AI model using a pre-built model within Power Automate. According to Harding, training the model involved telling the software which parts of the data to look for. By defining certain fields and tagging important bits of information, it “worked out pretty well in most cases,” he said.
Over time, the process evolved. They added logic to handle typos that would otherwise throw off the system, skipped reports that were not about missing persons and updated the database when missing people were found.
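The Free Press built its workflow in Power Automate rather than in code, but the extraction logic described above can be sketched in Python for illustration; the report text, field names and keyword check below are invented for the example.

```python
# Hypothetical Python analogue of the extraction logic described above;
# the Detroit Free Press used Microsoft Power Automate, not this script.
import re

REPORT_TEXT = """
MISSING PERSON REPORT
Name: Jane Doe
Age: 34
Last seen: 10/18/2024 near Gratiot Ave
Status: MISSING
"""

def is_missing_person_report(text: str) -> bool:
    # Only process documents that look like missing-person reports.
    return bool(re.search(r"missing\s+person", text, re.IGNORECASE))

def extract_fields(text: str) -> dict:
    # Pull out predefined fields, tolerating extra whitespace and varying
    # separators (a real workflow also needs typo handling, as described above).
    fields = {}
    for label in ("Name", "Age", "Last seen", "Status"):
        match = re.search(rf"{label}\s*[:\-]\s*(.+)", text, re.IGNORECASE)
        if match:
            fields[label.lower().replace(" ", "_")] = match.group(1).strip()
    return fields

if is_missing_person_report(REPORT_TEXT):
    print(extract_fields(REPORT_TEXT))
```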
Now, they are working with a partner to build a generative AI model – a more efficient way to extract data using context.
With generative AI models that can churn out results faster than ever, Tanner emphasizes the importance of pausing amid the publish-or-perish culture of news.
“Take a step back, evaluate, think about impact. Talk not just to academics, experts, talk to your community. Go to your neighborhood leaders, walk around and use those resources and be willing to change course,” she said.
⓸
How did news organizations portray law enforcement in their coverage of lynching? Mohamed Salama, a doctoral student at the Philip Merrill College of Journalism at the University of Maryland, presented his research “Bias or Not? Exploring US Press Representations of Law Enforcement in Lynching Coverage, 1789–1963.”
Salama used the NAACP definition of lynching: “the public killing of an individual who has not received any due process.” In his research, he found that the U.S. press played an indirect negative role in lynchings by sensationalizing and normalizing them under the pretext of “protecting the community.”
“I found many news pieces encouraging people to participate in lynching to watch,” Salama said.
He extracted news articles from the Library of Congress. By narrowing his scope to lynching and law enforcement, he identified a subset of 1,767 articles out of 60,000.
After the lengthy process of digitizing historical newspapers and cleaning his textual data, Salama investigated the frequency and strength of connections between nouns and adjectives – how frequently the noun “policeman” was associated with the adjective “brave,” for example.
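As an illustration of that kind of noun-adjective pairing (not Salama’s actual method or corpus), a short sketch using spaCy’s dependency parser might look like this, with invented sentences standing in for the digitized articles.

```python
# Count adjectives that directly modify nouns using spaCy's dependency parse.
# Illustrative only; the sentences below are invented, not from the study.
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

sentences = [
    "The brave policeman held back the angry mob.",
    "A local sheriff stood by as the crowd gathered.",
]

pairs = Counter()
for doc in nlp.pipe(sentences):
    for token in doc:
        # "amod" marks an adjectival modifier, e.g. "brave" -> "policeman".
        if token.dep_ == "amod" and token.head.pos_ == "NOUN":
            pairs[(token.head.lemma_, token.lemma_)] += 1

for (noun, adjective), count in pairs.most_common():
    print(f"{adjective} {noun}: {count}")
```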
“I took this sample and tried to annotate the relations myself and against my model and I was very likely to have a compelling evaluation,” he said.
To quantify his data and measure news outlet attitudes toward law enforcement, he conducted a sentiment analysis.
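Sentiment analysis can be run in many ways; the article does not specify Salama’s tooling, so the sketch below uses NLTK’s off-the-shelf VADER analyzer on invented headlines purely to show the idea.

```python
# Minimal sentiment-scoring sketch with NLTK's VADER analyzer.
# Illustrative only; these headlines are invented, not from the corpus.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # fetch the lexicon on first run
analyzer = SentimentIntensityAnalyzer()

headlines = [
    "Officers praised for restoring order after night of violence",
    "Sheriff stands aside as mob storms the county jail",
]

for text in headlines:
    compound = analyzer.polarity_scores(text)["compound"]  # -1 (negative) to +1 (positive)
    if compound > 0.05:
        label = "positive"
    elif compound < -0.05:
        label = "negative"
    else:
        label = "neutral"
    print(f"{label:8s} {compound:+.2f}  {text}")
```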
Salama found that the majority of news coverage of law enforcement at the time was neutral. A smaller proportion of coverage was positive, and an even smaller portion of coverage was negative toward law enforcement. His conclusion? By being neutral, the press was complicit in lynchings.
The use of AI in newsrooms can be a scary thought, but these presentations are just a few examples of how journalists are using technology to improve representation in their coverage. For more information on the presentations at the conference, see the archive of papers here.
Editor’s note: a previous version of this article misattributed a quote from Smith to Mitova.