How machine learning could change journalism
A fact-checking tool built into a content management system, text “filters” that could dial snark up or down, a search engine that could give results according to a user’s level of education. Walter Frick, a senior associate editor at Harvard Business Review, thinks these are all scenarios in which machine learning systems could, in the not-so-distant future, dramatically shape the way journalists work.
Frick discussed machine learning at a recent presentation at the Nieman Foundation for Journalism in Cambridge, Mass. As a 2016 Knight Visiting Nieman Fellow, Frick studied how machine learning – a process by which computer algorithms are taught to recognize patterns – might help news organizations better organize background information and archive material in order to provide context for journalists and readers.
During the lecture, he stressed why he believes machine learning systems will be essential for journalists, especially those who focus on explanatory journalism. He also discussed how machine learning systems will look in the near future.
What can machine learning do for journalism?
Machine learning is a type of artificial intelligence that provides computers with the ability to learn without being explicitly programmed. It explores the study and construction of algorithms that can learn from and make predictions on data. To Frick, machine learning at its simplest level is “a program trying to automatically recognize patterns and data,” where someone can give all the variables to the program and let it find the correlations.
“Everything you need to know about, I learned from Googling for an hour,” Frick said, quoting a tweet that pokes fun at how much people already rely on machines to get information. For reporters, that means learning the basics of a new subject through a simple Google search.
Frick believes that, with advances in machine learning technologies, an algorithm could look at a dataset and work out how it compares with previous years and with expectations. The advantage is scale. For example, business reporters could use this technology to cover far more reports than they could on their own, he said, something the Associated Press is currently trying out.
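To make the idea concrete, here is a minimal sketch of the comparison step: a figure is checked against prior years and against an expected value, then turned into a sentence. It is rule-based rather than machine learning, the function name `describe_result` is hypothetical, and the numbers are invented for illustration. It does not describe any system the Associated Press actually uses.

```python
from statistics import mean


def describe_result(metric, current, previous_years, expected):
    """Return one sentence comparing `current` with history and expectations."""
    baseline = mean(previous_years)  # average of the prior years
    change_pct = (current - baseline) / baseline * 100
    direction = "up" if change_pct >= 0 else "down"
    if current > expected:
        verdict = "beat"
    elif current < expected:
        verdict = "missed"
    else:
        verdict = "matched"
    return (
        f"{metric} came in at {current:.1f}, {direction} {abs(change_pct):.1f}% "
        f"from the prior average and {verdict} expectations of {expected:.1f}."
    )


# Toy figures, invented for this example only.
sentence = describe_result("Revenue", 120.0, [100.0, 100.0], 110.0)
print(sentence)
```

A reporter would still decide which figures matter and check the output; the sketch only shows why this kind of comparison is easy to repeat across many reports.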
Machine learning systems should be looked at as instruments to make journalists more efficient and effective instead of as tools that will eliminate the need for experts. “It’s probably not a threat to replace the way we write new articles and do explanatory journalism,” he said.
Frick sees two kinds of tools in this area, which he called “automated explainers” and “automated research assistants.”
According to Frick, NewsBlaster, NewsInEssence, Lycos, and IBM Watson are all “automated explainers.” The system will first identify articles, then identify important information within articles (a process also called auto-summarization) to infer structure, and then present the summary to the user.
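The three steps described above (identify articles, identify the important information, present a summary) can be sketched in a few lines. This is a simple word-frequency approach to extractive summarization, chosen here only because it is easy to read; it is an assumption for illustration, not how NewsBlaster, NewsInEssence or IBM Watson work. The function names, the stopword list and the sample articles are all hypothetical.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for", "that", "it", "was"}


def tokenize(text):
    """Lowercase the text and keep words that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def find_articles(articles, query):
    """Step 1: keep articles that share at least one word with the query."""
    terms = set(tokenize(query))
    return [a for a in articles if terms & set(tokenize(a))]


def summarize(articles, max_sentences=2):
    """Steps 2 and 3: score sentences by average word frequency, return the top ones."""
    sentences = [
        s.strip()
        for a in articles
        for s in re.split(r"(?<=[.!?])\s+", a)
        if s.strip()
    ]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return [s for s in sentences if s in top]  # keep the original order


# Invented sample articles, for illustration only.
articles = [
    "The city council approved the transit budget. The vote followed months of debate.",
    "Transit budget cuts were rejected by the council. Riders welcomed the decision.",
    "The local bakery won a national award.",
]
relevant = find_articles(articles, "transit budget")
summary = summarize(relevant, max_sentences=1)
print(summary)
```

Sentences that repeat the words appearing most often across the matched articles score highest, which is a rough stand-in for "important information."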
Tools like Google Docs are “automated research assistants” that focus on the research process and are similar to search engines. For example, the New York Times released “Editor” last year, an internal tool that the paper uses to tag content. If you are writing in “Editor,” the system will use machine learning to try to infer what you are writing about and suggest content-appropriate tags.
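Tag suggestion is a good example of a program learning from examples rather than from hand-written rules. The sketch below counts which words appear under which tag in a handful of labeled texts, then scores a new draft against those counts. It is a deliberately naive illustration, not a description of how the Times’ “Editor” works; `train`, `suggest_tags` and the sample sentences are hypothetical.

```python
import re
from collections import Counter, defaultdict


def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def train(labeled_texts):
    """Count how often each word appears under each tag."""
    counts = defaultdict(Counter)
    for text, tag in labeled_texts:
        counts[tag].update(words(text))
    return counts


def suggest_tags(counts, draft, top_n=1):
    """Rank tags by how typical the draft's words are for each tag."""
    scores = {}
    for tag, word_counts in counts.items():
        total = sum(word_counts.values())
        # Words never seen under this tag contribute 0 to its score.
        scores[tag] = sum(word_counts[w] / total for w in words(draft))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


# Invented training examples, for illustration only.
examples = [
    ("The senate passed the bill after a long vote", "politics"),
    ("The governor signed the election bill", "politics"),
    ("The team won the final game of the season", "sports"),
    ("The coach praised the team after the game", "sports"),
]
model = train(examples)
suggestion = suggest_tags(model, "Lawmakers vote on the election bill today")
print(suggestion)
```

With more labeled examples the counts become more reliable, which is the sense in which such a system "learns" without anyone writing a rule for each tag.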
Why should journalists choose machine learning systems over Google?
Frick thinks that there are three reasons why a journalist should choose machine learning systems over search engines.
First, it’s still time-consuming to do research based on search engines, which are imperfect and do not always respond the way one hopes, especially for those journalists who do explanatory work and are trying to find academic sources.
Second, studies show that summarized results lead to better research. A person looking at summarized results can scan more documents from better sources.
Third, he says Google is moving away from a world where you search for something and are directed to ten different links. The company is trying to provide the obvious answers first and most prominently.
Frick also points out that machine learning systems cannot replace search engines, because they do not suit everybody and are not aimed at providing compelling stories people want to consume. But if you are a journalist on deadline or a reporter who has ten minutes before interviewing an expert, you will understand that these tools have potential.
The future of machine learning systems
Journalist’s Resource, a project of Harvard’s Shorenstein Center, is a great resource that helps journalists find academic research on all topics, Frick points out. He hopes to build an automated version of the website that will provide suggestions based on the topic that the reporter is looking at.
Frick thinks that in about five years, machine learning systems may have advanced in the following ways:
- Better machine translation to help journalists do research across languages.
- Explainers that adjust automatically to a user’s level of background knowledge.
- Text “filters” that dial snark up or down.
- Fact-checking built into a content management system.
- Stories that emphasize tech implications for technologists and business implications for business people.
How to get started in machine learning technology?
Machine learning has become far more accessible to non-experts in recent years. “You no longer need a PhD to do machine learning,” Frick says. You don’t even need an extensive background in computer science to get started.
Frick suggests that if you are looking for a good introduction, follow the work of Will Knight at MIT Technology Review. Books like What Every Manager Should Know About Machine Learning, An Introduction to Statistical Learning and Natural Language Processing with Python are also great resources.
Ed note: For a look at how journalists are using machine learning to sift through news video, read this Storybench article.