
Federal data is disappearing. Meet the teams rescuing it

A New York Times analysis found that the new Trump administration took down more than 8,000 government web pages in its first two weeks. Among them were 3,000 pages from the Centers for Disease Control and Prevention containing crucial federal health care data, 3,000 pages from the Census Bureau and 180 pages from the Department of Justice, including all of its state-level data on hate crimes.

With critical federal data disappearing, organizations across the country have launched concerted efforts to rescue it.

MuckRock, the non-profit news site and public resource that helps people request, analyze and share government records, hosted a webinar on Feb. 13, led by CEO Michael Morisy, to introduce the teams working to save federal data, highlight the resources available to the public and explain how anyone can contribute to these efforts.

“We’re seeing data disappear. We’re seeing documents disappear. Sometimes that is very normal during a transition. Sometimes that is not business as usual. One of the things that we’re working with our community to do is identify data sets that are important,” Morisy said. Through a request form, anyone can flag important data sets and report their status. The team then follows up by filing public records and Freedom of Information Act (FOIA) requests.

Klaxon, MuckRock’s collaborative tool with the Filecoin network, an open-source storage network, lets members monitor web pages. When content is added to or deleted from a monitored page or data set, Klaxon sends an email alert and archives a snapshot of the change.
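The webinar did not go into how Klaxon works internally. As a rough illustration of the general idea behind page monitoring, the Python sketch below compares a stored snapshot of a page’s text with a newly fetched copy and reports which lines were added or removed. The function names are hypothetical and this is not Klaxon’s code; fetching pages, sending email and storing snapshots are left out.

```python
from __future__ import annotations

import difflib
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest so unchanged pages can be skipped cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_lines(old: str, new: str) -> tuple[list[str], list[str]]:
    """Split a line-level diff into (added, removed) lines."""
    added: list[str] = []
    removed: list[str] = []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " are unchanged; "? " lines are ndiff hints.
    return added, removed


def check_page(previous_snapshot: str, current_text: str) -> dict | None:
    """Compare a stored snapshot with the current page text (hypothetical helper).

    Returns None when nothing changed; otherwise the added and removed lines
    plus the new text, which a real monitor would archive and send as an alert.
    """
    if fingerprint(previous_snapshot) == fingerprint(current_text):
        return None
    added, removed = diff_lines(previous_snapshot, current_text)
    return {"added": added, "removed": removed, "snapshot": current_text}


if __name__ == "__main__":
    before = "Hate crime statistics\nState-level tables\n"
    after = "Hate crime statistics\n"
    print(check_page(before, after))
```

Hashing first means an unchanged page costs one comparison; the line diff only runs when something actually moved.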

MuckRock also merged with the Data Liberation Project, a grassroots effort to obtain, analyze, document and publish data sets of public importance. Volunteers can get involved by helping with data analysis, documentation and other tasks. To get in touch, email MuckRock at news@muckrock.com.

Another organization involved in these efforts is the Internet Archive, a non-profit whose mission is to provide universal access to all knowledge. Since its founding in 1996, it has archived billions of web pages, which are accessible through its Wayback Machine service and contain over 700 TB of web material. Anyone can submit a URL to be archived by entering it at web.archive.org.
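The same submission can be scripted for a list of pages. The Python sketch below assumes the public Save Page Now address, in which the page to capture is appended to https://web.archive.org/save/. Unauthenticated requests like this can be slow or rate-limited, and the Internet Archive offers a separate authenticated API for bulk work. The function names and the User-Agent string are placeholders.

```python
import urllib.request

# Public Save Page Now prefix; the page to capture is appended to it.
SAVE_ENDPOINT = "https://web.archive.org/save/"


def save_url_for(page_url: str) -> str:
    """Build the Save Page Now address for a page (hypothetical helper)."""
    return SAVE_ENDPOINT + page_url


def request_capture(page_url: str, timeout: float = 60.0) -> int:
    """Ask the Wayback Machine to capture page_url and return the HTTP status.

    urlopen raises urllib.error.HTTPError for error responses such as
    rate limiting, so callers may want to catch it and retry later.
    """
    req = urllib.request.Request(
        save_url_for(page_url),
        headers={"User-Agent": "data-rescue-sketch/0.1"},  # placeholder identifier
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Only prints the address; call request_capture() to actually submit a page.
    print(save_url_for("https://www.cdc.gov/"))
```

Keeping the address builder separate from the network call makes it easy to check which pages would be submitted before sending anything.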


“We have a slogan around here. If you see something, say something,” said Mark Graham, director of the Wayback Machine, urging people to take the simple step of archiving URLs to help save crucial data.

Graham also discussed an Internet Archive project called the End of Term Archive, which has taken a deep dive into U.S. government websites every four years, before and after every presidential election, since 2008.

Big Local News is a program out of Stanford University that helps local journalists access large and complex data. Sarah Cohen, senior data journalist at Big Local News, explained how the team is also creating “story recipes” that reporters can use to keep up with the fast-moving changes coming from the new administration.

“For most local reporters, the federal government really hasn’t been a very important part of their jobs. A much bigger part of their job is state and local government coverage. So a lot of folks are not very familiar with the different agencies that are in the federal government, the different data sets and how they can be localized,” Cohen said. 

Reporters can sign up to become members of Big Local News and access any of its open projects, which range from the federal storm events database to data on U.S. foreign assistance and spending.

Harvard’s Library Innovation Lab, a software and design lab made up of software engineers, lawyers, librarians and designers, approaches today’s data preservation the way a longstanding institution would: as preserving our cultural record.

“It’s not just even government data. There are lots of YouTube videos that only exist as YouTube videos. There are lots of news articles that only exist as a news article behind a paywall. And if a policy changes or there’s a cyber attack or there’s an accident, it is so easy for that cultural record to vanish as the ice gets thinner and thinner, and there are fewer and fewer copies of things that we depend on,” said Jack Cushman, the director of the lab. 


The lab aims to “establish thicker ice” by building large-scale collections of collections. Its work includes a collection of every data set linked from data.gov, and another project in the works collects every GitHub source code repository published by federal agencies.

The lab is looking to collaborate with people who have the technical skills to build tools that solve classes of problems the lab has identified and would like to tackle.

Linda Kellam, secretary of the International Association for Social Science Information Services and Technology (IASSIST), spoke about the Data Rescue Project, a coalition of members of data librarianship organizations, including IASSIST.

To better coordinate and communicate data rescue efforts across communities while avoiding duplicated work, the project created a resource-rich Google document that gathers relevant information about the people and places involved. The Data Rescue Tracker, another collaborative tool, catalogs existing public data rescue efforts to reduce duplication.

Information on ways you can help the project is available on its website.

As the situation surrounding federal data continues to develop, the critical work these organizations are doing serves as a lifeline for journalists. Archiving data is something any willing individual can contribute to. In Graham’s words, “If you see something, say something.”

Ruchika Erravandla
