Category: How to
How To: Use investigative techniques to hold algorithms and artificial intelligence accountable
Editor’s note: This article is based on virtual reporting from the livestreamed panel on March 3, 2022. Many tech companies, including Silicon Valley giants, are beginning to grapple with the ethics […]
Analyzing gender differences in music themes and lyrics
Music is a form of self-expression, allowing for a range of human emotions and creativity that can unite people of diverse backgrounds. So it may come as a surprise to […]
Assembling a searchable interface of Boston police misconduct data
The public’s desire for transparency about police misconduct in Boston has never been greater, yet the city’s police department makes finding that information difficult. That’s why I built Boston Cop […]
Finding Embedded Tweets in Online News with Python
Seeing embedded tweets in an online news story has become quite normal. As a platform especially well suited to breaking news, Twitter is a favorite for journalists and politicians alike. […]
Exploring climate data using the Python libraries Matplotlib and Pandas
If you’ve just grabbed a new dataset and you want to look for interesting trends, making some exploratory graphs in Jupyter Notebook is a good way to start. I usually […]
Mapping toxic waste release sites using the Python library GeoPandas
Plotting positional information on maps can reveal geographic trends and makes for an effective visualization to accompany an article. In this tutorial, we’ll use Jupyter Notebook and a Python library […]
How journalists use the EPA’s Toxics Release Inventory to report on environmental justice
In 1984, a Union Carbide pesticide plant in Bhopal, India, leaked 45 tons of toxic methyl isocyanate gas, killing thousands of people and injuring hundreds of thousands. A few months […]
How to calculate a rolling average in R
Rolling or moving averages are a way to reduce noise and smooth time series data. During the Covid-19 pandemic, rolling averages have been used by researchers and journalists around the […]
Update: How to geocode a CSV of addresses in R
This post is an update to the previous post, “How to geocode a CSV of addresses in R”. We will be using the ggmap package again, and be sure to investigate the usage […]
How to shift Alaska and Hawaii below the lower 48 for your interactive choropleth map
Last month, the Harvard Global Health Institute collaborated with The New York Times and ProPublica to publish a model estimating the capacity of hospitals across the United States to cope […]
What I learned applying data science to U.S. redistricting
Deciding whether to reelect President Donald Trump isn’t the only choice Americans will have to make when they head to the voting booths this November. More than 5,000 state lawmakers […]
How to use hierarchical cluster analysis on time series data
Which cities have experienced similar patterns in violent crime rates over time? That kind of analysis, based on time series data, can be done using hierarchical cluster analysis, a statistical […]
Diagnosing the accuracy of your linear regression in R
In this post we’ll cover the assumptions of a linear regression model. There are a ton of books, blog posts, and lectures covering these topics in greater depth (and we’ll […]
How to collect tweets from the Twitter Streaming API using Python
Twitter provides a comprehensive streaming API that developers can use to download data about tweets in real-time, if they can figure out how to use it effectively. In this tutorial, […]
How to explore correlations in R
This post will cover how to measure the relationship between two numeric variables with the corrr package. We will look at how to assess a variable’s distribution using skewness, kurtosis, […]