Category: Data Journalism in R
How to calculate a rolling average in R
Rolling or moving averages are a way to reduce noise and smooth time series data. During the Covid-19 pandemic, rolling averages have been used by researchers and journalists around the […]
Update: How to geocode a CSV of addresses in R
This post is an update to the previous post, “How to geocode a CSV of addresses in R”. We will be using the ggmap package again, and be sure to investigate the usage […]
How to use hierarchical cluster analysis on time series data
Which cities have experienced similar patterns in violent crime rates over time? That kind of analysis, based on time series data, can be done using hierarchical cluster analysis, a statistical […]
Diagnosing the accuracy of your linear regression in R
In this post we’ll cover the assumptions of a linear regression model. There are a ton of books, blog posts, and lectures covering these topics in greater depth (and we’ll […]
How to explore correlations in R
This post will cover how to measure the relationship between two numeric variables with the corrr package. We will look at how to assess a variable’s distribution using skewness, kurtosis, […]
How to download YouTube data in R using “tuber” and “purrr”
Accessing YouTube’s metadata, such as views, likes, dislikes and comments, is simple in R thanks to the tuber package by Gaurav Sood. Sood wrote an excellent, easy-to-use package for accessing the […]
How to build a GIF of satellite imagery in R
“The timelapse imagery of Chennai’s disappearing reservoirs is mind boggling,” Earther senior reporter Brian Kahn wrote on Twitter recently. Kahn had just published the piece “Why Chennai, India’s Sixth Biggest […]
Using R to explore the relationship between San Francisco MUNI citations and the weather
Anybody who rides San Francisco’s MUNI regularly has had their Clipper Card checked by a fare inspector. Anecdotally, it also seemed to me that my Clipper Card was being checked […]
Exploring Chicago rideshare data in R
Last month, the City of Chicago released detailed rideshare data from companies like Uber, Lyft and Via, making it the first city in the country to share anonymized data on […]
How to build a website with Blogdown in R
Want to build a website right in RStudio? blogdown is an R package that allows you to create websites from R Markdown files using Hugo, an open-source static site generator […]
How to build a bubble chart of individuals mentioned in the Mueller report
When the redacted Mueller report was released last week, journalists across the country scrambled to read through and analyze the document. Data scientists and information designers were hard at work, […]
How to access APIs in R
APIs, or application programming interfaces, are a way for people to access data in a plain text format using multiple programming languages. Many websites, organizations and services offer APIs for […]
How to model with gradient boosting machine in R
The tutorial is part 2 of our #tidytuesday post from last week, which explored bike rental data from Washington, D.C. Check it out here. The following tutorial will use a […]
Exploring bike rental behavior using R
Bikes have become one of the fastest-growing modes of city travel, which is why it’s no surprise that Lyft and Uber are getting into the two-wheeler game. Lyft’s recent […]
Pivoting data from columns to rows (and back!) in the tidyverse
TLDR: This tutorial was prompted by the recent changes to the tidyr package (see the tweet from Hadley Wickham below). Two functions for reshaping columns and rows (gather() and spread()) […]