Data Journalism in R

  • How to calculate a rolling average in R
    Rolling or moving averages are a way to reduce noise and smooth time series data. During the Covid-19 pandemic, rolling averages have been used by researchers and journalists around the world to understand and visualize cases and deaths. This post will cover how to compute and visualize rolling averages for the new confirmed cases and […]
  • Update: How to geocode a CSV of addresses in R
    This post is an update to the previous post, “How to geocode a CSV of addresses in R”. We will use the ggmap package again. Be sure to review the usage and billing policy for Google’s Geocoding API: the API now has a pay-as-you-go model, and every account gets a certain number of free requests per month. For […]
  • How to use hierarchical cluster analysis on time series data
    Which cities have experienced similar patterns in violent crime rates over time? That kind of question, based on time series data, can be answered with hierarchical cluster analysis, a statistical technique that, roughly speaking, builds clusters based on the distance between each pair of observations. In agglomerative hierarchical clustering, you start out with every […]
  • Diagnosing the accuracy of your linear regression in R
    In this post we’ll cover the assumptions of a linear regression model. There are a ton of books, blog posts, and lectures covering these topics in greater depth (and we’ll link to those in the notes at the bottom), but we wanted to distill some of this information into a single post you can bookmark […]
  • How to explore correlations in R
    This post will cover how to measure the relationship between two numeric variables with the corrr package. We will look at how to assess a variable’s distribution using skewness, kurtosis, and normality. Then we’ll examine the relationship between two variables by looking at the covariance and the correlation coefficient. Load the packages. The packages we’ll […]
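The rolling-average excerpt in the list stops before any code. As a rough, self-contained illustration (not necessarily the method that post uses), a 7-day trailing average can be computed in base R with `stats::filter()`. The daily counts below are made-up numbers for illustration only, not real Covid-19 data.

```r
# Minimal sketch: 7-day trailing rolling average in base R.
# Assumption: these daily counts are invented for illustration.
new_cases <- c(10, 12, 9, 15, 20, 18, 22, 25, 30, 28)

# A convolution filter with seven weights of 1/7 averages each day with the
# six days before it (sides = 1 means the window only looks backward).
# The first six days have no full window, so they come back as NA.
rolling_7 <- stats::filter(new_cases, rep(1 / 7, 7), sides = 1)

# stats::filter() returns a ts object; convert it to a plain numeric vector.
rolling_7 <- as.numeric(rolling_7)

print(round(rolling_7, 2))
```

Calling `stats::filter()` with its namespace prefix avoids a clash with `dplyr::filter()` when dplyr is loaded. Packages such as zoo provide `rollmean()` for the same calculation, with options for how the window is aligned.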