
Exploring bike rental behavior using R

Bikes have become one of the fastest-growing modes of city travel, so it’s no surprise that Lyft and Uber are getting into the two-wheeler game. Lyft’s recent acquisition of Motivate, the largest bike rental company in the world, positions it to compete with Uber’s Jump and Ford’s GoBikes, which have delivered 625,000 and 1.4 million rides in San Francisco, respectively.

The growth of bike rentals presents a unique challenge for both the companies offering these services and the cities responding to the scale of change, particularly in forecasting demand.

The following tutorial uses R and data from Capital Bikeshare, available at the UCI Machine Learning Repository, to explore bike rentals in Washington, D.C. from 2011 to 2012. We were curious about what influences people to rent bikes over the course of the year.

Load packages

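# packages --------------------------------------------------
library(tidyverse) # dplyr, tidyr, readr, ggplot2 and friends
library(rsample) # data splitting
library(randomForest) # basic implementation
library(ranger) # a faster implementation of randomForest
library(caret) # an aggregator package for performing many machine learning models
library(ggthemes) # theme_fivethirtyeight()
library(scales) # axis and label formatting
library(wesanderson) # color palettes
library(styler) # code formatting
# lubridate, knitr, skimr and mosaic are called below with pkg::fun(),
# so they only need to be installed, not loaded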

Import the data

The data includes measurements of weather conditions (temperature, humidity, wind speed, etc.), how many bikes were rented, and other seasonal attributes that might influence rentals (e.g., weekdays and holidays). Curiously, riders are more likely to rent a bike on days after Monday regardless of temperature, and the likelihood increases on Saturdays when temperatures are between 15 and 25 degrees Celsius.

This code imports the daily bike rental data (day.csv) from the data folder:

bike <- readr::read_csv("data/day.csv")
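A quick sanity check right after the import can catch a wrong or truncated file before any wrangling. The sketch below is not part of the original script: `check_bike_import()` is a hypothetical helper, the expected column names are taken from the `glimpse()` output later in this post, and the `cnt == casual + registered` check assumes, as the UCI readme describes, that `cnt` is the total of casual and registered rentals.

```r
# Column names expected in day.csv (taken from the glimpse() output in this post)
expected_cols <- c("instant", "dteday", "season", "yr", "mnth", "holiday",
                   "weekday", "workingday", "weathersit", "temp", "atemp",
                   "hum", "windspeed", "casual", "registered", "cnt")

# Hypothetical helper: returns a named logical vector of basic checks
check_bike_import <- function(df) {
  has_cols <- all(expected_cols %in% names(df))
  c(
    has_expected_columns = has_cols,
    # cnt should be the sum of casual and registered rentals on every day
    counts_add_up = has_cols && all(df$cnt == df$casual + df$registered),
    no_missing_values = !anyNA(df)
  )
}

# Usage with the imported data (requires data/day.csv):
# check_bike_import(bike)
```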

Wrangling the data

The following script wrangles the data and prepares it for modeling. Read through the comments to understand why each step was taken and how these variables feed into the visualizations and models.

# WRANGLE ---------------------------------------------------------------
# I like to be overly cautious when it comes to wrangling because all models
# are only as good as the underlying data. This data set came with many
# categorical variables coded numerically, so I am going to create a 
# character version of each variable (_chr) and a factor version (_fct).
# Creating a character and factor variable will let me choose which one to 
# use for each graph and model.

# this recodes the weekday variable into a character variable
# test 
# bike %>%
#   mutate(
#     weekday_chr =
#       case_when(
#         weekday == 0 ~ "Sunday",
#         weekday == 1 ~ "Monday",
#         weekday == 2 ~ "Tuesday",
#         weekday == 3 ~ "Wednesday",
#         weekday == 4 ~ "Thursday",
#         weekday == 5 ~ "Friday",
#         weekday == 6 ~ "Saturday",
#         TRUE ~ "other")) %>% 
#     dplyr::count(weekday, weekday_chr) %>%
#     tidyr::spread(weekday, n)

# assign
bike <- bike %>%
  mutate(
    weekday_chr =
      case_when(
        weekday == 0 ~ "Sunday",
        weekday == 1 ~ "Monday",
        weekday == 2 ~ "Tuesday",
        weekday == 3 ~ "Wednesday",
        weekday == 4 ~ "Thursday",
        weekday == 5 ~ "Friday",
        weekday == 6 ~ "Saturday",
        TRUE ~ "other"))

# verify
# bike %>% 
#   dplyr::count(weekday, weekday_chr) %>% 
#   tidyr::spread(weekday, n)

# Weekdays (factor) ----

# test factor variable
# bike %>%
#   mutate(
#     weekday_fct = factor(x = weekday,
#              levels = c(0,1,2,3,4,5,6),
#              labels = c("Sunday",
#                        "Monday",
#                        "Tuesday",
#                        "Wednesday",
#                        "Thursday",
#                        "Friday",
#                        "Saturday"))) %>%
#   dplyr::count(weekday, weekday_fct) %>%
#   tidyr::spread(weekday, n)

# assign factor variable
bike <- bike %>%
  mutate(
    weekday_fct = factor(x = weekday,
             levels = c(0,1,2,3,4,5,6),
             labels = c("Sunday",
                       "Monday",
                       "Tuesday",
                       "Wednesday",
                       "Thursday",
                       "Friday",
                       "Saturday")))

# verify factor variable
# bike %>% 
#   dplyr::count(weekday, weekday_fct) %>% 
#   tidyr::spread(weekday, n)


# Holidays ----
# test
# bike %>%
#   mutate(holiday_chr =
#       case_when(
#         holiday == 0 ~ "Non-Holiday",
#         holiday == 1 ~ "Holiday")) %>% 
#   dplyr::count(holiday, holiday_chr) %>%
#   tidyr::spread(holiday, n)

# assign
bike <- bike %>%
  mutate(holiday_chr =
      case_when(
        holiday == 0 ~ "Non-Holiday",
        holiday == 1 ~ "Holiday"))

# verify
# bike %>%
#   dplyr::count(holiday, holiday_chr) %>%
#   tidyr::spread(holiday, n)

# test
# bike %>%
#   mutate(
#     holiday_fct = factor(x = holiday,
#              levels = c(0,1),
#              labels = c("Non-Holiday",
#                        "Holiday"))) %>% 
#     dplyr::count(holiday, holiday_fct) %>%
#     tidyr::spread(holiday, n)

# assign
bike <- bike %>%
  mutate(
    holiday_fct = factor(x = holiday,
             levels = c(0,1),
             labels = c("Non-Holiday",
                       "Holiday")))

# verify
# bike %>%
#   dplyr::count(holiday_chr, holiday_fct) %>%
#   tidyr::spread(holiday_chr, n)

# Working days ----
# test
# bike %>%
#   mutate(
#     workingday_chr =
#       case_when(
#         workingday == 0 ~ "Non-Working Day",
#         workingday == 1 ~ "Working Day",
#         TRUE ~ "other")) %>% 
#   dplyr::count(workingday, workingday_chr) %>%
#   tidyr::spread(workingday, n)

# assign
bike <- bike %>%
  mutate(
    workingday_chr =
      case_when(
        workingday == 0 ~ "Non-Working Day",
        workingday == 1 ~ "Working Day",
        TRUE ~ "other"))

# verify
# bike %>%
#   dplyr::count(workingday, workingday_chr) %>%
#   tidyr::spread(workingday, n)

# test
# bike %>%
#   mutate(
#     workingday_fct = factor(x = workingday,
#              levels = c(0,1),
#              labels = c("Non-Working Day",
#                        "Working Day"))) %>%
#   dplyr::count(workingday, workingday_fct) %>%
#   tidyr::spread(workingday, n)

# assign
bike <- bike %>%
  mutate(
    workingday_fct = factor(x = workingday,
             levels = c(0,1),
             labels = c("Non-Working Day",
                       "Working Day")))

# verify
# bike %>%
#   dplyr::count(workingday_chr, workingday_fct) %>%
#   tidyr::spread(workingday_chr, n)


# Seasons ----
bike <- bike %>%
  mutate(
    season_chr =
      case_when(
        season == 1 ~ "Spring",
        season == 2 ~ "Summer",
        season == 3 ~ "Fall",
        season == 4 ~ "Winter",
        TRUE ~ "other"
      ))

# test
# bike %>%
#   mutate(
#     season_fct = factor(x = season,
#              levels = c(1, 2, 3, 4),
#              labels = c("Spring",
#                        "Summer",
#                        "Fall",
#                        "Winter"))) %>%
#   dplyr::count(season_chr, season_fct) %>%
#   tidyr::spread(season_chr, n)

# assign
bike <- bike %>%
  mutate(
    season_fct = factor(x = season,
             levels = c(1, 2, 3, 4),
             labels = c("Spring",
                       "Summer",
                       "Fall",
                       "Winter"))) 

# verify
# bike %>%
#   dplyr::count(season_chr, season_fct) %>%
#   tidyr::spread(season_chr, n)


# Weather situation ----
# test
# bike %>%
#   mutate(
#     weathersit_chr =
#       case_when(
#         weathersit == 1 ~ "Good",
#         weathersit == 2 ~ "Clouds/Mist",
#         weathersit == 3 ~ "Rain/Snow/Storm",
#         TRUE ~ "other")) %>% 
#   dplyr::count(weathersit, weathersit_chr) %>%
#   tidyr::spread(weathersit, n)

# assign
bike <- bike %>%
  mutate(
    weathersit_chr =
      case_when(
        weathersit == 1 ~ "Good",
        weathersit == 2 ~ "Clouds/Mist",
        weathersit == 3 ~ "Rain/Snow/Storm"))

# verify
# bike %>% 
#   dplyr::count(weathersit, weathersit_chr) %>%
#   tidyr::spread(weathersit, n)

# test
# bike %>%
#   mutate(
#     weathersit_fct = factor(x = weathersit,
#              levels = c(1, 2, 3),
#              labels = c("Good",
#                        "Clouds/Mist",
#                        "Rain/Snow/Storm"))) %>%
#   dplyr::count(weathersit, weathersit_fct) %>%
#   tidyr::spread(weathersit, n)

# assign 
bike <- bike %>%
  mutate(
    weathersit_fct = factor(x = weathersit,
                       levels = c(1, 2, 3),
                       labels = c("Good",
                                 "Clouds/Mist",
                                 "Rain/Snow/Storm")))
# verify
# bike %>%
#   dplyr::count(weathersit_chr, weathersit_fct) %>%
#   tidyr::spread(weathersit_chr, n)


# Months ----
# huge shoutout to Thomas Mock over at RStudio for showing me 
# lubridate::month() (and stopping my case_when() obsession)
# https://twitter.com/thomas_mock/status/1113105497480183818

# test 
# bike %>% 
#   mutate(month_ord = 
#            lubridate::month(mnth, label = TRUE)) %>% 
#   dplyr::count(month_ord, mnth) %>% 
#   tidyr::spread(month_ord, n)

# assign
bike <- bike %>% 
  mutate(month_ord = 
           lubridate::month(mnth, label = TRUE))

# verify
# bike %>% 
#   dplyr::count(month_ord, mnth) %>% 
#   tidyr::spread(month_ord, n)
  

# test
# bike %>%
#   mutate(
#     month_fct = factor(x = mnth,
#              levels = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
#              labels = c("January", "February", "March", "April", "May",
#                         "June", "July", "August", "September", "October",
#                         "November", "December"))) %>%
#   dplyr::count(mnth, month_fct) %>%
#   tidyr::spread(month_fct, n)

# assign
bike <- bike %>%
  mutate(
    month_fct = factor(x = mnth,
             levels = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
             labels = c("January", "February", "March", "April", "May",
                        "June", "July", "August", "September", "October",
                        "November", "December")))

# verify (there is no month_chr column, so compare against mnth)
# bike %>% 
#   dplyr::count(mnth, month_fct) %>%
#   tidyr::spread(month_fct, n)

# Year ----
# test
# bike %>%
#   mutate(
#     yr_chr =
#       case_when(
#         yr == 0 ~ "2011",
#         yr == 1 ~ "2012",
#         TRUE ~ "other")) %>% 
#     dplyr::count(yr, yr_chr) %>%
#     tidyr::spread(yr, n)

# assign
bike <- bike %>%
  mutate(
    yr_chr =
      case_when(
        yr == 0 ~ "2011",
        yr == 1 ~ "2012"))
# verify
# bike %>%
#     dplyr::count(yr, yr_chr) %>%
#     tidyr::spread(yr, n)

# test
# bike %>%
#   mutate(
#     yr_fct = factor(x = yr,
#              levels = c(0, 1),
#              labels = c("2011",
#                        "2012"))) %>%
#   dplyr::count(yr, yr_fct) %>%
#   tidyr::spread(yr, n)

# assign
bike <- bike %>%
  mutate(
    yr_fct = factor(x = yr,
             levels = c(0, 1),
             labels = c("2011",
                       "2012")))
# verify
# bike %>%
#   dplyr::count(yr_chr, yr_fct) %>%
#   tidyr::spread(yr_chr, n)

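# convert normalized temperatures back to degrees Celsius ----
# day.csv stores (t - t_min) / (t_max - t_min); per the UCI readme,
# temp uses t_min = -8, t_max = 39 and atemp uses t_min = -16, t_max = 50
bike <- bike %>%
  mutate(temp = as.integer(temp * (39 - (-8)) + (-8)))

bike <- bike %>%
  mutate(atemp = atemp * (50 - (-16)) + (-16))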

# ~ windspeed ----
# windspeed was divided by 67 (its max) in the raw data
bike <- bike %>%
  mutate(windspeed = as.integer(67 * windspeed))

# ~ humidity ----
# humidity was divided by 100 in the raw data
bike <- bike %>%
  mutate(hum = as.integer(100 * hum))

# ~ convert to date ----
bike <- bike %>%
  mutate(dteday = as.Date(dteday))

# check df
# bike %>% dplyr::glimpse(78)

# rename the data frame so these don't get confused
BikeData <- bike

# reorganize variables for easier inspection

BikeData <- BikeData %>% 
  dplyr::select(
    dplyr::starts_with("week"),
    dplyr::starts_with("holi"),
    dplyr::starts_with("seas"),
    dplyr::starts_with("work"),
    dplyr::starts_with("month"),
    dplyr::starts_with("yr"),
    dplyr::starts_with("weath"),
    dplyr::everything())
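The wrangling script repeats the same test, assign and verify pattern for every coded variable. If you want fewer moving parts, one option is a small helper that builds both the `_chr` and `_fct` versions from a single code-to-label lookup. This is a minimal sketch, not the author's approach; `add_labels()` is a hypothetical name, and unlike the `case_when()` calls above (some of which fall back to `"other"`), it returns `NA` for codes that are missing from the lookup.

```r
# Hypothetical helper (not from the original script): adds <col>_chr and
# <col>_fct versions of a numerically coded column in one call.
add_labels <- function(data, col, codes, labels) {
  stopifnot(length(codes) == length(labels))
  # character version: look up each code's label (unmatched codes become NA)
  data[[paste0(col, "_chr")]] <- labels[match(data[[col]], codes)]
  # factor version: levels follow the order of `codes`
  data[[paste0(col, "_fct")]] <- factor(data[[col]], levels = codes, labels = labels)
  data
}

# Usage on the imported data, mirroring the holiday and season steps above:
# bike <- bike %>%
#   add_labels("holiday", c(0, 1), c("Non-Holiday", "Holiday")) %>%
#   add_labels("season", 1:4, c("Spring", "Summer", "Fall", "Winter"))
```

Because `data` is the first argument, the helper can sit in a `%>%` chain like the `mutate()` steps it replaces.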

If you then run dplyr::glimpse(BikeData), you should see the following:

Observations: 731
Variables: 30
$ weekday        <int> 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2…
$ weekday_chr    <chr> "Saturday", "Sunday", "Monday", "Tuesday", "Wednesda…
$ weekday_fct    <fct> Saturday, Sunday, Monday, Tuesday, Wednesday, Thursd…
$ holiday        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
$ holiday_chr    <chr> "Non-Holiday", "Non-Holiday", "Non-Holiday", "Non-Ho…
$ holiday_fct    <fct> Non-Holiday, Non-Holiday, Non-Holiday, Non-Holiday, …
$ season         <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ season_chr     <chr> "Spring", "Spring", "Spring", "Spring", "Spring", "S…
$ season_fct     <fct> Spring, Spring, Spring, Spring, Spring, Spring, Spri…
$ workingday     <int> 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1…
$ workingday_chr <chr> "Non-Working Day", "Non-Working Day", "Working Day",…
$ workingday_fct <fct> Non-Working Day, Non-Working Day, Working Day, Worki…
$ month_ord      <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Ja…
$ month_fct      <fct> January, January, January, January, January, January…
$ yr             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ yr_chr         <chr> "2011", "2011", "2011", "2011", "2011", "2011", "201…
$ yr_fct         <fct> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011…
$ weathersit     <int> 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2…
$ weathersit_chr <chr> "Clouds/Mist", "Clouds/Mist", "Good", "Good", "Good"…
$ weathersit_fct <fct> Clouds/Mist, Clouds/Mist, Good, Good, Good, Good, Cl…
$ instant        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
$ dteday         <date> 2011-01-01, 2011-01-02, 2011-01-03, 2011-01-04, 201…
$ mnth           <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ temp           <int> 8, 9, 1, 1, 2, 1, 1, 0, -1, 0, 0, 0, 0, 0, 2, 2, 0, …
$ atemp          <dbl> 28.36325, 28.02713, 22.43977, 23.21215, 23.79518, 23…
$ hum            <int> 80, 69, 43, 59, 43, 51, 49, 53, 43, 48, 68, 59, 47, …
$ windspeed      <int> 10, 16, 16, 10, 12, 6, 11, 17, 24, 14, 8, 20, 20, 8,…
$ casual         <int> 331, 131, 120, 108, 82, 88, 148, 68, 54, 41, 43, 25,…
$ registered     <int> 654, 670, 1229, 1454, 1518, 1518, 1362, 891, 768, 12…
$ cnt            <int> 985, 801, 1349, 1562, 1600, 1606, 1510, 959, 822, 13…
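In day.csv the weather columns are stored on a 0 to 1 scale, which is why the script rescales `temp`, `atemp`, `hum` and `windspeed` before summarizing them. The rescaling is an inverted min-max normalization. The sketch below writes that out as a pair of functions; `normalize()` and `denormalize()` are hypothetical names, and the constants are assumptions taken from the UCI Bike Sharing readme (temperature with t_min = -8 and t_max = 39, feeling temperature with t_min = -16 and t_max = 50, humidity divided by 100, wind speed divided by 67).

```r
# Min-max normalization as described for day.csv: x_norm = (x - x_min) / (x_max - x_min)
normalize <- function(x, x_min, x_max) {
  (x - x_min) / (x_max - x_min)
}

# Inverse: recover the original units from a 0-1 value
denormalize <- function(x_norm, x_min, x_max) {
  x_norm * (x_max - x_min) + x_min
}

# Constants assumed from the UCI Bike Sharing readme
temp_c  <- denormalize(c(0, 0.5, 1), x_min = -8,  x_max = 39)  # -8.0 15.5 39.0
atemp_c <- denormalize(c(0, 0.5, 1), x_min = -16, x_max = 50)  # -16 17 50
hum_pct <- denormalize(0.62, x_min = 0, x_max = 100)           # 62
wind    <- denormalize(0.18, x_min = 0, x_max = 67)            # 12.06

# Round trip: normalizing and then denormalizing returns the input
round_trip <- denormalize(normalize(20, -8, 39), -8, 39)       # 20
```

The script wraps some conversions in `as.integer()`, which truncates toward zero (`as.integer(-0.7)` is 0), so `round()` may be the better choice if you want the nearest whole number.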

Exploratory data analysis

Here are three options for turning the data into summary statistics that give us a better understanding of the distribution of each variable in the BikeData data frame. First, we’ll use the dplyr package.

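BikeDplyrSummary <- BikeData %>%
  select(temp, atemp, hum, windspeed, casual, registered, cnt) %>%
  # each lambda must call its function on the column (~min(.x)); a bare ~min
  # returns the function itself, which triggers errors such as
  # "Column `temp_min` is of unsupported type function"
  summarise(across(everything(), list(
    min = ~min(.x),
    q25 = ~quantile(.x, 0.25),
    median = ~median(.x),
    q75 = ~quantile(.x, 0.75),
    max = ~max(.x),
    mean = ~mean(.x),
    sd = ~sd(.x)
  ))) %>%
  # long format: one row per variable_statistic pair
  gather(stat, val) %>%
  separate(stat, 
           into = c("var", "stat"), 
           sep = "_") %>%
  # wide format: one row per variable, one column per statistic
  spread(stat, val) %>%
  select(var, min, q25, median, q75, max, mean, sd)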

knitr::kable(BikeDplyrSummary)
|var        |       min|       q25|    median|       q75|       max|      mean|         sd|
|:----------|---------:|---------:|---------:|---------:|---------:|---------:|----------:|
|atemp      |  18.68837|  27.48664|  32.54892|  36.69247|  44.59046|  32.12804|   5.540680|
|casual     |   2.00000| 315.50000| 713.00000|1096.00000|3410.00000| 848.17647| 686.622488|
|cnt        |  22.00000|3152.00000|4548.00000|5956.00000|8714.00000|4504.34884|1937.211452|
|hum        |   0.00000|  52.00000|  62.00000|  73.00000|  97.00000|  62.30643|  14.242776|
|registered |  20.00000|2497.00000|3662.00000|4776.50000|6946.00000|3656.17237|1560.256377|
|temp       |  -5.00000|   7.00000|  15.00000|  22.00000|  32.00000|  14.79617|   8.546779|
|windspeed  |   1.00000|   9.00000|  12.00000|  15.00000|  34.00000|  12.26265|   5.190293|
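The lambdas in the summary step are worth a closer look. In a purrr-style formula such as `~min`, the body is evaluated as written, so `~min` returns the `min` function itself instead of calling it. That is likely the source of the `Column temp_min is of unsupported type function` error reported in the comments below. A minimal sketch on a made-up one-column tibble, using `purrr::as_mapper()` to turn the formulas into functions:

```r
library(dplyr)
library(purrr)

# A formula lambda's body is evaluated as written:
bare_min   <- as_mapper(~min)      # body is the symbol `min`, so it returns the function
called_min <- as_mapper(~min(.x))  # body calls min() on the first argument

x <- c(4, 1, 7, 10)
is.function(bare_min(x))  # TRUE: a "summary" that is a function, not a number
called_min(x)             # 1

# The corrected pattern inside summarise(across()); names follow "{.col}_{.fn}"
toy_summary <- tibble(temp = x) %>%
  summarise(across(everything(), list(min = ~min(.x), max = ~max(.x))))
toy_summary  # one row: temp_min = 1, temp_max = 10
```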

Next, we’ll use the skimr package.

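# skim_to_wide() comes from skimr 1.x; in skimr 2.x it is deprecated and
# skim() returns differently named columns (e.g. skim_variable, n_missing),
# so the select() below would need updating there
BikeSkimrSummary <- bike %>%
  skimr::skim_to_wide() %>%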
  dplyr::select(type,
    variable,
    missing,
    complete,
    min,
    max,
    mean,
    sd,
    median = p50,
    hist)
knitr::kable(BikeSkimrSummary)
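|type      |variable       |missing |complete |min        |max        |mean    |sd      |median |hist     |
|:---------|:--------------|:-------|:--------|:----------|:----------|:-------|:-------|:------|:--------|
|character |holiday_chr    |0       |731      |7          |11         |NA      |NA      |NA     |NA       |
|character |month_chr      |0       |731      |3          |9          |NA      |NA      |NA     |NA       |
|character |season_chr     |0       |731      |4          |6          |NA      |NA      |NA     |NA       |
|character |weathersit_chr |0       |731      |4          |15         |NA      |NA      |NA     |NA       |
|character |weekday_chr    |0       |731      |6          |9          |NA      |NA      |NA     |NA       |
|character |workingday_chr |0       |731      |11         |15         |NA      |NA      |NA     |NA       |
|character |yr_chr         |0       |731      |4          |4          |NA      |NA      |NA     |NA       |
|Date      |dteday         |0       |731      |2011-01-01 |2012-12-31 |NA      |NA      |NA     |NA       |
|factor    |holiday_fct    |0       |731      |NA         |NA         |NA      |NA      |NA     |NA       |
|factor    |month_fct      |0       |731      |NA         |NA         |NA      |NA      |NA     |NA       |
|factor    |season_fct     |0       |731      |NA         |NA         |NA      |NA      |NA     |NA       |
|factor    |weathersit_fct |0       |731      |NA         |NA         |NA      |NA      |NA     |NA       |
|factor    |weekday_fct    |0       |731      |NA         |NA         |NA      |NA      |NA     |NA       |
|factor    |workingday_fct |0       |731      |NA         |NA         |NA      |NA      |NA     |NA       |
|factor    |yr_fct         |0       |731      |NA         |NA         |NA      |NA      |NA     |NA       |
|integer   |casual         |0       |731      |NA         |NA         |848.18  |686.62  |713    |▇▇▅▂▁▁▁▁ |
|integer   |cnt            |0       |731      |NA         |NA         |4504.35 |1937.21 |4548   |▂▅▅▇▇▅▅▂ |
|integer   |holiday        |0       |731      |NA         |NA         |0.029   |0.17    |0      |▇▁▁▁▁▁▁▁ |
|integer   |hum            |0       |731      |NA         |NA         |62.31   |14.24   |62     |▁▁▁▃▇▇▅▂ |
|integer   |instant        |0       |731      |NA         |NA         |366     |211.17  |366    |▇▇▇▇▇▇▇▇ |
|integer   |mnth           |0       |731      |NA         |NA         |6.52    |3.45    |7      |▇▅▇▃▅▇▅▇ |
|integer   |registered     |0       |731      |NA         |NA         |3656.17 |1560.26 |3662   |▁▅▅▆▇▅▃▃ |
|integer   |season         |0       |731      |NA         |NA         |2.5     |1.11    |3      |▇▁▇▁▁▇▁▇ |
|integer   |temp           |0       |731      |NA         |NA         |14.8    |8.55    |15     |▁▅▇▇▆▆▇▂ |
|integer   |weathersit     |0       |731      |NA         |NA         |1.4     |0.54    |1      |▇▁▁▅▁▁▁▁ |
|integer   |weekday        |0       |731      |NA         |NA         |3       |2       |3      |▇▇▇▇▁▇▇▇ |
|integer   |windspeed      |0       |731      |NA         |NA         |12.26   |5.19    |12     |▂▇▇▆▂▁▁▁ |
|integer   |workingday     |0       |731      |NA         |NA         |0.68    |0.47    |1      |▃▁▁▁▁▁▁▇ |
|integer   |yr             |0       |731      |NA         |NA         |0.5     |0.5     |1      |▇▁▁▁▁▁▁▇ |
|numeric   |atemp          |0       |731      |NA         |NA         |32.13   |5.54    |32.55  |▁▅▇▇▇▇▆▁ |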
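The `hist` column packs a small histogram into eight characters. The sketch below shows one way such a sparkline can be built: split the range into eight equal-width bins, count the values in each, and map the counts onto block characters. It is an illustration under stated assumptions, not skimr's own implementation; `spark_hist()` is a hypothetical helper, and it uses only the six block characters that appear in the table above.

```r
# Hypothetical helper: an 8-bin text histogram (a sketch, not skimr's code)
spark_hist <- function(x, bins = 8) {
  # the six block characters seen in the hist column, lowest to highest
  blocks <- c("\u2581", "\u2582", "\u2583", "\u2585", "\u2586", "\u2587")
  x <- x[!is.na(x)]
  if (length(unique(x)) < 2) return(NA_character_)  # needs a range to bin
  breaks <- seq(min(x), max(x), length.out = bins + 1)
  # bin index 1..bins for each value; the maximum falls in the last bin
  bin <- findInterval(x, breaks, rightmost.closed = TRUE)
  counts <- tabulate(bin, nbins = bins)
  # scale counts to a block index; empty bins get the lowest block
  idx <- 1 + floor((length(blocks) - 1) * counts / max(counts))
  paste(blocks[idx], collapse = "")
}

spark_hist(1:8)  # "▇▇▇▇▇▇▇▇": a uniform spread, like `instant` above
```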

Finally, we’ll use the mosaic package.

BikeMosaicInspect <- mosaic::inspect(BikeData)
# categorical
knitr::kable(BikeMosaicInspect$categorical)
|name           |class     | levels|   n| missing|distribution                        |
|:--------------|:---------|------:|---:|-------:|:-----------------------------------|
|weekday_chr    |character |      7| 731|       0|Monday (14.4%), Saturday (14.4%) …  |
|weekday_fct    |factor    |      7| 731|       0|Sunday (14.4%), Monday (14.4%) …    |
|holiday_chr    |character |      2| 731|       0|Non-Holiday (97.1%), Holiday (2.9%) |
|holiday_fct    |factor    |      2| 731|       0|Non-Holiday (97.1%), Holiday (2.9%) |
|workingday_chr |character |      2| 731|       0|Working Day (68.4%) …               |
|workingday_fct |factor    |      2| 731|       0|Working Day (68.4%) …               |
|season_chr     |character |      4| 731|       0|Fall (25.7%), Summer (25.2%) …      |
|season_fct     |factor    |      4| 731|       0|Fall (25.7%), Summer (25.2%) …      |
|weathersit_chr |character |      3| 731|       0|Good (63.3%), Clouds/Mist (33.8%) … |
|weathersit_fct |factor    |      3| 731|       0|Good (63.3%), Clouds/Mist (33.8%) … |
|month_chr      |character |     12| 731|       0|August (8.5%), December (8.5%) …    |
|month_fct      |factor    |     12| 731|       0|January (8.5%), March (8.5%) …      |
|yr_chr         |character |      2| 731|       0|2012 (50.1%), 2011 (49.9%)          |
|yr_fct         |factor    |      2| 731|       0|2012 (50.1%), 2011 (49.9%)          |
# date
knitr::kable(BikeMosaicInspect$Date)
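|name   |class |first      |last       |min_diff |max_diff |   n| missing|
|:------|:-----|:----------|:----------|:--------|:--------|---:|-------:|
|dteday |Date  |2011-01-01 |2012-12-31 |1 days   |1 days   | 731|       0|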
# quantitative
knitr::kable(BikeMosaicInspect$quantitative)
|name       |class   |       min|        Q1|    median|        Q3|       max|         mean|           sd|   n| missing|
|:----------|:-------|---------:|---------:|---------:|---------:|---------:|------------:|------------:|---:|-------:|
|instant    |integer |   1.00000| 183.50000| 366.00000| 548.50000| 731.00000|  366.0000000|  211.1658116| 731|       0|
|season     |integer |   1.00000|   2.00000|   3.00000|   3.00000|   4.00000|    2.4965800|    1.1108071| 731|       0|
|yr         |integer |   0.00000|   0.00000|   1.00000|   1.00000|   1.00000|    0.5006840|    0.5003419| 731|       0|
|mnth       |integer |   1.00000|   4.00000|   7.00000|  10.00000|  12.00000|    6.5198358|    3.4519128| 731|       0|
|holiday    |integer |   0.00000|   0.00000|   0.00000|   0.00000|   1.00000|    0.0287278|    0.1671547| 731|       0|
|weekday    |integer |   0.00000|   1.00000|   3.00000|   5.00000|   6.00000|    2.9972640|    2.0047869| 731|       0|
|workingday |integer |   0.00000|   0.00000|   1.00000|   1.00000|   1.00000|    0.6839945|    0.4652334| 731|       0|
|weathersit |integer |   1.00000|   1.00000|   1.00000|   2.00000|   3.00000|    1.3953488|    0.5448943| 731|       0|
|temp       |integer |  -5.00000|   7.00000|  15.00000|  22.00000|  32.00000|   14.7961696|    8.5467794| 731|       0|
|atemp      |numeric |  18.68837|  27.48664|  32.54892|  36.69247|  44.59046|   32.1280356|    5.5406801| 731|       0|
|hum        |integer |   0.00000|  52.00000|  62.00000|  73.00000|  97.00000|   62.3064295|   14.2427756| 731|       0|
|windspeed  |integer |   1.00000|   9.00000|  12.00000|  15.00000|  34.00000|   12.2626539|    5.1902926| 731|       0|
|casual     |integer |   2.00000| 315.50000| 713.00000|1096.00000|3410.00000|  848.1764706|  686.6224883| 731|       0|
|registered |integer |  20.00000|2497.00000|3662.00000|4776.50000|6946.00000| 3656.1723666| 1560.2563770| 731|       0|
|cnt        |integer |  22.00000|3152.00000|4548.00000|5956.00000|8714.00000| 4504.3488372| 1937.2114516| 731|       0|
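mosaic's `distribution` column lists the most common levels of each categorical variable with their share of rows. A rough base-R equivalent is sketched below; `level_distribution()` is a hypothetical helper, and its tie-breaking and truncation rules are assumptions that may differ from mosaic's.

```r
# Hypothetical helper: "Level (x%)" for the most common levels of a vector
level_distribution <- function(x, top = 2) {
  tab <- table(x)
  # percentage of rows per level, as a named numeric vector
  pct <- setNames(100 * as.numeric(tab) / sum(tab), names(tab))
  pct <- sort(pct, decreasing = TRUE)
  shown <- pct[seq_len(min(top, length(pct)))]
  out <- paste0(names(shown), " (", sprintf("%.1f", shown), "%)", collapse = ", ")
  # flag that further levels exist but are not shown
  if (length(pct) > top) out <- paste(out, "\u2026")
  out
}

level_distribution(rep(c("Non-Holiday", "Holiday"), times = c(97, 3)))
# "Non-Holiday (97.0%), Holiday (3.0%)"
```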

Exploring the impact of weather conditions on bike rentals

Unlike ride-share passengers, bikers are vulnerable to weather conditions, and this might impact their likelihood of choosing a bike over other transportation options. If weather conditions are influential in transportation decisions, we would expect to see relationships between the number of bike rentals and weather features including temperature, humidity and wind speed. Let’s explore.
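Scatterplots show the overall shape, but a table of average rentals by temperature band and weekday makes comparisons like the Saturday pattern described earlier easier to read. A minimal sketch on a handful of made-up rows: the band edges (15 and 25 °C) come from the introduction, the column names mirror BikeData, and the numbers are invented for illustration.

```r
library(dplyr)

# Made-up rows in the shape of BikeData (weekday_fct, temp in °C, cnt)
toy <- tibble(
  weekday_fct = c("Saturday", "Saturday", "Saturday", "Monday", "Monday", "Monday"),
  temp = c(10, 18, 24, 10, 18, 24),
  cnt  = c(3000, 6000, 6400, 3500, 4000, 4200)
)

banded <- toy %>%
  # right = FALSE gives the bands [-Inf, 15), [15, 25) and [25, Inf)
  mutate(temp_band = cut(temp,
                         breaks = c(-Inf, 15, 25, Inf),
                         labels = c("under 15", "15 up to 25", "25 and over"),
                         right = FALSE)) %>%
  group_by(weekday_fct, temp_band) %>%
  summarise(days = n(), mean_cnt = mean(cnt), .groups = "drop")
banded
```

Empty bands (here "25 and over") are dropped because `group_by()` defaults to `.drop = TRUE`; reporting `days` next to each mean shows how thin some cells can be.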

# ~ rentals by temperature ----
ggRentalsByTemp <- BikeData %>% 
  ggplot(aes(y = cnt, 
             x = temp, 
             color = weekday_fct)) +
  geom_point(show.legend = FALSE) +
  geom_smooth(se = FALSE,
              show.legend = FALSE) +
  facet_grid(~weekday_fct) +
  scale_color_brewer(palette = "Dark2") +
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  ylab("Bike Rentals") +
  xlab("Temperature (°C)") +
  ggtitle("Bike Rental Volume By Temperature")
ggRentalsByTemp

The message below is ggplot2 telling us how the smoothed line is drawn through each panel’s data. With fewer than 1,000 observations per group, geom_smooth() defaults to LOESS, a locally weighted polynomial regression.

`geom_smooth()` using method = 'loess' and formula 'y ~ x'
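To see what that smoother is doing outside of ggplot2, you can fit the same kind of model directly with `stats::loess()`. The sketch below uses simulated data, not the Capital Bikeshare data; `span = 0.75` matches `geom_smooth()`'s default span, and `degree = 2` is `loess()`'s default local quadratic fit.

```r
set.seed(42)

# Simulated stand-in for BikeData: rentals rise with temperature and peak near 25 °C
toy <- data.frame(temp = runif(200, min = -5, max = 32))
toy$cnt <- 2000 + 250 * toy$temp - 5 * toy$temp^2 + rnorm(200, sd = 300)

# Local quadratic regression: each fitted value comes from a weighted fit to
# the nearest 75% of the points (span = 0.75)
fit <- loess(cnt ~ temp, data = toy, span = 0.75, degree = 2)

# Predictions on a temperature grid inside the observed range
# (loess returns NA outside the range of the data by default)
grid <- data.frame(temp = seq(0, 30, by = 5))
grid$fitted <- predict(fit, newdata = grid)
grid
```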

We would also expect windier conditions to negatively impact bike rentals. Let’s analyze the data.

# ggRentalVolByWindSpeed ----
ggRentalVolByWindSpeed <- ggplot(BikeData) +
  geom_point(aes(y = cnt, 
                 x = windspeed, 
                 color = weekday_fct),
             show.legend = FALSE) +
  facet_grid(~weekday_fct) +
  scale_color_brewer(palette = "Dark2") +
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  ylab("Bike Rentals") +
  xlab("Windspeed") +
  ggtitle("Rental Volume By Windspeed")
ggRentalVolByWindSpeed
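With many overlapping points it can be hard to judge the strength of a relationship by eye, so a per-weekday correlation is a useful numeric companion to the chart. A minimal sketch on simulated data: the negative wind effect is built into the fake data, so the result says nothing about the real rentals. `cor()` computes Pearson's correlation by default; `method = "spearman"` is an option if the relationship looks non-linear.

```r
library(dplyr)

set.seed(7)
# Simulated data: rentals fall by about 60 per unit of wind speed, plus noise
toy <- tibble(
  weekday_fct = rep(c("Saturday", "Sunday"), each = 100),
  windspeed = round(runif(200, min = 1, max = 34)),
  cnt = 5000 - 60 * windspeed + rnorm(200, sd = 300)
)

# One correlation coefficient per weekday, matching the facets of the plot
wind_cor <- toy %>%
  group_by(weekday_fct) %>%
  summarise(r = cor(windspeed, cnt), .groups = "drop")
wind_cor
```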

Exploring the impact of holidays on bike rental volume

Holidays might influence bike riders in different ways. For instance, holidays could give riders more opportunities to enjoy the road with fewer drivers around, since fewer people typically drive on holidays. Alternatively, bike enthusiasts might prefer to ride only on summer or spring holidays (given what we’ve learned about how weather conditions influence bike rentals).

ggRentalVolByHoliday <- ggplot(BikeData) +
  geom_density(aes(x = cnt,
                   fill = holiday_chr),
               alpha = 0.2) +
  scale_fill_brewer(palette = "Paired") +
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  labs(title = "Bike Rental Density By Holiday",
       fill = "Holiday",
       x = "Daily Bike Rentals",
       y = "Density")

ggRentalVolByHoliday
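Density curves are scaled to the same area, which hides how few holidays there are: the mosaic summary above shows only 2.9% of the 731 days (about 21) as holidays. A small table with the number of days next to the median and mean makes that imbalance explicit. A minimal sketch with made-up counts:

```r
library(dplyr)

# Made-up daily counts: six regular days and two holidays
toy <- tibble(
  holiday_chr = c(rep("Non-Holiday", 6), rep("Holiday", 2)),
  cnt = c(4000, 4500, 5000, 5200, 4800, 4600, 3000, 3600)
)

holiday_summary <- toy %>%
  group_by(holiday_chr) %>%
  summarise(days = n(),
            median_cnt = median(cnt),
            mean_cnt = mean(cnt),
            .groups = "drop")
holiday_summary
```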

With both Lyft and Uber preparing for IPOs this year, and their total rides hitting one billion and 10 billion, respectively, the pace of transportation disruption in cities across the country seems to be increasing. How this evolution will affect greener forms of transportation like bikes will be determined in part by the ability of companies and cities to plan for and accommodate demand. Learning to forecast this shift will be an important skill for almost everyone involved.

Edit 2019-04-03: The code for this post is available on GitHub.

Peter Spangler

2 thoughts on “Exploring bike rental behavior using R”

  1. Hi, fantastic work. I have one question though: When I try to run your code I get the following error:
    Error: Column `temp_min` is of unsupported type function.
    I can work around that with unclass() first, but then I get a second error message:

    message: `select()` doesn’t handle lists.
    class: `rlang_error`
    backtrace:
    1. base::lapply(., unclass)
    9. dplyr::select(…)
    Call `rlang::last_trace()` to see the full backtrace
    > rlang::last_trace()
    x
    1. \-`%>%`(…)
    2. +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
    3. \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
    4. \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
    5. \-`_fseq`(`_lhs`)
    6. \-magrittr::freduce(value, `_function_list`)
    7. \-function_list[[i]](value)
    8. +-dplyr::select(…)
    9. \-dplyr:::select.list(…)

    Is this a problem with different versions of R?
