Exploring bike rental behavior using R

April 2, 2019April 10, 2019 by Peter Spangler

Bikes have become one of the fastest growing modes of city travel which is why it’s no surprise that Lyft and Uber are getting into the two-wheeler game. Lyft’s recent acquisition of Motivate, the largest bike rental company in the world, will compete with Uber’s Jump and Ford’s GoBikes, which have delivered 625,000 and 1.4 million rides in San Francisco, respectively.

The growth of bike rentals presents a unique challenge for both the companies offering these services and the cities responding to the scale of change, particularly in forecasting demand.

The following tutorial will use R and data from Capital Bikeshare available at the UCI Machine Learning Repository to explore bike rentals from 2011 to 2012 in Washington, D.C. We were curious what influenced people renting bikes over the course of the year.

Load packages

# packages --------------------------------------------------
library(tidyverse)
library(rsample) # data splitting
library(randomForest) # basic implementation
library(ranger) # a faster implementation of randomForest
library(caret) # an aggregator package for performing many machine learning models
library(ggthemes)
library(scales)
library(wesanderson)
library(styler)

Import the data

The data we have includes measurements on weather conditions (temperature, humidity, wind speed, etc.), how many bikes were rented, and other seasonal attributes that might influence rentals (i.e. weekdays and holidays). Curiously, riders are more likely to rent a bike after Monday despite temperature and the likelihood increases on Saturdays when temperatures are between 15 to 25 degrees Celsius.

This code will import the daily bike rental data in the data folder.

bike <- readr::read_csv("data/day.csv")

Wrangling the data

The following script will wrangling the data and prepare it for modeling. Read through the comments to understand why each step was taken, and how these variables get entered into the visualizations and models.

# WRANGLE ---------------------------------------------------------------
# I like to be overly cautious when it comes to wrangling because all models
# are only as good as the underlying data. This data set came with many
# categorical variables coded numerically, so I am going to create a 
# character version of each variable (_chr) and a factor version (_fct).
# Creating a character and factor variable will let me choose which one to 
# use for each graph and model.
# 
# 
# 

# this recodes the weekday variable into a character variable
# test 
# bike %>%
#   mutate(
#     weekday_chr =
#       case_when(
#         weekday == 0 ~ "Sunday",
#         weekday == 1 ~ "Monday",
#         weekday == 2 ~ "Tuesday",
#         weekday == 3 ~ "Wednesday",
#         weekday == 4 ~ "Thursday",
#         weekday == 5 ~ "Friday",
#         weekday == 6 ~ "Saturday",
#         TRUE ~ "other")) %>% 
#     dplyr::count(weekday, weekday_chr) %>%
#     tidyr::spread(weekday, n)

# assign
bike <- bike %>%
  mutate(
    weekday_chr =
      case_when(
        weekday == 0 ~ "Sunday",
        weekday == 1 ~ "Monday",
        weekday == 2 ~ "Tuesday",
        weekday == 3 ~ "Wednesday",
        weekday == 4 ~ "Thursday",
        weekday == 5 ~ "Friday",
        weekday == 6 ~ "Saturday",
        TRUE ~ "other"))

# verify
# bike %>% 
#   dplyr::count(weekday, weekday_chr) %>% 
#   tidyr::spread(weekday, n)

# Weekdays (factor) ---

# test factor variable
# bike %>%
#   mutate(
#     weekday_fct = factor(x = weekday,
#              levels = c(0,1,2,3,4,5,6),
#              labels = c("Sunday",
#                        "Monday",
#                        "Tuesday",
#                        "Wednesday",
#                        "Thursday",
#                        "Friday",
#                        "Saturday"))) %>%
#   dplyr::count(weekday, weekday_fct) %>%
#   tidyr::spread(weekday, n)

# assign factor variable
bike <- bike %>%
  mutate(
    weekday_fct = factor(x = weekday,
             levels = c(0,1,2,3,4,5,6),
             labels = c("Sunday",
                       "Monday",
                       "Tuesday",
                       "Wednesday",
                       "Thursday",
                       "Friday",
                       "Saturday")))

# verify factor variable
# bike %>% 
#   dplyr::count(weekday, weekday_fct) %>% 
#   tidyr::spread(weekday, n)


# Holidays ----
# test
# bike %>%
#   mutate(holiday_chr =
#       case_when(
#         holiday == 0 ~ "Non-Holiday",
#         holiday == 1 ~ "Holiday")) %>% 
#   dplyr::count(holiday, holiday_chr) %>%
#   tidyr::spread(holiday, n)

# assign
bike <- bike %>%
  mutate(holiday_chr =
      case_when(
        holiday == 0 ~ "Non-Holiday",
        holiday == 1 ~ "Holiday"))

# verify
# bike %>%
#   dplyr::count(holiday, holiday_chr) %>%
#   tidyr::spread(holiday, n)

# test
# bike %>%
#   mutate(
#     holiday_fct = factor(x = holiday,
#              levels = c(0,1),
#              labels = c("Non-Holiday",
#                        "Holiday"))) %>% 
#     dplyr::count(holiday, holiday_fct) %>%
#     tidyr::spread(holiday, n)

# assign
bike <- bike %>%
  mutate(
    holiday_fct = factor(x = holiday,
             levels = c(0,1),
             labels = c("Non-Holiday",
                       "Holiday")))

# # verify
# bike %>%
#   dplyr::count(holiday_chr, holiday_fct) %>%
#   tidyr::spread(holiday_chr, n)

# Working days ----
# test
 # bike %>%
 #  mutate(
 #    workingday_chr =
 #      case_when(
 #        workingday == 0 ~ "Non-Working Day",
 #        workingday == 1 ~ "Working Day",
 #        TRUE ~ "other")) %>% 
 #    dplyr::count(workingday, workingday_chr) %>%
 #    tidyr::spread(workingday, n)

# assign
 bike <- bike %>%
  mutate(
    workingday_chr =
      case_when(
        workingday == 0 ~ "Non-Working Day",
        workingday == 1 ~ "Working Day",
        TRUE ~ "other")) 
 
 # verify
 # bike %>% 
 #    dplyr::count(workingday, workingday_chr) %>%
 #    tidyr::spread(workingday, n)
   
# test
# bike %>%
#   mutate(
#     workingday_fct = factor(x = workingday,
#              levels = c(0,1),
#              labels = c("Non-Working Day",
#                        "Working Day"))) %>%
#   dplyr::count(workingday, workingday_fct) %>%
#   tidyr::spread(workingday, n)

# assign
bike <- bike %>%
  mutate(
    workingday_fct = factor(x = workingday,
             levels = c(0,1),
             labels = c("Non-Working Day",
                       "Working Day")))

# verify
# bike %>%
#   dplyr::count(workingday_chr, workingday_fct) %>%
#   tidyr::spread(workingday_chr, n)


# Seasons
bike <- bike %>%
  mutate(
    season_chr =
      case_when(
        season == 1 ~ "Spring",
        season == 2 ~ "Summer",
        season == 3 ~ "Fall",
        season == 4 ~ "Winter",
        TRUE ~ "other"
      ))

# test
# bike %>%
#   mutate(
#     season_fct = factor(x = season,
#              levels = c(1, 2, 3, 4),
#              labels = c("Spring",
#                        "Summer",
#                        "Fall",
#                        "Winter"))) %>%
#   dplyr::count(season_chr, season_fct) %>%
#   tidyr::spread(season_chr, n)

# assign
bike <- bike %>%
  mutate(
    season_fct = factor(x = season,
             levels = c(1, 2, 3, 4),
             labels = c("Spring",
                       "Summer",
                       "Fall",
                       "Winter"))) 

# verify
# bike %>%
#   dplyr::count(season_chr, season_fct) %>%
#   tidyr::spread(season_chr, n)


# Weather situation ----
# test
# bike %>%
#   mutate(
#     weathersit_chr =
#       case_when(
#         weathersit == 1 ~ "Good",
#         weathersit == 2 ~ "Clouds/Mist",
#         weathersit == 3 ~ "Rain/Snow/Storm",
#         TRUE ~ "other")) %>% 
#   dplyr::count(weathersit, weathersit_chr) %>%
#   tidyr::spread(weathersit, n)

# assign
bike <- bike %>%
  mutate(
    weathersit_chr =
      case_when(
        weathersit == 1 ~ "Good",
        weathersit == 2 ~ "Clouds/Mist",
        weathersit == 3 ~ "Rain/Snow/Storm"))

# verify
# bike %>% 
#   dplyr::count(weathersit, weathersit_chr) %>%
#   tidyr::spread(weathersit, n)

# test
# bike %>%
#   mutate(
#     weathersit_fct = factor(x = weathersit,
#              levels = c(1, 2, 3),
#              labels = c("Good",
#                        "Clouds/Mist",
#                        "Rain/Snow/Storm"))) %>%
#   dplyr::count(weathersit, weathersit_fct) %>%
#   tidyr::spread(weathersit, n)

# assign 
bike <- bike %>%
  mutate(
    weathersit_fct = factor(x = weathersit,
                       levels = c(1, 2, 3),
                       labels = c("Good",
                                 "Clouds/Mist",
                                 "Rain/Snow/Storm")))
# verify
# bike %>%
#   dplyr::count(weathersit_chr, weathersit_fct) %>%
#   tidyr::spread(weathersit_chr, n)


# Months ----
# huge shoutout to Thomas Mock over at RStudio for showing me 
# lubridate::month() (and stopping my case_when() obsession)
# https://twitter.com/thomas_mock/status/1113105497480183818

# test 
# bike %>% 
#   mutate(month_ord = 
#            lubridate::month(mnth, label = TRUE)) %>% 
#   dplyr::count(month_ord, mnth) %>% 
#   tidyr::spread(month_ord, n)

# assign
bike <- bike %>% 
  mutate(month_ord = 
           lubridate::month(mnth, label = TRUE))

# verify
# bike %>% 
#   dplyr::count(month_ord, mnth) %>% 
#   tidyr::spread(month_ord, n)
  

# test
# bike %>%
#   mutate(
#     month_fct = factor(x = mnth,
#              levels = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
#              labels = c("January", "February", "March", "April", "May",
#                         "June", "July", "August", "September", "October",
#                         "November", "December"))) %>%
#   dplyr::count(mnth, month_fct) %>%
#   tidyr::spread(month_fct, n)

# assign
bike <- bike %>%
  mutate(
    month_fct = factor(x = mnth,
             levels = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
             labels = c("January", "February", "March", "April", "May",
                        "June", "July", "August", "September", "October",
                        "November", "December")))

# verify
# bike %>% 
#   dplyr::count(month_chr, month_fct) %>%
#   tidyr::spread(month_fct, n)

# Year ----
# test
# bike %>%
#   mutate(
#     yr_chr =
#       case_when(
#         yr == 0 ~ "2011",
#         yr == 1 ~ "2012",
#         TRUE ~ "other")) %>% 
#     dplyr::count(yr, yr_chr) %>%
#     tidyr::spread(yr, n)

# assign
bike <- bike %>%
  mutate(
    yr_chr =
      case_when(
        yr == 0 ~ "2011",
        yr == 1 ~ "2012"))
# verify
# bike %>%
#     dplyr::count(yr, yr_chr) %>%
#     tidyr::spread(yr, n)

# test
# bike %>%
#   mutate(
#     yr_fct = factor(x = yr,
#              levels = c(0, 1),
#              labels = c("2011",
#                        "2012"))) %>%
#   dplyr::count(yr, yr_fct) %>%
#   tidyr::spread(yr, n)

# assign
bike <- bike %>%
  mutate(
    yr_fct = factor(x = yr,
             levels = c(0, 1),
             labels = c("2011",
                       "2012")))
# verify
# bike %>%
#   dplyr::count(yr_chr, yr_fct) %>%
#   tidyr::spread(yr_chr, n)

# normalize temperatures ----
bike <- bike %>%
  mutate(temp = as.integer(temp * (39 - (-8)) + (-8)))

bike <- bike %>%
  mutate(atemp = atemp * (50 - (16)) + (16))

# ~ windspeed ----
bike <- bike %>%
  mutate(windspeed = as.integer(67 * bike$windspeed))

# ~ humidity ----
bike <- bike %>%
  mutate(hum = as.integer(100 * bike$hum))

# ~ convert to date ----
bike <- bike %>%
  mutate(dteday = as.Date(dteday))

# check df
# bike %>% dplyr::glimpse(78)

# rename the data frame so these don't get confused
BikeData <- bike

# reorganize variables for easier inspection

BikeData <- BikeData %>% 
  dplyr::select(
    dplyr::starts_with("week"),
    dplyr::starts_with("holi"),
    dplyr::starts_with("seas"),
    dplyr::starts_with("work"),
    dplyr::starts_with("month"),
    dplyr::starts_with("yr"),
    dplyr::starts_with("weath"),
    dplyr::everything())

If you then dplyr::glimpse(BikeData) you should have the following:

Observations: 731
Variables: 30
$ weekday        <int> 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 4, 5, 6, 0, 1, 2…
$ weekday_chr    <chr> "Saturday", "Sunday", "Monday", "Tuesday", "Wednesda…
$ weekday_fct    <fct> Saturday, Sunday, Monday, Tuesday, Wednesday, Thursd…
$ holiday        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
$ holiday_chr    <chr> "Non-Holiday", "Non-Holiday", "Non-Holiday", "Non-Ho…
$ holiday_fct    <fct> Non-Holiday, Non-Holiday, Non-Holiday, Non-Holiday, …
$ season         <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ season_chr     <chr> "Spring", "Spring", "Spring", "Spring", "Spring", "S…
$ season_fct     <fct> Spring, Spring, Spring, Spring, Spring, Spring, Spri…
$ workingday     <int> 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1…
$ workingday_chr <chr> "Non-Working Day", "Non-Working Day", "Working Day",…
$ workingday_fct <fct> Non-Working Day, Non-Working Day, Working Day, Worki…
$ month_ord      <ord> Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Jan, Ja…
$ month_fct      <fct> January, January, January, January, January, January…
$ yr             <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ yr_chr         <chr> "2011", "2011", "2011", "2011", "2011", "2011", "201…
$ yr_fct         <fct> 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011, 2011…
$ weathersit     <int> 2, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2…
$ weathersit_chr <chr> "Clouds/Mist", "Clouds/Mist", "Good", "Good", "Good"…
$ weathersit_fct <fct> Clouds/Mist, Clouds/Mist, Good, Good, Good, Good, Cl…
$ instant        <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 1…
$ dteday         <date> 2011-01-01, 2011-01-02, 2011-01-03, 2011-01-04, 201…
$ mnth           <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ temp           <int> 8, 9, 1, 1, 2, 1, 1, 0, -1, 0, 0, 0, 0, 0, 2, 2, 0, …
$ atemp          <dbl> 28.36325, 28.02713, 22.43977, 23.21215, 23.79518, 23…
$ hum            <int> 80, 69, 43, 59, 43, 51, 49, 53, 43, 48, 68, 59, 47, …
$ windspeed      <int> 10, 16, 16, 10, 12, 6, 11, 17, 24, 14, 8, 20, 20, 8,…
$ casual         <int> 331, 131, 120, 108, 82, 88, 148, 68, 54, 41, 43, 25,…
$ registered     <int> 654, 670, 1229, 1454, 1518, 1518, 1362, 891, 768, 12…
$ cnt            <int> 985, 801, 1349, 1562, 1600, 1606, 1510, 959, 822, 13…

Exploratory data analysis

Here are three options for summarizing the bike table into summary statistics that will give us a better understanding of the underlying distribution for each variable in the BikeData data frame. First, we’ll use the dplyr package.

BikeDplyrSummary <- BikeData %>%
  select(temp, atemp, hum, windspeed, casual, registered, cnt) %>%
  summarise_each(list(
    min = ~min,
    q25 = ~quantile(., 0.25),
    median = ~median,
    q75 = ~quantile(., 0.75),
    max = ~max,
    mean = ~mean,
    sd = ~sd
  )) %>%
  gather(stat, val) %>%
  separate(stat, 
           into = c("var", "stat"), 
           sep = "_") %>%
  spread(stat, val) %>%
  select(var, min, q25, median, q75, max, mean, sd)

knitr::kable(BikeDplyrSummary)

var	min	q25	median	q75	max	mean	sd
atemp	18.68837	27.48664	32.54892	36.69247	44.59046	32.12804	5.540680
casual	2.00000	315.50000	713.00000	1096.00000	3410.00000	848.17647	686.622488
cnt	22.00000	3152.00000	4548.00000	5956.00000	8714.00000	4504.34884	1937.211452
hum	0.00000	52.00000	62.00000	73.00000	97.00000	62.30643	14.242776
registered	20.00000	2497.00000	3662.00000	4776.50000	6946.00000	3656.17237	1560.256377
temp	-5.00000	7.00000	15.00000	22.00000	32.00000	14.79617	8.546779
windspeed	1.00000	9.00000	12.00000	15.00000	34.00000	12.26265	5.190293

DON’T MISS Sky-high expectations for The New York Times, but Empire State data visualization falls short

Next, we’ll use the skimr package (which can be found here).

BikeSkimrSummary <- bike %>%
  skimr::skim_to_wide() %>%
  dplyr::select(type,
    variable,
    missing,
    complete,
    min,
    max,
    mean,
    sd,
    median = p50,
    hist)
knitr::kable(BikeSkimrSummary)

type	variable	complete	min	max	mean	sd	median	hist
character	holiday_chr	731	7	11	NA	NA	NA	NA
character	month_chr	731	3	9	NA	NA	NA	NA
character	season_chr	731	4	6	NA	NA	NA	NA
character	weathersit_chr	731	4	15	NA	NA	NA	NA
character	weekday_chr	731	6	9	NA	NA	NA	NA
character	workingday_chr	731	11	15	NA	NA	NA	NA
character	yr_chr	731	4	4	NA	NA	NA	NA
Date	dteday	731	2011-01-01	2012-12-31	NA	NA	NA	NA
factor	holiday_fct	731	NA	NA	NA	NA	NA	NA
factor	month_fct	731	NA	NA	NA	NA	NA	NA
factor	season_fct	731	NA	NA	NA	NA	NA	NA
factor	weathersit_fct	731	NA	NA	NA	NA	NA	NA
factor	weekday_fct	731	NA	NA	NA	NA	NA	NA
factor	workingday_fct	731	NA	NA	NA	NA	NA	NA
factor	yr_fct	731	NA	NA	NA	NA	NA	NA
integer	casual	731	NA	NA	848.18	686.62	713	▇▇▅▂▁▁▁▁
integer	cnt	731	NA	NA	4504.35	1937.21	4548	▂▅▅▇▇▅▅▂
integer	holiday	731	NA	NA	0.029	0.17	0	▇▁▁▁▁▁▁▁
integer	hum	731	NA	NA	62.31	14.24	62	▁▁▁▃▇▇▅▂
integer	instant	731	NA	NA	366	211.17	366	▇▇▇▇▇▇▇▇
integer	mnth	731	NA	NA	6.52	3.45	7	▇▅▇▃▅▇▅▇
integer	registered	731	NA	NA	3656.17	1560.26	3662	▁▅▅▆▇▅▃▃
integer	season	731	NA	NA	2.5	1.11	3	▇▁▇▁▁▇▁▇
integer	temp	731	NA	NA	14.8	8.55	15	▁▅▇▇▆▆▇▂
integer	weathersit	731	NA	NA	1.4	0.54	1	▇▁▁▅▁▁▁▁
integer	weekday	731	NA	NA	3	2	3	▇▇▇▇▁▇▇▇
integer	windspeed	731	NA	NA	12.26	5.19	12	▂▇▇▆▂▁▁▁
integer	workingday	731	NA	NA	0.68	0.47	1	▃▁▁▁▁▁▁▇
integer	yr	731	NA	NA	0.5	0.5	1	▇▁▁▁▁▁▁▇
numeric	atemp	731	NA	NA	32.13	5.54	32.55	▁▅▇▇▇▇▆▁

DON’T MISS How to scrape Reddit with Python

Finally, we’ll use the mosaic package (which can be found here).

BikeMosaicInspect <- mosaic::inspect(BikeData)
# categorical
knitr::kable(BikeMosaicInspect$categorical)

name	class	levels	n	distribution
weekday_chr	character	7	731	Monday (14.4%), Saturday (14.4%) …
weekday_fct	factor	7	731	Sunday (14.4%), Monday (14.4%) …
holiday_chr	character	2	731	Non-Holiday (97.1%), Holiday (2.9%)
holiday_fct	factor	2	731	Non-Holiday (97.1%), Holiday (2.9%)
workingday_chr	character	2	731	Working Day (68.4%) …
workingday_fct	factor	2	731	Working Day (68.4%) …
season_chr	character	4	731	Fall (25.7%), Summer (25.2%) …
season_fct	factor	4	731	Fall (25.7%), Summer (25.2%) …
weathersit_chr	character	3	731	Good (63.3%), Clouds/Mist (33.8%) …
weathersit_fct	factor	3	731	Good (63.3%), Clouds/Mist (33.8%) …
month_chr	character	12	731	August (8.5%), December (8.5%) …
month_fct	factor	12	731	January (8.5%), March (8.5%) …
yr_chr	character	2	731	2012 (50.1%), 2011 (49.9%)
yr_fct	factor	2	731	2012 (50.1%), 2011 (49.9%)

# date
knitr::kable(BikeMosaicInspect$Date)

name	class	first	last	min_diff	max_diff	n	missing
dteday	Date	2011-01-01	2012-12-31	1 days	1 days	731	0

# quantitative
knitr::kable(BikeMosaicInspect$quantitative)

name	class	min	Q1	median	Q3	max	mean	sd	n
instant	integer	1.00000	183.50000	366.00000	548.50000	731.00000	366.0000000	211.1658116	731
season	integer	1.00000	2.00000	3.00000	3.00000	4.00000	2.4965800	1.1108071	731
yr	integer	0.00000	0.00000	1.00000	1.00000	1.00000	0.5006840	0.5003419	731
mnth	integer	1.00000	4.00000	7.00000	10.00000	12.00000	6.5198358	3.4519128	731
holiday	integer	0.00000	0.00000	0.00000	0.00000	1.00000	0.0287278	0.1671547	731
weekday	integer	0.00000	1.00000	3.00000	5.00000	6.00000	2.9972640	2.0047869	731
workingday	integer	0.00000	0.00000	1.00000	1.00000	1.00000	0.6839945	0.4652334	731
weathersit	integer	1.00000	1.00000	1.00000	2.00000	3.00000	1.3953488	0.5448943	731
temp	integer	-5.00000	7.00000	15.00000	22.00000	32.00000	14.7961696	8.5467794	731
atemp	numeric	18.68837	27.48664	32.54892	36.69247	44.59046	32.1280356	5.5406801	731
hum	integer	0.00000	52.00000	62.00000	73.00000	97.00000	62.3064295	14.2427756	731
windspeed	integer	1.00000	9.00000	12.00000	15.00000	34.00000	12.2626539	5.1902926	731
casual	integer	2.00000	315.50000	713.00000	1096.00000	3410.00000	848.1764706	686.6224883	731
registered	integer	20.00000	2497.00000	3662.00000	4776.50000	6946.00000	3656.1723666	1560.2563770	731
cnt	integer	22.00000	3152.00000	4548.00000	5956.00000	8714.00000	4504.3488372	1937.2114516	731

Exploring the impact of weather conditions on bike rentals

Unlike ride-share passengers, bikers are vulnerable to weather conditions, and this might impact their likelihood of choosing a bike over other transportation options. If weather conditions are influential in transportation decisions, we would expect to see relationships between the number of bike rentals and weather features including temperature, humidity and wind speed. Let’s explore.

# ~ rentals by temperature ----
ggRentalsByTemp <- BikeData %>% 
  ggplot(aes(y = cnt, 
                 x = temp, 
                 color = weekday_fct)) +
  geom_point(show.legend = FALSE) +
  geom_smooth(se = FALSE,
              show.legend = FALSE) +
  facet_grid(~weekday_fct) +
  scale_color_brewer(palette = "Dark2") +
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  ylab("Bike Rentals") +
  xlab("Temperature (°C)") +
  ggtitle("Bike Rental Volume By Temperature")
ggRentalsByTemp

The output below is R’s way of telling us how the best-fit line is being drawn through each set of data. This fits a LOESS local polynomial regression to the data.

`geom_smooth()` using method = 'loess' and formula 'y ~ x'

We would also expect windier conditions to negatively impact bike rentals. Let’s analyze the data.

# ggRentalVolByWindSpeed ----
ggRentalVolByWindSpeed <- ggplot(bike) +
  geom_point(aes(y = cnt, 
                 x = windspeed, 
                 color = weekday_fct),
             show.legend = FALSE) +
  facet_grid(~weekday_fct) +
  scale_color_brewer(palette = "Dark2") +
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) +
  ylab("Bike Rentals") +
  xlab("Windspeed") +
  ggtitle("Rental Volume By Windspeed")
ggRentalVolByWindSpeed

Exploring the impact of holidays on bike rental volume

Holidays might influence bike riders in different ways. For instance, we can think of holidays as increasing opportunities for bike riders to enjoy being on the road with fewer drivers, since typically fewer people drive on holidays. We could also consider a situation where bike enthusiasts only prefer to ride their bikes on summer or spring holidays (considering the information we’ve learned about the influences of weather conditions on bike rentals).

ggRentalVolByHoliday <- ggplot(BikeData) +
  geom_density(aes(x = cnt,
                   fill = holiday_chr), 
                   alpha = 0.2) +
  scale_fill_brewer(palette = "Paired") +
  
  theme_fivethirtyeight() +
  theme(axis.title = element_text()) + 
  labs(title = "Bike Rental Density By Holiday",
               fill = "Holiday",
               x = "Average Bike Rentals",
               y = "Density")

ggRentalVolByHoliday

With both Lyft and Uber preparing for IPOs this year and total rideshares for each company hitting one billion and 10 billion rides, respectively, the pace of transportation disruption in cities across the country seems to be increasing. How this evolution will impact greener forms of transportation like bikes will in part be determined by the ability of companies and cities to plan and execute on accommodating demand. Learning to forecast this shift will be an important skill for most everyone involved.

Edit 2019-04-03: The code for this post is available here on Github.

Author
Recent Posts

Peter Spangler

Peter is an entrepreneurial minded data scientist whose expertise in building analytic solutions and insights to tell data driven stories have impacted organizations including Alibaba, Citrix and Lyft. Peter has lead experimentation design and analytics projects focused on retention, user acquisition and channel optimization in the SaaS and rideshare spaces. The solutions he has produced include incrementality testing, segmentation, ML models and fraud. Peter is a passionate advocate for building analytics teams in cross-functional environments.

Latest posts by Peter Spangler (see all)

How to model with gradient boosting machine in R - April 9, 2019
Exploring bike rental behavior using R - April 2, 2019

2 thoughts on “Exploring bike rental behavior using R”

Bonnie Walker says:

April 3, 2019 at 1:25 pm

Fantastic!

ALewe says:

September 23, 2019 at 7:11 am

Hi, fantastic work. I have one question though: When I try to run your code I get the following error:
Error: Column `temp_min` is of unsupported type function.
I can work around that with unclass() first, but then I get a second error message:

message: `select()` doesn’t handle lists.
class: `rlang_error`
backtrace:
1. base::lapply(., unclass)
9. dplyr::select(…)
Call `rlang::last_trace()` to see the full backtrace
> rlang::last_trace()
x
1. \-`%>%`(…)
2. +-base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
4. \-base::eval(quote(`_fseq`(`_lhs`)), env, env)
5. \-`_fseq`(`_lhs`)
6. \-magrittr::freduce(value, `_function_list`)
7. \-function_list[[i]](value)
8. +-dplyr::select(…)
9. \-dplyr:::select.list(…)

Is this a problem with different versions of R?

Storybench

2 thoughts on “Exploring bike rental behavior using R”

Leave a Reply Cancel reply

Get the latest from Storybench

Exploring bike rental behavior using R

Load packages

Import the data

Wrangling the data

Exploratory data analysis

Exploring the impact of weather conditions on bike rentals

Exploring the impact of holidays on bike rental volume

2 thoughts on “Exploring bike rental behavior using R”

Leave a Reply Cancel reply

read more

I Used ChatGPT as a Reporting Assistant. It Didn’t Go Well

Using R and Media Cloud to do sentiment analysis

Get the latest from Storybench