How to geocode a csv of addresses in R

Sometimes all you want are some lat, long coordinates to map your data. The following tutorial shows you how to batch geocode a csv of addresses in R using the ggmap package, which asks Google to geocode the addresses using its API. FYI, at the time of writing the limit was 2,500 queries per day.

Not familiar with R or R Studio? See this short tutorial on Storybench. Big thanks to Mitchell Craver for his tutorial and script published last year on his blog What do the data say?

Download the ggmap package in R Studio

We’ll need ggmap, a spatial visualization package, to geocode the csv. To install it in R Studio, open a new R script in “File” > “New File” > “R Script.” Type install.packages("ggmap") on line 1 of the top-left pane. Click “Run” or hit Shift-Command-Return. You should see the package downloading and installing in the console pane.

Prepare your csv of addresses

We’ll be using a csv with the names, addresses and other information for 114 craft breweries in Massachusetts. Make sure to note the name of the column listing the addresses, which is “addresses” in our csv. You’ll get the best results if the town and state are listed in that column, too.
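
If your spreadsheet keeps the street, town and state in separate columns, one way to combine them is R's paste function. This is only a minimal sketch: the column names street, town and state and the two rows below are made up for illustration and are not taken from the brewery csv.

```r
# Placeholder rows; the column names street, town and state are assumptions
breweries <- data.frame(
  name = c("Example Brewery A", "Example Brewery B"),
  street = c("1 Main St", "2 Elm St"),
  town = c("Boston", "Worcester"),
  state = c("MA", "MA"),
  stringsAsFactors = FALSE
)

# Build one full-address column, with the parts separated by commas
breweries$addresses <- paste(breweries$street, breweries$town, breweries$state, sep = ", ")

print(breweries$addresses)
```

The resulting addresses column is the one you would then point the geocoder at.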

Copy over the R script

In R Studio, open a new R script in “File” > “New File” > “R Script.” Scroll down and copy the R script from the Gist or click here to see it on GitHub. Keep scrolling for an explanation of what each line does.

Understanding the R script

First, we load the ggmap library.

library(ggmap)

Next, we’ll add a line with file.choose that opens a file selection box and allows you to search through your computer to find the csv.

fileToLoad <- file.choose(new = TRUE)

Next, R will read the csv and store it as the variable origAddress.

origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)
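
Before going further, it can help to confirm that the csv loaded the way you expect. The sketch below writes a tiny placeholder csv to a temporary file so that it runs on its own. With your real data you would skip that part and inspect origAddress directly. The rows are made up.

```r
# Write a small placeholder csv to a temporary file (rows are made up)
fileToLoad <- tempfile(fileext = ".csv")
writeLines(c(
  "name,addresses",
  "Example Brewery A,\"1 Main St, Boston, MA\"",
  "Example Brewery B,\"2 Elm St, Worcester, MA\""
), fileToLoad)

origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)

# List the column names and confirm the addresses column is there
print(names(origAddress))
stopifnot("addresses" %in% names(origAddress))

# Peek at the first few rows
print(head(origAddress))
```

If stopifnot complains, the column is probably named something else, and that is the name to use later in the geocode call.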

Next, the script initializes a data frame to store the geocoded addresses while the loop, which starts with for(i in 1:nrow(origAddress)), processes the rest. For more on loops in R, go here.
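
The loop itself is not reproduced in this walkthrough, so here is a rough sketch of what such a loop could look like. It is not necessarily the exact code in the Gist. To let it run without an internet connection, it uses geocode_fn, a hypothetical stand-in that returns one row with lon, lat and address columns, which is the shape ggmap's geocode returns with output = "latlona". The data rows and coordinates are dummy values.

```r
# Placeholder data standing in for the csv read with read.csv() (rows are made up)
origAddress <- data.frame(
  name = c("Example Brewery A", "Example Brewery B"),
  addresses = c("1 Main St, Boston, MA", "2 Elm St, Worcester, MA"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for ggmap's geocode(); coordinates are dummy values
geocode_fn <- function(address) {
  data.frame(lon = 0, lat = 0, address = tolower(address), stringsAsFactors = FALSE)
}
# For real results, swap in something like:
# geocode_fn <- function(address) ggmap::geocode(address, output = "latlona", source = "google")

# Create the three new columns up front
origAddress$lon <- NA_real_
origAddress$lat <- NA_real_
origAddress$geoAddress <- NA_character_

# Geocode one row at a time and store the results next to the original data
for (i in seq_len(nrow(origAddress))) {
  result <- geocode_fn(origAddress$addresses[i])
  origAddress$lon[i] <- as.numeric(result$lon[1])
  origAddress$lat[i] <- as.numeric(result$lat[1])
  origAddress$geoAddress[i] <- as.character(result$address[1])
}

print(origAddress)
```

Note that recent versions of ggmap expect a Google API key to be registered with register_google() before geocode will return results, so the real call may need that extra step.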

Now, look through the whole script again.

Important: Notice the $addresses[i] within ggmap's geocode function.

geocode(origAddress$addresses[i], output = "latlona", source = "google")

This is where you store the name of your addresses column. (More documentation on how to use geocode here and here.)

After the loop runs, you'll want to write all the data to a csv named geocoded.csv.

write.csv(origAddress, "geocoded.csv", row.names=FALSE)

Run the script

Click "Run" or hit Shift-Command-Return. You should see each row being passed to Google's API in the console pane. For 114 rows, it took around 30 seconds.

Check the geocoded csv

Your new geocoded spreadsheet should contain three appended columns – lon, lat, and geoAddress. Sweet.
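
Addresses that could not be matched generally come back with missing coordinates, so a quick check for NA values is worthwhile. A minimal sketch follows, using a made-up stand-in for geocoded.csv written to a temporary file. With your real output you would read geocoded.csv instead.

```r
# Made-up stand-in for geocoded.csv; the second row mimics a failed lookup
geocodedFile <- tempfile(fileext = ".csv")
write.csv(data.frame(
  name = c("Example Brewery A", "Example Brewery B"),
  addresses = c("1 Main St, Boston, MA", "not a real address"),
  lon = c(0, NA),
  lat = c(0, NA),
  geoAddress = c("1 main st, boston, ma", NA),
  stringsAsFactors = FALSE
), geocodedFile, row.names = FALSE)

geocoded <- read.csv(geocodedFile, stringsAsFactors = FALSE)

# Rows where either coordinate is missing
failed <- geocoded[is.na(geocoded$lon) | is.na(geocoded$lat), ]
cat(nrow(failed), "of", nrow(geocoded), "rows have no coordinates\n")
print(failed$addresses)
```

Any addresses listed here are candidates for fixing by hand, for example by adding a missing town or state, and running through the geocoder again.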

Now, to map the data

We'll use Carto.com and upload geocoded.csv in the Connect dataset section. The data – with the lat, lon recognized by Carto and stored as the_geom – should look like this:

Next, click Create Map on the bottom-right. Voilà:

Storybench’s editor is Aleszu Bajak, a science journalist and former Knight Science Journalism Fellow at MIT. He is an alum of Science Friday, the founder of LatinAmericanScience.org, and is passionate about breaking down the divide between journalists, developers and designers.
