How to geocode a csv of addresses in R


Sometimes all you want are some lat, long coordinates to map your data. The following tutorial shows you how to batch geocode a csv of addresses in R using the ggmap package, which asks Google to geocode the addresses using its API. FYI, the limit is 2,500 queries per day.

Not familiar with R or R Studio? See this short tutorial on Storybench. Big thanks to Mitchell Craver for his tutorial and script published last year on his blog What do the data say?

Download the ggmap package in R Studio

We’ll need ggmap, a spatial visualization package, to geocode the csv. To install it in R Studio, open a new R script in “File” > “New File” > “R Script.” Type install.packages("ggmap") on line 1 of the top-left pane, using straight quotes. Click “Run” or hit Shift-Command-Return. You should see the package downloading and installing in the console pane.

Prepare your csv of addresses

We’ll be using a csv with the names, addresses and other information for 114 craft breweries in Massachusetts. Make sure to note the name of the column listing the addresses (“addresses” in this example), because the script refers to it. You’ll get the best results if the town and state are listed in that column, too.
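If your spreadsheet keeps street, town and state in separate columns, one way to build a single address column is sketched below. The column names street, town and state, the two rows of data and the output file name are all invented placeholders, not part of the brewery csv used here.

```r
# Hypothetical input: street, town and state stored in separate columns.
breweries <- data.frame(
  name   = c("Example Brewing Co.", "Sample Ales"),
  street = c("1 Main St", "22 Elm St "),
  town   = c("Springfield", "Worcester"),
  state  = c("MA", "MA"),
  stringsAsFactors = FALSE
)

# Combine the parts into one "addresses" column, trimming stray whitespace.
breweries$addresses <- paste(
  trimws(breweries$street),
  trimws(breweries$town),
  trimws(breweries$state),
  sep = ", "
)

# Save the result as the csv that the geocoding script will load.
# A temp folder is used here so the sketch does not touch your files.
prepared_csv <- file.path(tempdir(), "breweries_prepared.csv")
write.csv(breweries, prepared_csv, row.names = FALSE)
```

The new column is named addresses so that it matches the column name the script expects.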

Copy over the R script

In R Studio, open a new R script in “File” > “New File” > “R Script.” Scroll down and copy the R script from the Gist or click here to see it on Github. Keep scrolling for an explanation of what each line does.

Understanding the R script

First, we load the ggmap library.

library(ggmap)

Next, we’ll add a line with file.choose that opens a file selection box and allows you to search through your computer to find the csv.

fileToLoad <- file.choose(new = TRUE)

Next, R will read the csv and store it as the variable origAddress.

origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)

Next, the script initializes a data frame to store the geocoded addresses while the loop, which starts with for(i in 1:nrow(origAddress)), processes the rest. For more on loops in R, go here.
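The loop itself could look roughly like the sketch below. It is a reconstruction from the description in this tutorial, not a copy of the Gist: it wraps the loop in a hypothetical function, geocode_addresses, and adds empty result columns to origAddress instead of creating a separate data frame. The geocoder argument is also an assumption, added so the loop can be tried with a stand-in function before any requests go to Google.

```r
# Sketch of the geocoding loop. `geocoder` defaults to ggmap::geocode, but any
# function that accepts the same arguments can be passed in for a dry run.
geocode_addresses <- function(origAddress, geocoder = ggmap::geocode) {
  # Start with empty columns so every row has a slot for its result.
  origAddress$lon <- NA_real_
  origAddress$lat <- NA_real_
  origAddress$geoAddress <- NA_character_

  for (i in seq_len(nrow(origAddress))) {
    # With output = "latlona" the result holds lon, lat and the matched address.
    result <- geocoder(origAddress$addresses[i], output = "latlona", source = "google")
    origAddress$lon[i] <- as.numeric(result[[1]])
    origAddress$lat[i] <- as.numeric(result[[2]])
    origAddress$geoAddress[i] <- as.character(result[[3]])
  }
  origAddress
}

# Dry run with a stand-in geocoder that returns fixed placeholder coordinates.
fake_geocoder <- function(address, output, source) {
  data.frame(lon = -71.06, lat = 42.36, address = tolower(address),
             stringsAsFactors = FALSE)
}
demo <- data.frame(addresses = "1 Main St, Springfield, MA", stringsAsFactors = FALSE)
geocoded_demo <- geocode_addresses(demo, geocoder = fake_geocoder)
```

Calling geocode_addresses(origAddress) without the second argument would send each address to Google through ggmap. Recent versions of ggmap need a Google API key registered with register_google() before that works.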

Now, look through the whole script again.

Important: Notice the $addresses[i] within ggmap's geocode function.

geocode(origAddress$addresses[i], output = "latlona", source = "google")

This is where you store the name of your addresses column. (More documentation on how to use geocode here and here.)

After the loop runs, you'll want to write all the data to a csv named geocoded.csv.

write.csv(origAddress, "geocoded.csv", row.names=FALSE)

Run the script

Click "Run" or hit Shift-Command-Return. You should see each row being passed to Google's API in the console pane. For 114 rows, it took around 30 seconds.
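If one address cannot be geocoded, the loop may stop partway through. A possible workaround is to wrap each lookup in tryCatch so a failure is recorded and the loop moves on. The sketch below assumes a failed lookup either raises an error or returns fewer than three columns. The helper safe_geocode, its pause argument and the "not found" label are all hypothetical names chosen for illustration.

```r
# Wrap a single lookup so that an error or an incomplete result yields NA
# coordinates instead of stopping the loop. In the real script `geocoder`
# would be ggmap::geocode; it is passed in here so the sketch runs offline.
safe_geocode <- function(address, geocoder, pause = 0) {
  empty <- data.frame(lon = NA_real_, lat = NA_real_, address = "not found",
                      stringsAsFactors = FALSE)
  result <- tryCatch(
    geocoder(address, output = "latlona", source = "google"),
    error = function(e) {
      message("Skipping '", address, "': ", conditionMessage(e))
      NULL
    }
  )
  Sys.sleep(pause)  # optional pause between requests, in seconds
  if (!is.data.frame(result) || nrow(result) == 0 || ncol(result) < 3) {
    return(empty)
  }
  data.frame(lon = as.numeric(result[[1]]),
             lat = as.numeric(result[[2]]),
             address = as.character(result[[3]]),
             stringsAsFactors = FALSE)
}

# Stand-in geocoder that fails for one address, to show the behavior.
flaky_geocoder <- function(address, output, source) {
  if (grepl("nowhere", address)) stop("geocode failed")
  data.frame(lon = -71.06, lat = 42.36, address = address, stringsAsFactors = FALSE)
}
good <- safe_geocode("1 Main St, Springfield, MA", flaky_geocoder)
bad  <- safe_geocode("nowhere at all", flaky_geocoder)
```

Inside the loop, result <- safe_geocode(origAddress$addresses[i], geocode) would replace the direct call to geocode. Note that this only skips failures. It does not raise the daily query limit.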

Check the geocoded csv

Your new geocoded spreadsheet should contain three appended columns – lon, lat, and geoAddress. Sweet.
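A quick check in R can confirm the columns are there and flag rows that came back empty. The sketch below first writes a tiny stand-in file to a temporary folder so it runs on its own. The two rows are placeholders, not real geocoding output. With your own data, point read.csv at your geocoded.csv instead.

```r
# Write a small stand-in for geocoded.csv (placeholder values only).
demo_csv <- file.path(tempdir(), "geocoded.csv")
write.csv(
  data.frame(
    addresses  = c("placeholder address A", "placeholder address B"),
    lon        = c(-71.06, NA),
    lat        = c(42.36, NA),
    geoAddress = c("placeholder address a", NA),
    stringsAsFactors = FALSE
  ),
  demo_csv, row.names = FALSE
)

geocoded <- read.csv(demo_csv, stringsAsFactors = FALSE)

# Confirm the three appended columns are present.
expected_cols <- c("lon", "lat", "geoAddress")
missing_cols <- setdiff(expected_cols, names(geocoded))

# Rows where either coordinate is missing need a second look.
unmatched <- geocoded[is.na(geocoded$lon) | is.na(geocoded$lat), ]
```

An empty missing_cols means all three columns were written. Any rows in unmatched are addresses worth rechecking for typos or missing town and state.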

Now, to map the data

We'll use Carto and upload geocoded.csv in the Connect dataset section. The data, with the lat and lon recognized by Carto and stored as the_geom, should look like this:

Next, click Create Map on the bottom-right. Voilà:

Storybench’s editor is Aleszu Bajak, a science journalist and former Knight Science Journalism Fellow at MIT. He is an alum of Science Friday, the founder of, and is passionate about breaking down the divide between journalists, developers and designers.


  • The code is interrupted when an address can’t be coded. How can I modify the code so it shows “not found” and proceeds onto the next address?

  • I used ArcGIS software to geocode a list of addresses. After geocoding, my result has X and Y values: X=2220619.25786 and Y=301107.963424. But another file with the same address shows different Lat and Lon numbers. How can I convert the X, Y numbers to Lon and Lat coordinates?

  • Hi, I may need your help.

    I ran the same code but it seems like the loop fails if it cannot find the location.

    I got geocode failed with status OVER_QUERY_LIMIT, location = “2000 PEPPERELL PARKWAY OPELIKA AL”

    is there any way that I could ignore and keep the code move to the next address?

    Thank you.

  • Hi this is my first script that I’ve run with R and my output is showing NA in the Lon, Lat, and GeoAddress column. Any idea what went wrong? My address field is set up like the example, and my header is the same.
