How to geocode a csv of addresses in R

Sometimes all you want are latitude and longitude coordinates to map your data. This tutorial shows you how to batch geocode a csv of addresses in R using the ggmap package, which asks Google to geocode the addresses through its API. FYI, the limit is 2,500 queries per day.

Not familiar with R or R Studio? See this short tutorial on Storybench. Big thanks to Mitchell Craver for his tutorial and script published last year on his blog What do the data say?

Download the ggmap package in R Studio

We’ll need ggmap, a spatial visualization package, to geocode the csv. To install it in R Studio, open a new R script in “File” > “New File” > “R Script.” Type install.packages("ggmap") on line 1 of the top-left pane, using straight quotes (R does not accept curly ones). Click “Run” or hit Shift-Command-Return. You should see the package downloading and installing in the console pane.

Prepare your csv of addresses

We’ll be using a csv with the names, addresses and other information for 114 craft breweries in Massachusetts. Make sure to note the name of the column listing the addresses, which is “addresses” in this example. You’ll get the best results if town and state are listed in that column, too.
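
If your spreadsheet keeps street, town and state in separate columns, one way to build a single addresses column is sketched below. The column names street, town and state and the sample rows are made up for illustration; substitute whatever your csv actually uses.

```r
# Hypothetical example data; replace with the columns from your own csv.
breweries <- data.frame(
  name   = c("Brewery A", "Brewery B"),
  street = c("1 Main St", "25 Elm Ave"),
  town   = c("Boston", "Worcester"),
  state  = c("MA", "MA"),
  stringsAsFactors = FALSE
)

# Paste the pieces into one string per row, separated by commas.
breweries$addresses <- paste(breweries$street, breweries$town, breweries$state, sep = ", ")

print(breweries$addresses)
```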

Copy over the R script

In R Studio, open a new R script in “File” > “New File” > “R Script.” Scroll down and copy the R script from the Gist or click here to see it on GitHub. Keep scrolling for an explanation of what each line does.

Understanding the R script

First, we load the ggmap library.

library(ggmap)
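
As one commenter notes below, Google now requires an API key. In recent versions of ggmap the key is registered with register_google() before any geocode() call. Here is a minimal sketch, assuming you have stored your key in an environment variable; the name GOOGLE_MAPS_API_KEY is our own choice, not something ggmap requires.

```r
# Read the key from an environment variable so it never appears in the script.
# GOOGLE_MAPS_API_KEY is a hypothetical name; use whatever you set on your machine.
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY")

if (nzchar(api_key) && requireNamespace("ggmap", quietly = TRUE)) {
  # register_google() stores the key for the rest of the R session.
  ggmap::register_google(key = api_key)
  key_registered <- TRUE
} else {
  message("No API key (or no ggmap) found; geocode() calls to Google will fail.")
  key_registered <- FALSE
}
```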

Next, we’ll add a line with file.choose that opens a file selection box and allows you to search through your computer to find the csv.

fileToLoad <- file.choose(new = TRUE)

Next, R will read the csv and store it as the variable origAddress.

origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)

Next, the script initializes a data frame to store the geocoded addresses while the loop, which starts with for(i in 1:nrow(origAddress)), processes the rest. For more on loops in R, go here.
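
The initialization and loop described above might look roughly like the sketch below. So that it runs without an internet connection or API key, it uses a stand-in function, fake_geocode(), that returns the same three fields (lon, lat, address) with made-up values; in the real script the call is ggmap’s geocode(), shown further down. The sample addresses are placeholders too. Note that this sketch pre-fills the result columns with NA rather than creating a separate data frame, which is a swapped-in simplification and not necessarily how the original script does it.

```r
# Stand-in for ggmap::geocode(); hypothetical, returns made-up coordinates.
fake_geocode <- function(address) {
  data.frame(lon = -71.06, lat = 42.36, address = tolower(address),
             stringsAsFactors = FALSE)
}

# Placeholder data in place of the csv loaded with read.csv().
origAddress <- data.frame(
  addresses = c("1 Main St, Boston, MA", "25 Elm Ave, Worcester, MA"),
  stringsAsFactors = FALSE
)

# Create the new columns up front so every row has a slot, even if a lookup fails.
origAddress$lon <- NA_real_
origAddress$lat <- NA_real_
origAddress$geoAddress <- NA_character_

# seq_len() also behaves sensibly when the data frame has zero rows.
for (i in seq_len(nrow(origAddress))) {
  result <- fake_geocode(origAddress$addresses[i])
  origAddress$lon[i] <- as.numeric(result$lon)
  origAddress$lat[i] <- as.numeric(result$lat)
  origAddress$geoAddress[i] <- as.character(result$address)
}

print(origAddress)
```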

Now, look through the whole script again.

Important: Notice the $addresses[i] within ggmap’s geocode function.

geocode(origAddress$addresses[i], output = "latlona", source = "google")

This is where you put the name of your addresses column: if your column is called something other than addresses, change the name after the $ to match. (More documentation on how to use geocode here and here.)
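
If your column has a different name, one option is to keep the name in a variable and index with [[ ]] instead of $, so there is only one place to change. A small sketch follows; the column name street_address and the data are invented for illustration.

```r
# Invented data with a differently named address column.
origAddress <- data.frame(
  street_address = c("1 Main St, Boston, MA", "25 Elm Ave, Worcester, MA"),
  stringsAsFactors = FALSE
)

# The one place to change if your column is called something else.
address_col <- "street_address"

# Fail early with a readable message if the column is missing or misspelled.
if (!address_col %in% names(origAddress)) {
  stop("Column '", address_col, "' not found. Columns are: ",
       paste(names(origAddress), collapse = ", "))
}

# [[ ]] looks up a column by the name stored in a variable; $ cannot do that.
first_address <- origAddress[[address_col]][1]
print(first_address)
```

Inside the loop, origAddress[[address_col]][i] would then take the place of origAddress$addresses[i].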

After the loop runs, you’ll want to write all the data to a csv named geocoded.csv.

write.csv(origAddress, "geocoded.csv", row.names=FALSE)

Run the script

Click “Run” or hit Shift-Command-Return. You should see each row being passed to Google’s API in the console pane. For 114 rows, it took around 30 seconds.

Check the geocoded csv

Your new geocoded spreadsheet should contain three appended columns – lon, lat, and geoAddress. Sweet.
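
A quick way to check the result from R is to read the file back in and count the rows where no coordinates came back. The sketch below uses a small made-up file written to a temporary location so it can be run as is; point read.csv at your own geocoded.csv instead.

```r
# Made-up results standing in for geocoded.csv; the second row failed to geocode.
sample_results <- data.frame(
  addresses  = c("1 Main St, Boston, MA", "nowhere at all"),
  lon        = c(-71.06, NA),
  lat        = c(42.36, NA),
  geoAddress = c("1 main st, boston, ma", NA),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_results, csv_path, row.names = FALSE)

geocoded <- read.csv(csv_path, stringsAsFactors = FALSE)

# Rows with a missing lon or lat were not geocoded.
missing <- is.na(geocoded$lon) | is.na(geocoded$lat)
cat(sum(missing), "of", nrow(geocoded), "rows have no coordinates\n")
print(geocoded$addresses[missing])
```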

Now, to map the data

We’ll use Carto.com and upload geocoded.csv in the Connect dataset section. The data – with the lat, lon recognized by Carto and stored as the_geom – should look like this:

Next, click Create Map on the bottom-right. Voilà:

Aleszu Bajak

14 thoughts on “How to geocode a csv of addresses in R”

  1. The code is interrupted when an address can’t be coded. How can I modify the code so it shows “not found” and proceeds onto the next address?

  2. Thanks, this is so useful! Is there a way to replace the for loop with a function from the plyr package to improve runtime?

  3. I use ArcGIS software to geocode a list of addresses. After geocoding, my result has X and Y values, for example X=2220619.25786 and Y=301107.963424. But another file of mine with the same address shows different lat and lon numbers. How can I convert the X, Y numbers to lon and lat coordinates?

  4. Hi, I may need your help.

    I ran the same code but it seems like the loop fails if it cannot find the location.

    I got this message: geocode failed with status OVER_QUERY_LIMIT, location = “2000 PEPPERELL PARKWAY OPELIKA AL”

    Is there any way to ignore the error and have the code move on to the next address?

    Thank you.

    1. Same thing happened to me. How do I get around that? There are only 69 addresses on my list.

  5. Hi this is my first script that I’ve run with R and my output is showing NA in the Lon, Lat, and GeoAddress column. Any idea what went wrong? My address field is set up like the example, and my header is the same.

  6. Hi,

    Thank you for information.
    But I have one more question: if I have Google coordinates in my csv file such as 56°29’39.7″N 22°03’52.4″E, what would the script be to convert them into lat and long?

  7. Add tryCatch to not have problems with errors.
    Thanks for the coding!

    for(i in 1:nrow(origAddress))
    {
      tryCatch({
        # print("Working...")
        result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
        origAddress$lon[i] <- as.numeric(result[1])
        origAddress$lat[i] <- as.numeric(result[2])
        origAddress$geoAddress[i] <- as.character(result[3])
      }, error = function(e) {cat("Warning:", conditionMessage(e), "\n")})
    }

  8. Hi. Aleszu, this is so helpful! Thank you and God bless.

    Google now requires an API key, which I can see folks know from the comments above. One issue I ran into was the code stopping when it encountered an error. I searched the web and found out that if any address has the “#” character in it, the Google API will fail to run in R. I removed the “#” character from my data and now the steps above work. Thankful!
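
Following the last comment, one way to strip the “#” character from a column of addresses before geocoding is gsub(). This is a sketch with invented addresses; whether “#” is the only character that causes trouble is not something this tutorial has tested.

```r
# Invented addresses; the first contains a "#" unit number.
addresses <- c("100 Main St #4, Boston, MA", "25 Elm Ave, Worcester, MA")

# fixed = TRUE treats "#" as a literal character rather than a regular expression.
cleaned <- gsub("#", "", addresses, fixed = TRUE)
print(cleaned)
```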
