How to geocode a csv of addresses in R
Sometimes all you want are latitude and longitude coordinates to map your data. This tutorial shows you how to batch geocode a csv of addresses in R using the ggmap package, which asks Google to geocode the addresses through its API. FYI, at the time of writing the limit was 2,500 queries per day. Google has since started requiring an API key, as readers note in the comments.
Not familiar with R or R Studio? See this short tutorial on Storybench. Big thanks to Mitchell Craver for his tutorial and script published last year on his blog What do the data say?
Download the ggmap package in R Studio
We’ll need ggmap, a spatial visualization package, to geocode the csv. To install it in R Studio, open a new R script in “File” > “New File” > “R Script.” Type install.packages("ggmap") on line 1 of the top-left pane, using straight quotes. Click “Run” or hit Shift-Command-Return. You should see the package downloading and installing in the console pane.
Prepare your csv of addresses
We’ll be using a csv with the names, addresses and other information for 114 craft breweries in Massachusetts. Make sure to note the name – “addresses” – of the column listing the addresses. You’ll get the best results if the town and state are listed in that column, too.
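If your spreadsheet keeps street, town and state in separate columns, one way to build a single addresses column is to paste them together before geocoding. Here is a minimal sketch; the column names street, town and state and the sample rows are made up for illustration.

```r
# Hypothetical sample data; the column names street, town and state are assumptions.
breweries <- data.frame(
  name   = c("Brewery A", "Brewery B"),
  street = c("1 Main St", "25 Elm St"),
  town   = c("Boston", "Worcester"),
  state  = c("MA", "MA"),
  stringsAsFactors = FALSE
)

# Combine the pieces into one column named "addresses", the name the script expects.
breweries$addresses <- paste(breweries$street, breweries$town, breweries$state, sep = ", ")

print(breweries$addresses)
```

You could then save the result with write.csv and use that file for the rest of the tutorial.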
Copy over the R script
In R Studio, open a new R script in “File” > “New File” > “R Script.” Scroll down and copy the R script from the Gist or click here to see it on GitHub. Keep scrolling for an explanation of what each line does.
Understanding the R script
First, we load the ggmap library.
library(ggmap)
Next, we’ll add a line with file.choose that opens a file selection box and allows you to search through your computer to find the csv.
fileToLoad <- file.choose()
Next, R will read the csv and store it as the variable origAddress.
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)
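Before geocoding, it can help to confirm that the csv loaded the way you expect. The sketch below writes a tiny made-up csv to a temporary file so it runs on its own; with your own data you would skip that part and check origAddress directly.

```r
# Write a tiny hypothetical csv to a temporary file so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,addresses",
             "Brewery A,\"1 Main St, Boston, MA\""), tmp)

# stringsAsFactors = FALSE keeps the addresses as plain character strings.
origAddress <- read.csv(tmp, stringsAsFactors = FALSE)

# Stop early with a clear message if the expected column is missing.
if (!"addresses" %in% names(origAddress)) {
  stop("No column named 'addresses' found; rename the column or edit the script.")
}

# Show the column names and types that were read.
str(origAddress)
```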
Next, the script initializes a data frame to store the geocoded addresses. The loop, which starts with for(i in 1:nrow(origAddress)), then geocodes the addresses one row at a time. For more on loops in R, go here.
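To see the shape of the loop without calling Google, here is a rough sketch that swaps in a stand-in function. fake_geocode() is hypothetical and returns made-up coordinates; in the real script that call is ggmap's geocode(). This is an illustration of the pattern, not a copy of the script in the Gist.

```r
# fake_geocode() is a hypothetical stand-in for ggmap's geocode() so the loop
# runs without a network connection; it returns one row with lon, lat, address.
fake_geocode <- function(address) {
  data.frame(lon = -71.06, lat = 42.36, address = tolower(address),
             stringsAsFactors = FALSE)
}

# Made-up sample addresses.
origAddress <- data.frame(
  addresses = c("1 Main St, Boston, MA", "25 Elm St, Worcester, MA"),
  stringsAsFactors = FALSE
)

# Create the result columns up front so every row has a slot.
origAddress$lon <- NA_real_
origAddress$lat <- NA_real_
origAddress$geoAddress <- NA_character_

# seq_len() is safe even when the data frame has zero rows.
for (i in seq_len(nrow(origAddress))) {
  result <- fake_geocode(origAddress$addresses[i])
  origAddress$lon[i] <- result$lon[1]
  origAddress$lat[i] <- result$lat[1]
  origAddress$geoAddress[i] <- result$address[1]
}

print(origAddress)
```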
Now, look through the whole script again.
Important: Notice the $addresses[i] within ggmap’s geocode function.
geocode(origAddress$addresses[i], output = "latlona", source = "google")
This is where you put the name of your addresses column. If your column is named something else, replace addresses with that name. (More documentation on how to use geocode here and here.)
After the loop runs, you’ll want to write all the data to a csv named geocoded.csv.
write.csv(origAddress, "geocoded.csv", row.names=FALSE)
Run the script
Click “Run” or hit Shift-Command-Return. You should see each row being passed to Google’s API in the console pane. For 114 rows, it took around 30 seconds.
Check the geocoded csv
Your new geocoded spreadsheet should contain three appended columns – lon, lat, and geoAddress. Sweet.
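A quick way to check the output is to read the csv back in and count the rows that came back without coordinates. This sketch uses made-up data and a temporary file so it runs on its own; with your own results you would read geocoded.csv instead.

```r
# Hypothetical geocoded output with one row that failed to geocode.
geocoded <- data.frame(
  addresses  = c("1 Main St, Boston, MA", "Unknown Rd, Nowhere"),
  lon        = c(-71.06, NA),
  lat        = c(42.36, NA),
  geoAddress = c("1 main st, boston, ma, usa", NA),
  stringsAsFactors = FALSE
)

# Write to a temporary file and read it back, as you would with geocoded.csv.
out_file <- tempfile(fileext = ".csv")
write.csv(geocoded, out_file, row.names = FALSE)
checked <- read.csv(out_file, stringsAsFactors = FALSE)

# Keep only the rows where either coordinate is missing.
missing_rows <- checked[is.na(checked$lon) | is.na(checked$lat), ]
cat(nrow(missing_rows), "of", nrow(checked), "rows have no coordinates\n")
```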
Now, to map the data
We’ll use Carto.com and upload geocoded.csv in the Connect dataset section. The data – with the lat, lon recognized by Carto and stored as the_geom – should look like this:
Next, click Create Map on the bottom-right. Voilà:
Hi – thanks for the post, it’s really helpful. However, how can I include my Google Maps Geocoding API key?
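On the API key question: recent versions of ggmap provide register_google() for this. Here is a minimal sketch, assuming ggmap 2.7 or later is installed and the key is stored in an environment variable. The variable name GOOGLE_MAPS_API_KEY is a hypothetical choice for this example.

```r
# Read the key from an environment variable instead of pasting it into the script.
# GOOGLE_MAPS_API_KEY is a hypothetical name; use whatever name you set yourself.
api_key <- Sys.getenv("GOOGLE_MAPS_API_KEY")

if (nzchar(api_key) && requireNamespace("ggmap", quietly = TRUE)) {
  # Registers the key for this R session so later geocode() calls can use it.
  ggmap::register_google(key = api_key)
} else {
  message("No API key found or ggmap is not installed; nothing was registered.")
}
```

Run this once near the top of the script, before the loop that calls geocode.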
The code is interrupted when an address can’t be coded. How can I modify the code so it shows “not found” and proceeds onto the next address?
Did you find a way? I’m having the same trouble.
Hi, I’m having the same issue! Did you find the solution?
Xiao
Thanks, this is so useful! Is there a way to replace the for loop with a function from the plyr package to improve runtime?
I used ArcGIS software to geocode a list of addresses. After geocoding, my result has X and Y values: X=2220619.25786 and Y=301107.963424. But my other file, which has the same address, shows different Lat and Lon numbers. How can I convert the X, Y numbers to Lon and Lat coordinates?
Hi, I may need your help.
I ran the same code but it seems like the loop fails if it cannot find the location.
I got the message: geocode failed with status OVER_QUERY_LIMIT, location = “2000 PEPPERELL PARKWAY OPELIKA AL”
Is there any way that I could ignore it and have the code move on to the next address?
Thank you.
Same thing happened to me. How do I get around that? There are only 69 addresses on my list.
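One possible way around a temporary OVER_QUERY_LIMIT is to pause and try the same address again a few times before giving up. Here is a rough sketch of that idea. flaky_geocode() is a hypothetical stand-in that fails once and then succeeds with made-up coordinates; with ggmap you would call geocode() there instead. Depending on the ggmap version, a failed lookup may show up as an error or as a warning with NA values, so the sketch checks for both.

```r
# flaky_geocode() is a hypothetical stand-in for geocode(): it fails on the
# first call and succeeds afterwards, imitating a temporary rate-limit error.
calls <- 0
flaky_geocode <- function(address) {
  calls <<- calls + 1
  if (calls == 1) stop("geocode failed with status OVER_QUERY_LIMIT")
  data.frame(lon = -85.38, lat = 32.65, address = tolower(address),
             stringsAsFactors = FALSE)
}

# Try up to max_tries times, pausing between attempts; return NULL if all fail.
geocode_with_retry <- function(address, max_tries = 3, pause_seconds = 0.1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(flaky_geocode(address), error = function(e) NULL)
    # Accept the result only if it exists and has a usable longitude.
    if (!is.null(result) && !is.na(result$lon[1])) return(result)
    Sys.sleep(pause_seconds)
  }
  NULL
}

result <- geocode_with_retry("2000 PEPPERELL PARKWAY OPELIKA AL")
print(result)
```

With the real API a longer pause, such as a second or more, is probably more realistic than the short one used here.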
Is there any geocoding service other than Google?
Hi this is my first script that I’ve run with R and my output is showing NA in the Lon, Lat, and GeoAddress column. Any idea what went wrong? My address field is set up like the example, and my header is the same.
Hi,
Thank you for the information.
But I have one more question: what if I have Google coordinates in my csv file, such as 56°29’39.7″N 22°03’52.4″E?
What would be the script to convert into lat and long?
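For coordinates written in degrees, minutes and seconds, the conversion is plain arithmetic: degrees + minutes/60 + seconds/3600, made negative for S and W. Here is a minimal sketch in base R. dms_to_decimal() is a helper name made up for this example, and it assumes each value contains exactly three numbers followed by a hemisphere letter.

```r
# Convert one degrees-minutes-seconds string, e.g. 56°29'39.7"N, to decimal degrees.
# dms_to_decimal() is a hypothetical helper written for this example.
dms_to_decimal <- function(dms) {
  # Pull out the three numbers: degrees, minutes, seconds.
  nums <- as.numeric(regmatches(dms, gregexpr("[0-9]+\\.?[0-9]*", dms))[[1]])
  # Pull out the hemisphere letter.
  hemi <- regmatches(dms, regexpr("[NSEW]", dms))
  value <- nums[1] + nums[2] / 60 + nums[3] / 3600
  # South and west are negative in decimal degrees.
  if (hemi %in% c("S", "W")) -value else value
}

# The degree sign is written as \u00b0 so the script does not depend on file encoding.
coords <- "56\u00b029'39.7\"N 22\u00b003'52.4\"E"
parts <- strsplit(coords, " ")[[1]]

lat <- dms_to_decimal(parts[1])
lon <- dms_to_decimal(parts[2])
print(c(lat = lat, lon = lon))
```

If your csv uses curly quote marks for minutes and seconds, the function still works, because it only looks at the digits and the hemisphere letter.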
Add tryCatch to not have problems with errors.
Thanks for the coding!
for(i in 1:nrow(origAddress))
{
  tryCatch({
    # print("Working...")
    result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
    origAddress$lon[i] <- as.numeric(result[1])
    origAddress$lat[i] <- as.numeric(result[2])
    origAddress$geoAddress[i] <- as.character(result[3])
  }, error=function(e){cat("Warning:",conditionMessage(e), "\n")})
}
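For those who asked about showing “not found” and moving on, here is one possible variation on the loop above, as a sketch only. picky_geocode() is a hypothetical stand-in for geocode() that fails for one made-up address, so the error path can be seen without calling Google.

```r
# picky_geocode() is a hypothetical stand-in for geocode() that raises an error
# for one address; it returns made-up coordinates for everything else.
picky_geocode <- function(address) {
  if (grepl("Nowhere", address, fixed = TRUE)) stop("address could not be geocoded")
  data.frame(lon = -71.06, lat = 42.36, address = tolower(address),
             stringsAsFactors = FALSE)
}

# Made-up sample addresses; the second one triggers the error.
origAddress <- data.frame(
  addresses = c("1 Main St, Boston, MA", "Unknown Rd, Nowhere"),
  stringsAsFactors = FALSE
)
origAddress$lon <- NA_real_
origAddress$lat <- NA_real_
origAddress$geoAddress <- NA_character_

for (i in seq_len(nrow(origAddress))) {
  # tryCatch returns NULL instead of stopping the loop when the lookup fails.
  result <- tryCatch(picky_geocode(origAddress$addresses[i]),
                     error = function(e) NULL)
  if (is.null(result)) {
    origAddress$geoAddress[i] <- "not found"
    next
  }
  origAddress$lon[i] <- result$lon[1]
  origAddress$lat[i] <- result$lat[1]
  origAddress$geoAddress[i] <- result$address[1]
}

print(origAddress)
```

Rows that fail keep NA in lon and lat, which makes them easy to filter out or retry later.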
Hi. Aleszu, this is so helpful! Thank you and God bless.
Google now requires an API key, which I can see folks know from the comments above. One issue I ran into was the code stopping when it encountered an error. I searched the web and found out that if any address has the “#” character in it, the Google API will fail to run in R. I removed the “#” character from my data and now the steps above work. Thankful!
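Removing the “#” character can be done in R before the loop runs. A likely reason it causes trouble is that “#” marks the start of a fragment in a URL, though that is an assumption and not something confirmed here. Here is a minimal sketch with made-up addresses:

```r
# Made-up sample addresses; the first contains a "#" unit marker.
addresses <- c("12 Main St #4, Boston, MA", "25 Elm St, Worcester, MA")

# fixed = TRUE treats "#" as a literal character rather than a regular expression.
cleaned <- gsub("#", "", addresses, fixed = TRUE)

print(cleaned)
```

In the tutorial's script, the same call could be applied to origAddress$addresses right after the csv is read.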