
How do you use the Census API to pinpoint data?

This article was originally published as an Observable notebook.

Searching and parsing variables from the American Community Survey (and other datasets)

The Census Data API is an incredible resource that makes a huge universe of data available programmatically. However, it can be hard to find the exact variables you need for your query.

Also, commonly reported statistics are usually aggregates of several variables in, say, the American Community Survey. For instance, if you want to find out the percentage of people in a county who did not finish high school, you need to add up a bunch of different variables and divide that value by the group/concept total.
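To make that arithmetic concrete, here is a minimal Python sketch. The counts are made-up placeholders, not real Census figures. It assumes that B15003_001E is the group total and that B15003_002E through B15003_016E are the categories below a high school diploma, so check those codes against the variable list for your dataset year before relying on them.

```python
# Hypothetical counts for one county, keyed by ACS variable code.
# These numbers are invented for illustration only.
county = {"B15003_001E": 1000}  # group total: population 25 years and over
for i in range(2, 17):
    county[f"B15003_{i:03d}E"] = 10  # placeholder count per attainment level

# Assumption: codes 002 through 016 cover every level below a diploma.
NO_DIPLOMA_CODES = [f"B15003_{i:03d}E" for i in range(2, 17)]


def percent_without_diploma(row, codes=NO_DIPLOMA_CODES,
                            total_code="B15003_001E"):
    """Sum the component estimates and divide by the group total."""
    total = row[total_code]
    if total == 0:
        return None  # avoid dividing by zero for empty geographies
    return 100 * sum(row[code] for code in codes) / total


pct = percent_without_diploma(county)
```

With real data, each county's row would come from the API response rather than from a hand-built dictionary.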

The Census Data Explorer is handy as a starting point for finding variable families. In my case, since I am working off of this project, I already know that I need data from the family “B15003: EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER.”

But I still need a way to search and extract variable codes and labels to aggregate for calculating certain metrics. I could copy and paste from the 2019 ACS variable list, but I’d like to avoid introducing errors that way, and I want to format the labels a bit. Also, I’m doing my analysis in Stata, so I want to generate some statements for loading and labeling the variables.

So, let’s get started by specifying a dataset from the list of datasets at the Census website. We want the API Base URL for the 2019 five-year ACS.
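As a rough Python sketch of this step (the notebook itself is not written in Python), the base URL for the 2019 five-year ACS can be stored once and reused. The helper name variables_url is my own. It relies on the variable list for a dataset being published at the base URL plus /variables.json.

```python
# API base URL for the 2019 five-year ACS, taken from the Census dataset list.
BASE_URL = "https://api.census.gov/data/2019/acs/acs5"


def variables_url(base_url):
    """Return the address of the variable list for a dataset's base URL."""
    return base_url.rstrip("/") + "/variables.json"


VARIABLES_URL = variables_url(BASE_URL)
```

Swapping in another dataset should only require changing BASE_URL.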

In the original notebook, you can edit the text box to add another dataset from the list of Census datasets. However, I have only used this with recent ACS datasets, so you may need to modify the code, especially the code that parses the labels for text output.


First, let’s load the JSON of variables and format it as a tabular dataset.
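Here is one way this step might look in Python, as a sketch rather than the notebook's own code. The function names fetch_variables and to_table are hypothetical, and the sample dictionary is hand-written to mimic the shape of variables.json (a top-level "variables" object keyed by variable code) so the example runs without a network connection.

```python
import json
from urllib.request import urlopen


def fetch_variables(url):
    """Download and decode a variables.json document (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def to_table(variables_json):
    """Flatten the {"variables": {code: {...}}} mapping into sorted rows."""
    rows = []
    for code, info in variables_json["variables"].items():
        rows.append({
            "name": code,
            "label": info.get("label", ""),
            "concept": info.get("concept", ""),
            "group": info.get("group", ""),
        })
    return sorted(rows, key=lambda row: row["name"])


# Hand-written sample in the same shape as the real file (illustration only).
CONCEPT = "EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER"
sample = {
    "variables": {
        "B15003_002E": {
            "label": "Estimate!!Total:!!No schooling completed",
            "concept": CONCEPT,
            "group": "B15003",
        },
        "B15003_001E": {
            "label": "Estimate!!Total:",
            "concept": CONCEPT,
            "group": "B15003",
        },
    }
}

table = to_table(sample)
# With network access, the real list could be loaded instead:
# table = to_table(fetch_variables(
#     "https://api.census.gov/data/2019/acs/acs5/variables.json"))
```

Each row is a plain dictionary, so the result can be passed to a CSV writer or a dataframe library if you prefer a richer table.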

Now we can search. For instance, in the default ACS dataset for this notebook, we can try searching for “educational attainment” or “B15003.”
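A search like this can be approximated in Python with a case-insensitive substring match. This is only a sketch: the function name search_variables and the three sample rows are my own, and the notebook's search widget may match differently.

```python
def search_variables(rows, query):
    """Return rows whose name, concept, or label contains the query text."""
    needle = query.strip().lower()
    if not needle:
        return list(rows)  # an empty search returns everything
    fields = ("name", "concept", "label")
    return [
        row for row in rows
        if any(needle in row.get(field, "").lower() for field in fields)
    ]


# Hand-written rows for illustration only.
EDUCATION = "EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER"
rows = [
    {"name": "B01001_001E", "concept": "SEX BY AGE",
     "label": "Estimate!!Total:"},
    {"name": "B15003_001E", "concept": EDUCATION,
     "label": "Estimate!!Total:"},
    {"name": "B15003_002E", "concept": EDUCATION,
     "label": "Estimate!!Total:!!No schooling completed"},
]

by_concept = search_variables(rows, "educational attainment")
by_code = search_variables(rows, "B15003")
```

Both searches return the two B15003 rows here, since the first matches on the concept and the second on the variable code.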

The results of the search appear in this table. You can use the checkboxes on the left to select the variables you actually want. You can even hit “SHIFT + CLICK” to select a range of variables. (Note that you can hover over elements in the table to get the full text when truncated.)

Creating custom strings

Now what? Well, I needed to create a bunch of label statements to label these otherwise incomprehensible codes in Stata, so I wrapped these, along with the code for loading variables, into a text file that you can download. (See the “Download .do” file button below.)
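To give a sense of what generating those label statements involves, here is a small Python sketch. It is not the notebook's code. The helper names are hypothetical, the cleanup rules assume labels shaped like "Estimate!!Total:!!No schooling completed", and lowercasing the variable name is an assumption about how the data was loaded into Stata.

```python
def clean_label(raw_label):
    """Drop the 'Estimate' prefix and trailing colons, then rejoin the parts."""
    parts = [part.strip().rstrip(":") for part in raw_label.split("!!")]
    parts = [part for part in parts if part and part != "Estimate"]
    return ": ".join(parts)


def stata_label_statement(varname, raw_label, max_length=80):
    """Build one Stata `label variable` line for a cleaned ACS label."""
    # Stata variable labels hold at most 80 characters, so trim longer ones.
    # Double quotes are swapped out so they cannot end the label early.
    text = clean_label(raw_label).replace('"', "'")[:max_length]
    return f'label variable {varname} "{text}"'


selected = {
    "B15003_001E": "Estimate!!Total:",
    "B15003_002E": "Estimate!!Total:!!No schooling completed",
}

# Assumption: variable names are lowercase in the Stata dataset.
statements = [
    stata_label_statement(code.lower(), label)
    for code, label in selected.items()
]
do_file_text = "\n".join(statements)
```

Writing do_file_text to a file with a .do extension would give a script you can run after loading the data.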

However, if you don’t use Stata, you might need to generate some other strings. For instance, you might want to create a URL for loading the data from the Census. Say you want to load county data for North Carolina from the 2019 five-year ACS. The query looks like this: https://api.census.gov/data/2019/acs/acs5?get=B01001_001E,B11003_001E&for=county:*&in=state:37

The same query with all the variables you selected from the table can be seen at the original link.

Note that, depending on the size and frequency of your requests, you may need to sign up for an API key and append it to your query, like &key={yourAPIkeyhere}.
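If you would rather assemble query URLs like the North Carolina example in code, a minimal Python sketch follows. The function name build_query_url and its parameters are my own, and "YOUR_KEY" is a placeholder rather than a working key.

```python
from urllib.parse import urlencode


def build_query_url(base_url, variables, geography, within=None, key=None):
    """Assemble a Census Data API query URL from its parts."""
    params = {"get": ",".join(variables), "for": geography}
    if within:
        params["in"] = within
    if key:
        params["key"] = key
    # Leave commas, colons, and asterisks unescaped so the URL stays readable.
    return base_url + "?" + urlencode(params, safe=",:*")


BASE_URL = "https://api.census.gov/data/2019/acs/acs5"
url = build_query_url(
    BASE_URL,
    ["B01001_001E", "B11003_001E"],
    geography="county:*",
    within="state:37",  # 37 is the state code used for North Carolina above
)
url_with_key = build_query_url(
    BASE_URL, ["B01001_001E"], "county:*", "state:37", key="YOUR_KEY"
)
```

Here url matches the example query above, and url_with_key shows where the key parameter ends up when you supply one.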

Download Stata .do

You can visit the original link to download a Stata .do file that uses the censusapi package to download data.

With these methods, you can refine your queries to fish out the specific information you need in the vast sea of census data.

Evan Galloway