Finding and downloading data

Where to go to download ACS data.

Open Data Portal

The first place to look for ACS data for San Francisco is the Open Data Portal. DataSF calculates and publishes San Francisco ACS data tables along with San Francisco neighborhood data on the Portal. Checking the Open Data Portal first is important because the ACS tables on the portal are pre-formatted for analyst use and could save you a lot of time.

The datasets on the Open Data Portal include key reliability measures like margins of error (MOEs) and coefficients of variation (CVs) alongside the estimates. These tables are also standardized, easy to cite and link to, and consistent with other departments and City publications. Lastly, connecting to the Open Data Portal is straightforward.
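As a rough illustration of how these two measures relate, the sketch below derives a CV from an estimate and its MOE. It assumes the MOE is published at the 90% confidence level (the Census Bureau's convention for ACS data), so the standard error is the MOE divided by 1.645. The numbers are invented, and each table's metadata remains the authority on how its columns were calculated.

```r
# Hypothetical estimate and 90% margin of error, for illustration only.
estimate <- 12000
moe <- 1500

# Standard error implied by a 90% MOE (assumption: 1.645 is the z-score used).
se <- moe / 1.645

# Coefficient of variation, expressed as a percent of the estimate.
cv <- se / estimate * 100
round(cv, 1)
```

A lower CV means the estimate is more reliable relative to its size.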

For any table on the Portal, read the metadata and background information closely to ensure the data elements are what you are looking for. If you can’t find the table you need, or if you want estimates that are not reported and think they would be helpful to other analysts, please reach out to support@datasf.org to request that the table be added.

Data tables from data.census.gov

Beyond the Open Data Portal, data.census.gov is your go-to site for data from the Census Bureau. If you can’t find a specific table on the Open Data Portal, search for it on data.census.gov. Familiarize yourself with the search interface, including how to filter data and browse different tables. The site contains data from the ACS, the decennial census, and more.

Below is a screenshot of the Census data website after searching for “education”. The results pane lists the tables you can select. Currently selected is the “S1501” table from the 2019 1-year ACS estimates. You can choose a different ACS version using the down caret button. You can also filter the data for San Francisco using the filter menu to the left.

Note that the Census Bureau frequently updates the data.census.gov interface. These screenshots are as of May 2022.

Screenshot of the data.census.gov interface with several functionalities highlighted: the top search bar; the filter menu to the left where you can filter by geography; the results pane with different tables related to the search topic; and the table viewer on the right where you can see the table and change the survey year shown.

When the table is filtered and set to the ACS version you’d like, click download to get a CSV file containing the data table.

Screenshot of the data.census.gov interface with the download button in the top right highlighted. The download options are shown, with the CSV file type highlighted.

Microdata

Always search for your data first on data.census.gov, as described above. There is likely already a table that contains your variables or cross sections of interest, and using the pre-aggregated tables from the Census Bureau has many benefits.

If you are seeking a demographic breakdown or data point that isn’t aggregated in a pre-existing table, then you can use the Census microdata to create your own table. By demographic breakdown, we mean a cross section in the data like “age by educational attainment” or “age by health insurance coverage”. Data.census.gov has many of these pre-calculated, but there are some specific ones that may not be available.

Microdata are un-tabulated records about individual people in housing units, and you can use them to create custom tabulations. You can think of this as the raw survey data, but the Census Bureau has provided the weights for you to use.

To download microdata, go to https://data.census.gov/mdat/#/ and select the dataset you’d like to download microdata from:

Screenshot of the homepage of the data.census.gov website for downloading microdata. The two input options are selecting a dataset (currently shown is "ACS 1-year estimates-Public Use Microdata Sample") and the vintage or year.

Then, select all the variables you’d like to be a part of your microdata dataset. You can search keywords in the “label” column. Select a variable to include using the checkbox to the left. Click details to see the values that are coded into that column and what they mean.

Screenshot of the data.census.gov website for downloading microdata. The first tab "Select Variables" is highlighted, showing the first step of data selection. On this tab, the filter menu is shown, with the "Label" filter highlighted to show you can search topics like education. A checkbox on the left hand side beside a variable row is highlighted to show how to select a variable for your microdata download.

Then, select geographies. To download data for San Francisco, select the 7 San Francisco PUMAs. As of the time this was written, there was no county filter, so you have to sum up the PUMAs to get San Francisco estimates:

Screenshot of the data.census.gov website for downloading microdata. The second tab "Select Geographies" is highlighted, showing the second step of data selection. On this tab, select Public Use Microdata Area (PUMA) on the left to then select California and then all 7 PUMAs for San Francisco.

Then, the last required step is to download the data. Extract the data as a CSV. We only selected education, which is an individual person-level variable, so we’re only including person weights. If we had selected a household-level variable (like household income) the “Housing Unit Weight” would be pre-checked and included as well. Then you can download!

Remember to copy the bookmark and save it into your project’s data documentation. That will enable anyone else to come back and re-download the exact dataset for validation of your work.

Screenshot of the data.census.gov website for downloading microdata. The last tab "Download" is highlighted, showing how to select the "Extract raw data (.CSV)" option, with the *PUMS person weight included. Below the download button, the "Copy Bookmark" button is highlighted, which should be copied and stored in your project documentation.
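To make the role of the person weights concrete, here is a minimal sketch of turning a microdata extract into citywide estimates by summing weights across PUMAs. The data frame is an invented stand-in for a downloaded file, and the column names (PUMA, SCHL, PWGTP) and PUMA codes are assumptions about what the extract contains, so check them against your own CSV.

```r
# Toy stand-in for a microdata extract; all values are invented.
# PUMA  = Public Use Microdata Area code
# SCHL  = educational attainment code
# PWGTP = person weight (how many people each record represents)
pums <- data.frame(
  PUMA  = c("07501", "07501", "07502", "07503", "07503"),
  SCHL  = c(21, 16, 21, 22, 16),
  PWGTP = c(80, 120, 95, 60, 110)
)

# Sum the person weights within each attainment code. Because the sum runs
# over every PUMA in the extract, the result is the citywide estimate.
citywide <- aggregate(PWGTP ~ SCHL, data = pums, FUN = sum)
names(citywide)[2] <- "estimated_people"
citywide
```

This produces point estimates only. Margins of error for microdata tabulations need additional survey-design information beyond the single weight column.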

Using R to download data

Using R for any of the above (downloading pre-calculated data tables or downloading microdata) is highly recommended if possible. Downloading data to CSV using the website is quick and easy initially, but it becomes laborious when you have to repeat it after data transformations or changes to your methodology. Scripting your data download also documents which variables were used to build your dataset, which makes it easier for others to recreate your analysis. Keeping your data download in an R script is the best way to ensure you can quickly and easily re-run this step.

Tidycensus

We recommend using the tidycensus package in R to download and analyze census and ACS data. You can import decennial census tables, ACS data tables, and microdata using tidycensus.

To get started with tidycensus, first walk through this guide on how to get started with tidycensus. Replicate the process from the guide exactly and see if you can download the same tables shown in the code excerpts.

Once you’ve been able to successfully implement the example from the start up instructions, you are ready to write your own code!

Downloading tables with tidycensus and get_acs()

When downloading ACS data tables, you’ll use the function get_acs() from the tidycensus package. First, familiarize yourself with all the function inputs. You will need to specify the geography, the variable names or table name, and the survey year, and you can optionally name the county or state.
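A minimal sketch of such a call is below. The wrapper name get_sf_median_income is hypothetical, and the choices of 2019 and the 1-year survey are assumptions made only to match the screenshot example earlier in this guide. Running it requires the tidycensus package and a Census API key, which you can store once with tidycensus::census_api_key().

```r
# Hypothetical helper: median household income for San Francisco County.
# Defining the function does not contact the Census API; calling it does.
get_sf_median_income <- function(year = 2019, survey = "acs1") {
  tidycensus::get_acs(
    geography = "county",                            # level of geography
    variables = c(median_hh_income = "B19013_001"),  # named vector renames the variable
    state = "CA",
    county = "San Francisco",
    year = year,
    survey = survey
  )
}

# Example call (needs an API key and internet access):
# sf_income <- get_sf_median_income()
# The result has columns GEOID, NAME, variable, estimate, and moe.
```

Keeping the year and survey as arguments makes it easy to re-run the same download for a different ACS version.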

Notice that you need to list variable labels (like B19013_001) within the get_acs() function. To find the variable labels you need, search the available variables using the load_variables() function. If you can’t find a topic easily, search for it on the Census Reporter site. Here’s an example of searching for education; the results highlight several relevant variable labels.
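One way to search is sketched below. The helper name find_acs_variables is hypothetical, and the year and dataset defaults are assumptions. In the data frame that load_variables() returns, the codes you pass to get_acs() are in the name column, while the label and concept columns hold the descriptions.

```r
# Hypothetical helper: list ACS variables whose table concept matches a pattern.
find_acs_variables <- function(pattern, year = 2019, dataset = "acs1") {
  # cache = TRUE stores the variable list locally so later calls are faster.
  vars <- tidycensus::load_variables(year, dataset, cache = TRUE)
  vars[grepl(pattern, vars$concept, ignore.case = TRUE), ]
}

# Example call (needs internet access the first time):
# education_vars <- find_acs_variables("educational attainment")
# head(education_vars[, c("name", "label", "concept")])
```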

Each variable label is a specific aggregation (like the population of a specific race). If you’d rather download an entire table (like B19013, instead of just the one row, B19013_001), use the table argument. Learn more about the table argument.
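For comparison, a whole-table request might look like the sketch below. The wrapper name get_sf_table is hypothetical, and the year and survey are again assumptions. Note that get_acs() expects either variables or table, not both.

```r
# Hypothetical helper: download every row of one ACS table for San Francisco.
get_sf_table <- function(table = "B19013", year = 2019, survey = "acs1") {
  tidycensus::get_acs(
    geography = "county",
    table = table,        # table ID without the _001 style suffix
    state = "CA",
    county = "San Francisco",
    year = year,
    survey = survey
  )
}

# Example call (needs an API key and internet access):
# sf_table <- get_sf_table("B19013")
```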

Downloading microdata with get_pums()

You can download microdata using the get_pums() function. When downloading microdata, you’ll want to download and then convert the data frame into a survey object to calculate margins of error.

Before getting started, replicate this example from the guidance to confirm that you can use tidycensus together with the srvyr package.
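The overall workflow can be sketched as follows, under several assumptions: the helper name get_sf_education_counts is hypothetical, SCHL is used as the educational attainment variable, and the PUMA codes 07501 through 07507 are assumed to be the San Francisco PUMAs for the vintage requested. PUMA boundaries and codes change between vintages, so verify them before relying on the output. The survey and srvyr packages must be installed.

```r
# Assumed San Francisco PUMA codes; verify against the vintage you request.
sf_pumas <- sprintf("075%02d", 1:7)

# Hypothetical helper: citywide weighted counts by educational attainment,
# with standard errors and 90% margins of error.
get_sf_education_counts <- function(year = 2019, pumas = sf_pumas) {
  pums <- tidycensus::get_pums(
    variables = c("PUMA", "SCHL"),
    state = "CA",
    puma = pumas,
    year = year,
    survey = "acs1",
    rep_weights = "person"   # replicate weights are needed for standard errors
  )

  # Convert the data frame into a survey object that knows about the weights.
  design <- tidycensus::to_survey(pums, type = "person")

  # Weighted counts by attainment code; adds columns n and n_se.
  counts <- srvyr::survey_count(design, SCHL)

  # Convert standard errors to 90% margins of error.
  counts$n_moe <- counts$n_se * 1.645
  counts
}

# Example call (needs an API key and internet access):
# sf_education <- get_sf_education_counts()
```

Because the counts are not grouped by PUMA, the result covers all of the requested PUMAs combined, which is how you arrive at a San Francisco total.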

Resources and code examples for tidycensus
