Finding and downloading data
Where to go to download ACS data.
Open Data Portal
The first place to look for ACS data for San Francisco is the Open Data Portal. DataSF calculates and publishes San Francisco ACS data tables along with San Francisco neighborhood data on the Portal. Checking the Open Data Portal first is important because the ACS tables on the portal are pre-formatted for analyst use and could save you a lot of time.
The datasets on the Open Data Portal include key reliability measures such as margins of error (MOEs) and coefficients of variation (CVs) alongside the estimates. These tables are also standardized, easy to cite and link to, and consistent with other departments and City publications. Lastly, connecting to the Open Data Portal is easy.
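As a rough illustration of how these two reliability measures relate: ACS margins of error are published at the 90 percent confidence level, so the standard error is the MOE divided by 1.645, and the CV is the standard error divided by the estimate. The sketch below uses made-up numbers and a hypothetical helper name, cv_from_moe(); it is not DataSF’s own calculation code.

```r
# Hypothetical helper: derive a coefficient of variation (in percent)
# from an ACS estimate and its published 90% margin of error.
cv_from_moe <- function(estimate, moe, z = 1.645) {
  se <- moe / z          # standard error implied by a 90% MOE
  100 * se / estimate    # CV expressed as a percentage of the estimate
}

# Made-up example values, for illustration only
example_cv <- cv_from_moe(estimate = 10000, moe = 1645)
print(example_cv)
```

A lower CV means a more reliable estimate, so a helper like this can be used to flag estimates that need a caveat before they are published.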
For any table on the Portal, read the metadata and background information closely to ensure the data elements are what you are looking for. If you can’t find the table you are looking for, or if you want estimates that are not reported and think they would be helpful to other analysts, please reach out to support@datasf.org to request that the table be added.
Data tables from data.census.gov
Beyond the Open Data Portal, data.census.gov is your go-to site for data from the Census Bureau. If you can’t find a specific table on the Open Data Portal, search for it on data.census.gov. Familiarize yourself with the search interface, how to filter data, and how to look at different tables. This portal contains data from the ACS, the decennial census, and more.
Below is a screenshot of the Census data website after searching for “education”. The long list shows the tables you can select from. Currently selected is the “S1501” table from the 2019 1-year ACS estimates. You could choose a different ACS version using the down caret button. You can also filter the data for San Francisco using the filter menu to the left.
Note that the Census Bureau frequently updates this data.census.gov interface. These screenshots are as of May 2022.
When the table is filtered and set to the ACS version you’d like, click download to get a CSV file containing the data table.
Microdata
Always search for your data first on data.census.gov, as described above. There is likely already a table that contains your variables or cross sections of interest. Pre-aggregated tables from the Census Bureau have many benefits, including published margins of error for each estimate.
If you are seeking a demographic breakdown or data point that isn’t aggregated in a pre-existing table, then you can use the Census microdata to create your own table. By demographic breakdown, we mean a cross section in the data like “age by educational attainment” or “age by health insurance coverage”. Data.census.gov has many of these pre-calculated, but there are some specific ones that may not be available.
Microdata are untabulated records about individual people or housing units, and you can use them to create custom tabulations. You can think of this as the raw survey data, with weights provided by the Census Bureau for you to use.
To download microdata, go to https://data.census.gov/mdat/#/ and select the dataset you’d like to download microdata from:
Then, select all the variables you’d like to be a part of your microdata dataset. You can search keywords in the “label” column. Select a variable to include using the checkbox to the left. Click details to see the values that are coded into that column and what they mean.
Then, select geographies. To download data for San Francisco, select the 7 San Francisco PUMAs (as of the time this was written, there was no county filter, so you have to sum up the PUMAs to get San Francisco estimates):
Then, the last required step is to download the data. Extract the data as a CSV. We only selected education, which is an individual person-level variable, so we’re only including person weights. If we had selected a household-level variable (like household income) the “Housing Unit Weight” would be pre-checked and included as well. Then you can download!
Remember to copy the bookmark and save it into your project’s data documentation. That will enable anyone else to come back and re-download the exact dataset for validation of your work.
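To make the “sum up the PUMAs” step concrete, here is a minimal sketch in base R. It uses a small made-up data frame in place of the downloaded CSV. The column names PUMA, SCHL, and PWGTP follow ACS microdata naming, but the PUMA codes and all values are assumptions for illustration, so check them against your own file.

```r
# Toy stand-in for a downloaded microdata CSV (all values are made up).
# PUMA = area code, SCHL = educational attainment code, PWGTP = person weight.
pums <- data.frame(
  PUMA  = c("07501", "07501", "07502", "07503", "07507"),
  SCHL  = c("21", "16", "21", "22", "16"),
  PWGTP = c(120, 80, 95, 60, 110)
)

# Each record stands for PWGTP people, so an estimate is a sum of weights.
# Summing over every selected PUMA at once gives the citywide total
# for each educational attainment category.
sf_by_schl <- aggregate(PWGTP ~ SCHL, data = pums, FUN = sum)
names(sf_by_schl)[2] <- "estimate"
print(sf_by_schl)
```

This produces point estimates only. Margins of error for microdata tabulations require the replicate weights, which are easier to handle in a survey object than by hand.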
Using R to download data
Using R for any of the above (downloading pre-calculated data tables or downloading microdata) is highly recommended if possible. Downloading data to CSV using the website is quick and easy initially but can become laborious once you transform the data or change your methodology. Scripting your data download also documents which variables were used to build your dataset, which makes it easier for others to recreate your analysis. Keeping the download step in an R script is the best way to ensure you can quickly and easily re-run it.
Tidycensus
We recommend using the tidycensus package in R to download and analyze census and ACS data. You can import decennial census tables, ACS data tables, and microdata using tidycensus.
Before writing your own code, walk through this guide on how to get started with tidycensus. Replicate the process from the guide exactly and check that you can download the same tables shown in the code excerpts.
Once you’ve been able to successfully implement the example from the start up instructions, you are ready to write your own code!
Downloading tables with tidycensus and get_acs()
When downloading ACS data tables, you’ll use the function get_acs() from the tidycensus package. First, familiarize yourself with all the function inputs. You will need to specify the geography, the variable names or table name, and the survey year; you can also name the county or state.
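A minimal sketch of those inputs in one call is shown below. It assumes tidycensus is installed and a Census API key is already set. The wrapper name get_sf_median_income() is hypothetical, and the year and survey are example choices rather than a recommendation.

```r
# Sketch only: needs the tidycensus package and a Census API key
# (set once with tidycensus::census_api_key("YOUR_KEY", install = TRUE)).
# The function name is hypothetical; wrapping the call makes it easy to re-run.
get_sf_median_income <- function(year = 2019, survey = "acs1") {
  tidycensus::get_acs(
    geography = "county",
    variables = "B19013_001",     # median household income
    state     = "CA",
    county    = "San Francisco",
    year      = year,
    survey    = survey            # "acs1" = 1-year, "acs5" = 5-year estimates
  )
}

# Example call (requires network access):
# sf_income <- get_sf_median_income()
# The result has GEOID, NAME, variable, estimate, and moe columns.
```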
Notice that you need to list variable labels (like B19013_001) within the get_acs() function. To find the variable labels you need, search for variables using the load_variables() function. If you can’t find a topic easily, search for it on the Census Reporter site. For example, a Census Reporter search for education highlights several relevant variable labels.
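One way to search the variable list is sketched below. load_variables() returns a table with name, label, and concept columns, where name holds the ID you pass to get_acs(). The helper find_variables() is a hypothetical convenience function, and the two-row data frame is a made-up stand-in so the example runs without network access.

```r
# Hypothetical helper: keep rows whose label or concept mentions a keyword.
# `vars` is expected to have the name, label, and concept columns that
# tidycensus::load_variables() returns.
find_variables <- function(vars, keyword) {
  hit <- grepl(keyword, vars$label, ignore.case = TRUE) |
    grepl(keyword, vars$concept, ignore.case = TRUE)
  vars[hit, , drop = FALSE]
}

# Made-up stand-in for the real variable list, for illustration only
toy_vars <- data.frame(
  name    = c("B19013_001", "B15003_022"),
  label   = c("Estimate!!Median household income",
              "Estimate!!Total!!Bachelor's degree"),
  concept = c("MEDIAN HOUSEHOLD INCOME",
              "EDUCATIONAL ATTAINMENT FOR THE POPULATION 25 YEARS AND OVER")
)
education_vars <- find_variables(toy_vars, "educational attainment")
print(education_vars$name)

# With tidycensus installed, the real list can be searched the same way:
# acs_vars <- tidycensus::load_variables(2019, "acs1", cache = TRUE)
# find_variables(acs_vars, "educational attainment")
```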
Each variable label is a specific aggregation (like the population of a specific race). If you’d rather download an entire table (like B19013, instead of just the one row, B19013_001), use the table argument. Learn more about the table argument.
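A whole-table request might look like the sketch below, which swaps the variables argument for table. As before, it assumes tidycensus and a Census API key are set up, and the wrapper name get_sf_table() is hypothetical.

```r
# Sketch only: needs the tidycensus package and a Census API key.
# Passing `table` instead of `variables` returns every row of that table.
get_sf_table <- function(table = "B19013", year = 2019, survey = "acs1") {
  tidycensus::get_acs(
    geography = "county",
    table     = table,
    state     = "CA",
    county    = "San Francisco",
    year      = year,
    survey    = survey
  )
}

# Example call (requires network access):
# sf_table <- get_sf_table("B19013")
```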
Downloading microdata with get_pums()
You can download microdata using the get_pums() function. When downloading microdata, you’ll want to download and then convert the data frame into a survey object to calculate margins of error.
Before getting started, replicate this example from the guidance so that you are comfortable using tidycensus together with the srvyr package.
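The download-then-convert workflow described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes tidycensus and srvyr are installed, a Census API key is set, and PUMA codes 07501 through 07507 cover San Francisco for the chosen year. Verify the codes against the PUMAs you would select on the website. The function name get_sf_education_pums() is hypothetical.

```r
# ASSUMPTION: these seven codes are the San Francisco PUMAs for this vintage;
# confirm them before relying on the results.
sf_pumas <- sprintf("%05d", 7501:7507)

# Sketch only: needs tidycensus, srvyr, and a Census API key.
get_sf_education_pums <- function(year = 2019, survey = "acs1") {
  pums <- tidycensus::get_pums(
    variables   = c("PUMA", "SCHL"),  # SCHL = educational attainment
    state       = "CA",
    puma        = sf_pumas,
    year        = year,
    survey      = survey,
    rep_weights = "person"            # replicate weights for standard errors
  )
  # Convert to a survey object so standard errors use the replicate weights
  svy <- tidycensus::to_survey(pums, type = "person", design = "rep_weights")
  # Weighted count of people in each educational attainment code,
  # returned with a standard error column
  srvyr::survey_count(svy, SCHL)
}

# Example call (requires network access):
# sf_education <- get_sf_education_pums()
```

Multiplying the returned standard errors by 1.645 gives margins of error at the 90 percent confidence level, which matches how the Census Bureau publishes ACS MOEs.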
Resources and code examples for tidycensus
Reference this site for a quick introduction to tidycensus.
For more in-depth examples and descriptions, reference this e-book.