Basic data formats

The Census Bureau publishes ACS results in two basic formats: data tables and microdata.

Data tables

A data table shows summary statistics for a given population characteristic. The Census Bureau has already summarized the raw survey data into meaningful statistics.

In a data table, you can easily compare results and read meaningful statistics, like “about 42.6% of San Francisco residents over 5 years old speak a language other than English at home”. You can get data tables from the Census Bureau for most geographies – cities, counties, census tracts, etc.

Microdata

Microdata is more “raw” survey data, where each row can be thought of as a survey response. Microdata is sometimes referred to as “PUMS” (“Public Use Microdata Sample”).

With microdata, you have to summarize the rows yourself to produce statistics that describe the data and trends. You can think of each row as a person or a household. Depending on the metric you are studying, a row represents either a single survey respondent (for example, employment status) or the household that the respondent is answering for (for example, whether the household rents or owns its home).

Each row in a microdata dataset is assigned a weight. You can interpret the weight as the number of households or persons represented by that specific row in the microdata. You will sum these weights to get population estimates for a specific geography. Learn more about summing weights in the analysis section.
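As a rough sketch of what summing weights looks like, the snippet below works on a few made-up person rows. `PWGTP` is the person-weight column name used in Census Bureau PUMS files; the `employed` flag, the helper `weighted_count`, and every value are invented for illustration.

```python
# Toy person-level rows. All weights and values are made up for illustration.
# PWGTP is the person-weight column name used in Census Bureau PUMS files;
# "employed" is a hypothetical, already-derived flag.
person_rows = [
    {"PWGTP": 120, "employed": True},
    {"PWGTP": 85, "employed": False},
    {"PWGTP": 210, "employed": True},
    {"PWGTP": 60, "employed": False},
]


def weighted_count(rows, predicate=lambda row: True, weight_col="PWGTP"):
    """Sum the weights of the rows that satisfy the predicate."""
    return sum(row[weight_col] for row in rows if predicate(row))


# Each weight is the number of people the row represents, so the sum of
# weights is the population estimate.
total_people = weighted_count(person_rows)
employed_people = weighted_count(person_rows, lambda row: row["employed"])
employed_share = employed_people / total_people

print(total_people, employed_people, round(employed_share, 3))
```

For household-level metrics the same pattern would apply to household rows and the household weight column (`WGTP` in PUMS files) instead of the person weight.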

Microdata is based on a smaller sample than the pre-tabulated, aggregate data tables, so it’s better to use the data tables when possible. Summing person weights in microdata will not produce exactly the same numbers as the aggregate data tables, but the results should be very close and within the margin of error (MOE). Read more about microdata.
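One simple sanity check, sketched here with made-up numbers, is to confirm that a weighted microdata total lands inside the published table estimate plus or minus its MOE. The function name `within_moe` and all three values are hypothetical.

```python
def within_moe(microdata_estimate, table_estimate, table_moe):
    """Return True if the microdata estimate falls inside table_estimate +/- table_moe."""
    return abs(microdata_estimate - table_estimate) <= table_moe


# Made-up numbers for illustration only.
table_estimate = 52_400      # estimate published in a data table
table_moe = 1_900            # margin of error published with that estimate
microdata_estimate = 51_150  # total obtained by summing microdata weights

print(within_moe(microdata_estimate, table_estimate, table_moe))
```

A result outside the range does not prove an error, but it is a good prompt to recheck filters, weights, and variable definitions.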

“Public Use Microdata Areas” (PUMAs)

Census microdata is only available at the level of “Public Use Microdata Areas” (PUMAs). In San Francisco, PUMAs are smaller than the county but larger than census tracts. Currently there are 7 PUMAs in San Francisco County. Given the small sample size associated with each PUMA, use microdata only when the estimate you need is not available in a pre-existing Census Bureau table, and report results countywide. This means reporting for San Francisco as a whole, not for a specific PUMA.
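In practice, a countywide estimate means keeping the rows from every PUMA in the county and summing their weights together. The sketch below assumes a `PUMA` column on each row (the name PUMS files use) and uses placeholder PUMA codes, not real San Francisco codes; all weights are made up.

```python
# Placeholder PUMA codes. Look up the real codes for your county and
# PUMA vintage before using this pattern.
COUNTY_PUMAS = {"PUMA_A", "PUMA_B", "PUMA_C"}

# Toy person rows with made-up weights.
person_rows = [
    {"PUMA": "PUMA_A", "PWGTP": 150},
    {"PUMA": "PUMA_B", "PWGTP": 90},
    {"PUMA": "PUMA_C", "PWGTP": 110},
    {"PUMA": "OTHER", "PWGTP": 300},  # a row from outside the county
]

# Keep only rows in the county's PUMAs, then sum weights across all of them
# so the result describes the county as a whole, not any single PUMA.
countywide_population = sum(
    row["PWGTP"] for row in person_rows if row["PUMA"] in COUNTY_PUMAS
)

print(countywide_population)
```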

Public Use Microdata Areas in San Francisco County (2020 PUMAs from U.S. Census Bureau visualized in ESRI’s Living Atlas)

What is IPUMS data? Is it the same as microdata?

The microdata that we refer to throughout this guide is the microdata published directly by the US Census Bureau.

IPUMS data is slightly different. IPUMS is an organization that is part of the Institute for Social Research and Data Innovation at the University of Minnesota. IPUMS houses the largest individual-level population database, which combines many data sources (including the ACS) for both the US and other countries. IPUMS USA data is based on ACS and decennial census data and is harmonized and re-formatted by the experts at IPUMS. Estimates and tables produced from IPUMS data will not always match those produced from Census Bureau microdata exactly.

While IPUMS data is frequently used by researchers, we recommend using the microdata directly from the US Census Bureau for almost all use cases. In this guide, when we mention microdata, we are referring to the microdata from the Census Bureau, not IPUMS datasets.

Do I use data tables or microdata? It’s best to use data tables

Data tables are preferable for almost all analyst use cases. There are many benefits to using the pre-aggregated tables from the Census Bureau: they save time, they are validated by the statisticians at the Census Bureau, and they include pre-calculated margins of error. Using microdata to calculate statistics and then generating margins of error for these statistics is more complicated and takes more time.
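To give a sense of the extra work involved, here is a minimal sketch of a margin of error calculation using the replicate-weight approach the Census Bureau documents for ACS PUMS: the estimate is recomputed with each of 80 replicate weights (`PWGTP1` to `PWGTP80` for persons), and the variance is 4/80 times the sum of squared differences from the full-sample estimate. The function name `replicate_moe` and the toy replicate estimates are assumptions for illustration; check the PUMS accuracy documentation for the year you are using before relying on it.

```python
import math


def replicate_moe(full_estimate, replicate_estimates, z=1.645):
    """Approximate a 90% margin of error from 80 replicate estimates.

    variance = (4 / 80) * sum((replicate - full_estimate) ** 2)
    moe      = z * sqrt(variance), with z = 1.645 for 90% confidence
    """
    if len(replicate_estimates) != 80:
        raise ValueError("expected 80 replicate estimates")
    squared_diffs = sum((rep - full_estimate) ** 2 for rep in replicate_estimates)
    variance = (4 / 80) * squared_diffs
    return z * math.sqrt(variance)


# Toy inputs: in real use, each replicate estimate is the same weighted sum
# recomputed with one replicate weight column instead of the main weight.
full_estimate = 1000
replicate_estimates = [full_estimate + (10 if i % 2 == 0 else -10) for i in range(80)]

moe = replicate_moe(full_estimate, replicate_estimates)
print(round(moe, 1))
```

With data tables, none of this is needed because the margin of error is already published next to each estimate.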

We recommend using the data tables as much as possible. There is likely already a table that contains your variables or cross sections of interest. Only use microdata if necessary.

📌 Example: While reporting data during the COVID-19 emergency response, the team almost always used pre-aggregated population tables. The only exception came when vaccines were first released by age group. Because the vaccine eligibility age groups did not match the ACS age groups in any pre-aggregated table, the COVID task force needed to use microdata when reporting citywide vaccination rates by age group.
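A case like this comes down to re-binning person records into custom age groups and summing weights within each group. The sketch below assumes the PUMS column names `AGEP` (age) and `PWGTP` (person weight); the age group boundaries are hypothetical and are not the actual eligibility groups, and all rows are made up.

```python
from collections import defaultdict

# Hypothetical custom age groups as (lowest age, highest age or None, label).
AGE_GROUPS = [
    (0, 11, "0-11"),
    (12, 15, "12-15"),
    (16, 64, "16-64"),
    (65, None, "65+"),
]


def age_group(age):
    """Return the label of the custom age group that contains this age."""
    for low, high, label in AGE_GROUPS:
        if age >= low and (high is None or age <= high):
            return label
    raise ValueError(f"age {age} does not fall in any group")


# Toy person rows with made-up ages and weights.
person_rows = [
    {"AGEP": 8, "PWGTP": 100},
    {"AGEP": 13, "PWGTP": 50},
    {"AGEP": 15, "PWGTP": 70},
    {"AGEP": 40, "PWGTP": 200},
    {"AGEP": 70, "PWGTP": 90},
]

# Sum person weights within each custom group to get population estimates.
population_by_group = defaultdict(int)
for row in person_rows:
    population_by_group[age_group(row["AGEP"])] += row["PWGTP"]

print(dict(population_by_group))
```

The resulting population totals can then serve as denominators for rates reported by those same age groups.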


