Limitations of ACS data
Summary of data limitations that analysts should be aware of while using ACS data.
We often use ACS data in local government, and there are many reasons why it works for many of our use cases. For one, ACS data tables contain some of the most detailed demographic data available in the United States, so it’s our best option for some data elements. In addition, ACS data is publicly available, which means anyone can download the raw data. This ensures transparency and enables journalists, residents, and others to access the data and confirm or critique methods. Census Bureau data also have well-documented methodologies that are applied consistently across all jurisdictions, which makes it possible to compare our data to other cities.
That said, ACS data also has limitations. Whether your analysis is going to be published publicly or used by your team or department operationally, it’s important to understand and clearly communicate the limitations to the users or audience of your analyses.
The ACS provides estimates, not counts
If you are using ACS data, it’s important to communicate that the data are population estimates, not precise counts. ACS data can be used to show high-level population trends; it isn’t intended to provide granular resident counts.
This is why it’s important to measure and evaluate data reliability. Analysts should also share data reliability information (MOEs, CVs) alongside the analysis results as much as possible. This means clearly flagging any unreliable or less reliable estimates, and in some cases, not showing them at all. Analysts should also describe the methodology used to users in clear, concise data notes.
While Margins of Error and Coefficients of Variation can be statistically complex, lean on the Census Bureau documentation (much of it referenced within this document) for clear explanations, and provide links to this documentation from your analyses. Communicating about the methods used (including exactly which ACS tables or variables you used) and sharing reliability data is critical to being transparent about your analyses and building trust with your users.
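As a practical illustration of the reliability checks described above, here is a minimal Python sketch that derives a Coefficient of Variation from an ACS estimate and its published Margin of Error, then buckets the result into a reliability flag. It assumes the MOE is published at the ACS's standard 90% confidence level (z = 1.645); the CV cutoffs (12% and 40%) are a common convention among ACS data users, not an official Census Bureau rule, so adjust them to your team's standards.

```python
# Sketch: deriving a coefficient of variation (CV) from an ACS estimate
# and its published margin of error (MOE), then flagging reliability.
# ACS MOEs are published at the 90% confidence level, hence z = 1.645.

Z_90 = 1.645  # z-score for the 90% confidence level used by ACS MOEs


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """CV as a percentage: (standard error / estimate) * 100."""
    if estimate == 0:
        # CV is undefined for a zero estimate; treat as unusable.
        return float("inf")
    standard_error = moe / Z_90
    return (standard_error / estimate) * 100


def reliability_flag(cv: float) -> str:
    """Bucket an estimate by CV. Cutoffs are a common convention, not a Census rule."""
    if cv <= 12:
        return "high reliability"
    elif cv <= 40:
        return "medium reliability - use with caution"
    return "low reliability - consider suppressing"


# Example: an estimate of 1,500 residents with a published MOE of +/-250
cv = coefficient_of_variation(1500, 250)
print(f"CV: {cv:.1f}% ({reliability_flag(cv)})")  # CV of roughly 10%, high reliability
```

Flags like these can be carried through to the final output (or used to suppress estimates entirely) so that users see reliability information alongside every number.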
ACS is self-reported
The American Community Survey, like all surveys, relies on self-reported information rather than observation. For example, this means that instead of tracking income by examining pay stubs or tax returns, the ACS simply asks a respondent to estimate their income and report it.
This is especially important to consider when using surveys to investigate topics where people may not be inclined to self-report information accurately. For example, survey respondents may not answer accurately because of sensitivity or perceived judgment (e.g. income, poverty, use of benefits) or because of memory recall errors (e.g. dollars spent on specific expenses, or types of benefits or services received). In both of these cases, it may be helpful to investigate whether other data sources can provide direct data on these areas (e.g. administrative data) or to use other surveys or data sources to validate findings from the ACS.
Undercounted groups in Census data
In addition to data reliability varying across tables or variables, estimates may be less reliable for certain groups. The census has been found to systematically undercount certain groups, including children, noncitizens, renters, and unhoused individuals. By race or ethnicity, Black or African American, Hispanic or Latino, and Pacific Islander populations were more likely to be undercounted in the 2020 census.
This undercounting in the Census likely carries into the ACS, which uses Census data as an input to some of its sampling and estimation. In San Francisco use cases, we have seen certain population estimates that did not align with community knowledge after ground-truthing the data.
The Census Bureau does use techniques to mitigate this, and it validates against more than just the decennial census data. While this does not mean ACS data is free of bias, the Census Bureau has been working to account for non-response bias. Read more about how it did this during the COVID pandemic.
Because of non-response bias and other potential data quality issues, the 2020 ACS 1-year data were released as “experimental” estimates. Analysts should take this into account when using them.
San Francisco-specific trends, groups, or use cases
Some of the demographic information from the Census is not calculated or presented in ways that serve San Francisco-specific use cases. Some data best practices can be hyper-local to San Francisco’s communities. Census data released at the state or national level may not be responsive to these experiences nor speak to important nuances.
📌Examples: The census asks about race and ethnicity as two separate questions, while many City departments ask for this information in one question (sometimes a textbox), making comparisons difficult. Many City departments ask for gender identity with multiple possible responses and/or a textbox, while the Census captures only sex, with two categories listed. This makes estimating the underlying population for different gender identities difficult. Project teams will need to weigh comparative reporting needs when deciding how to report on race, ethnicity, and gender identity. These decisions may be informed by reporting requirements, client department objectives, and target audience priorities.