Statistical reliability
Is this estimate reliable?
Why do we do this?
Before we get into the math, why do we care about statistical reliability? We need to ensure that the estimates based on ACS sample data actually reflect real population trends. If we share unreliable estimates, a department could dedicate resources or effort based on faulty data that doesn’t reflect reality. Unreliable data could misdirect leadership or operations from other key population trends that are reliable and have been verified/ground-truthed. The stakes can be high.
Margins of error (MOE)
ACS estimates are based on a sample of people who took the survey – not the whole population. That means the estimates are just that – estimates of what the underlying population characteristic might be. The Census Bureau publishes statistics, like the margin of error, with these estimates to help users understand how reliable each estimate is (how likely it is that the estimate matches the value for the whole underlying population).
A margin of error (MOE) describes the precision of an ACS estimate at a given confidence level based on statistical sampling theory. The MOEs for published ACS estimates are provided at a 90 percent confidence level. The confidence level associated with the MOE indicates the likelihood that the ACS sample estimate is within a certain range (the MOE) of the population value. In this case, 90% of the time we expect the population value to be inside the range provided (the range is the estimate ± MOE) if we sampled the population many times.
Statistically, the confidence level is the percentage of time that the specified range (the estimate ± MOE) would contain the true population value if you sampled the population a large number of times. The confidence interval itself describes the level of variability around the estimate.
Read more about margins of error.
Read about how to calculate MOEs.
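To make this concrete, here is a minimal Python sketch showing how a published estimate and its MOE translate into a 90 percent confidence interval. The numbers are made up for illustration; the only relationship used is the standard one that a 90 percent MOE equals 1.645 standard errors.

```python
# Hypothetical published ACS values (illustration only)
estimate = 12_500   # e.g., an estimated population count
moe_90 = 1_800      # margin of error at the 90% confidence level

# 90% confidence interval: estimate +/- MOE
ci_lower = estimate - moe_90
ci_upper = estimate + moe_90

# A 90% MOE corresponds to 1.645 standard errors,
# so the standard error can be recovered from the MOE.
standard_error = moe_90 / 1.645

print(f"90% confidence interval: {ci_lower:,} to {ci_upper:,}")
print(f"Approximate standard error: {standard_error:,.0f}")
```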
Coefficients of variation (CV)
Margins of error can be converted to coefficient of variation (CV) values. A CV measures the relative amount of sampling error that is associated with a sample estimate. CVs can be shown as fractions, like 0.4, or as percentages, like 40%. We need to calculate CVs to make an assessment of reliability across different variables with vastly different units. Read about how to calculate CVs.
Once you’ve calculated the CVs, compare them against these guidelines:
good (CV <= 15%)
fair (15% < CV <= 30%)
use with caution (CV > 30%)
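A minimal sketch of the CV calculation and the guideline check above, assuming you already have an estimate and its 90 percent MOE (the example numbers and function names are hypothetical):

```python
def coefficient_of_variation(estimate: float, moe_90: float) -> float:
    """CV = standard error / estimate, where SE = MOE / 1.645 for a 90% MOE."""
    if estimate == 0:
        return float("inf")  # avoid dividing by zero; treat as unreliable
    return (moe_90 / 1.645) / abs(estimate)

def reliability_label(cv: float) -> str:
    """Apply the guideline thresholds above."""
    if cv <= 0.15:
        return "good"
    elif cv <= 0.30:
        return "fair"
    else:
        return "use with caution"

# Example with made-up numbers
cv = coefficient_of_variation(estimate=4_200, moe_90=1_500)
print(f"CV = {cv:.0%} -> {reliability_label(cv)}")
```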
The 30% (0.3) cutoff serves as guidance and is a reasonable threshold. Several other governments and City departments, including DataSF and DPH, use the 0.3 (30%) cutoff. The Census Bureau also recommends using this cutoff.
DataSF flags any estimates with CVs greater than 30% as unreliable in the open dataset.
Confirming the number of survey respondents in microdata
If you are using microdata to calculate statistics, you will have to calculate the MOEs and CVs yourself.
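One common way to do this with PUMS microdata is the Census Bureau's replicate-weight method: recompute the estimate with each of the 80 replicate weights and derive the standard error from how much those replicate estimates vary. Below is a minimal sketch, assuming a person-level PUMS DataFrame with the standard PWGTP weight and PWGTP1–PWGTP80 replicate weight columns; the file name and filter in the usage example are illustrative only.

```python
import pandas as pd

def pums_estimate_moe_cv(df: pd.DataFrame) -> dict:
    """Weighted count, 90% MOE, and CV for the rows in df,
    using the ACS replicate-weight variance formula."""
    estimate = df["PWGTP"].sum()

    # Re-estimate with each of the 80 replicate weights
    replicate_cols = [f"PWGTP{i}" for i in range(1, 81)]
    replicate_estimates = df[replicate_cols].sum()

    # Variance = (4/80) * sum( (replicate_estimate - estimate)^2 )
    variance = (4 / 80) * ((replicate_estimates - estimate) ** 2).sum()
    standard_error = variance ** 0.5

    moe_90 = 1.645 * standard_error
    cv = standard_error / estimate if estimate else float("inf")
    return {"estimate": estimate, "moe_90": moe_90, "cv": cv}

# Example usage (illustrative):
# persons = pd.read_csv("psam_p06.csv")          # a PUMS person file
# subgroup = persons[persons["AGEP"] >= 65]      # e.g., people 65 and older
# print(pums_estimate_moe_cv(subgroup))
```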
In addition to the MOEs and the CVs, you will also need to validate the number of survey responses underlying each statistic. When you create a summary table using microdata, you'll sum the individual (or household) weights to get the estimate. If you count the number of rows in the data, this is the approximate number of survey respondents (the number of survey responses used to come up with that estimate). This may look something like this:
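For example, a minimal pandas sketch of such a summary table, assuming `persons` is a PUMS-style person-level DataFrame (as in the sketch above) with a PWGTP weight column and a hypothetical "neighborhood" grouping column:

```python
summary = (
    persons.groupby("neighborhood")
    .agg(
        estimate=("PWGTP", "sum"),      # weighted estimate of the population
        respondents=("PWGTP", "size"),  # unweighted row count ≈ survey respondents
    )
    .reset_index()
)
print(summary)
```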
It’s important to look at the number of survey responses for each of your estimates as an additional reliability check. In general, it’s best (more reliable) if there are at least 30 responses underlying an estimate. This is because estimates of sampling error rely on an assumption of a normal distribution. It's not an exact science, but many statistics guides will generally say your samples need ~30 observations in order to make the assumption of normality.
This isn’t a hard and fast rule, but it’s a good gut check to see whether your estimates are based on 50 surveys or 5. If an estimate is based on fewer than 30 responses, this may be worth flagging, and/or considering rolling up to a larger geography or category.
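Building on the summary table sketch above, a quick check like the following can surface estimates worth flagging or rolling up (the 30-response cutoff is the rule of thumb described here, not a hard rule):

```python
# Flag estimates based on fewer than 30 survey responses for review
MIN_RESPONSES = 30
summary["low_response_flag"] = summary["respondents"] < MIN_RESPONSES
print(summary[summary["low_response_flag"]])
```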
So, is this estimate reliable or not?
This is a difficult question to answer, and there aren’t hard and fast rules. MOEs and CVs help you compare precision and reliability, but making the statement that something is “unreliable” or “reliable” is difficult. Here are a few pieces of guidance:
If you are using a pre-calculated ACS or Census table, the Census Bureau provides MOEs and already suppresses (doesn’t include) unreliable estimates.
That said, the criteria the Census Bureau uses for data suppression are meant to filter out measures that are completely unreliable (a CV of 61% or more, i.e. >= 0.61).
Calculate the CVs to better understand reliability using the thresholds above. You may also choose to suppress data points if the CV is high (above 30%), even if they were included in the initial table. This will depend on the data, the CV, and the risks involved with using an unreliable estimate.
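As a rough sketch of what that choice might look like in practice, assuming your summary table has an added "cv" column (as a fraction) alongside the estimates:

```python
CV_THRESHOLD = 0.30  # guideline cutoff discussed above

# Tag rows rather than silently dropping them...
summary["reliability"] = summary["cv"].apply(
    lambda cv: "unreliable" if cv > CV_THRESHOLD else "reliable"
)

# ...and/or suppress the estimate itself when the CV is above the threshold
summary.loc[summary["cv"] > CV_THRESHOLD, "estimate"] = None
```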
Communicating statistical reliability
There are often challenges in communicating statistical reliability or sharing unreliable data. While many teams are hesitant to publish or use unreliable population data (as defined by the CV calculations above), audience groups or outside partners may feel that withholding unreliable data risks losing community trust. This is often a very difficult trade-off – and understanding your audience’s or user’s intended use case for the data can be important to finding a good compromise.
There are many different approaches, but one project team offered this approach:
For one analysis, we were looking at unemployment by race/ethnicity over time. The smallest race/ethnicity category counts fluctuated so wildly that the team did not trust their accuracy. For example, unemployment rate went from 30% one year to 100% the next, with the total population changing from 1,000 to 400 in a single year. To address this, we included:
A call-out box in the web page text above the dashboard explaining that the three race/ethnicity categories with sample sizes that were too small are not included
Data notes that offer specific examples of how large the confidence interval is (example: for American Indian and Alaska Native, the unemployment rate could be anywhere between 10% and 70%)
All rows of data in the open data portal dataset underlying it, with an additional column that tags the unreliable estimates as “unreliable”
No matter what you do, you should always clearly communicate about the reliability of the data you share (being sure to flag any unreliable estimates).