Estimates of New York’s population have recently been on a roller coaster. The Census Bureau’s Annual Population Estimate of New York’s 2020 population was 19,382,373, an increase of only 4,271 since 2010. In 2021, the Census Bureau released the results of its 2020 Decennial Census, which showed that New York’s population was 20,201,249, an increase of 823,147. After the Census data were published, I wrote a post, “A Caution about the Use of Census Population Estimates,” that pointed out the difference between the population estimates and the Census data and questioned the accuracy of the estimates.
On May 19th of this year, the Census Bureau released a study of the accuracy of the 2020 Census, “Census Coverage Estimates for People in the United States by State and Census Operations.” The analysis contained revised population data from a survey that corrected for erroneous enumerations, “whole person imputations and omissions.” The study produced a 2020 population estimate of 19,506,326 for New York State. That figure is much closer to the Annual Population Estimate than to the Census count. Based on the new study, the State’s population grew by only 128,224, less than one percent. Why do the numbers differ?
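Simple arithmetic reconciles the competing figures. The sketch below backs out the 2010 base implied by the numbers above (19,382,373 minus 4,271, or 19,378,102). It then computes the 2010-2020 change under each source. The dictionary and function names are illustrative only.

```python
# 2020 New York population under three sources, as quoted above.
estimates_2020 = {
    "Annual Population Estimate": 19_382_373,
    "2020 Decennial Census": 20_201_249,
    "Post Enumeration Survey": 19_506_326,
}

# 2010 base implied by the text: 19,382,373 - 4,271.
BASE_2010 = 19_382_373 - 4_271  # 19,378,102


def change_since_2010(pop_2020: int, base: int = BASE_2010) -> tuple[int, float]:
    """Return the absolute change and the percentage change from the 2010 base."""
    diff = pop_2020 - base
    return diff, 100 * diff / base


for source, pop in estimates_2020.items():
    diff, pct = change_since_2010(pop)
    print(f"{source:28s} {diff:>+10,d}  ({pct:+.2f}%)")
```

The Census count implies growth of about 4.2% over the decade. The Survey implies about 0.7%, and the Annual Estimate implies almost none.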
The 2020 Census
The COVID pandemic made it harder for interviewers to complete their work during the 2020 Census. Because the pandemic struck in March and April, data collection was paused while the nation was locked down. The New York Times reported that the pandemic also made people less willing to speak with interviewers in person. Public skepticism about the process may also have grown because the Trump administration tried to change the Census to exclude undocumented aliens from the data and sought to shorten the data collection period. As a result, data quality suffered somewhat compared with the 2010 Census.
The 2020 Census Post Enumeration Survey
After each Decennial Census, the Bureau conducts surveys to measure its accuracy. Some of the Census respondents who were interviewed misreported information, and others made recall errors. The Census cannot interview someone from every household in the country, so part of the count relied on other sources, such as neighbors or government records. The data also contained some duplicate counts and some persons recorded at the wrong location. Some housing units were wrongly coded as occupied rather than vacant, and others were missed entirely.
To assess the accuracy of the Census, the Bureau drew two independent samples of about 150,000 people to identify data errors. The post enumeration surveys are much smaller in scope than the Census, which allowed a more intensive evaluation of the data. The Bureau used two independent samples to ensure that the samples were representative. However, the 2000 Post-Enumeration Survey data had to be reevaluated after errors were discovered that had caused the population to be overestimated. Since then, the Bureau has implemented processes to prevent a repeat of that mistake.
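Two independent counts of the same area can be combined through dual-system estimation, a form of capture-recapture. The idea is that the share of one count's people who also appear in the other count indicates how many people both counts missed. The sketch below uses the textbook Lincoln-Petersen form with made-up numbers for a hypothetical set of sample blocks. The Bureau's production estimator adds weighting, post-stratification, and corrections for erroneous enumerations, so this sketch illustrates only the basic idea.

```python
def dual_system_estimate(census_count: int, survey_count: int, matched: int) -> float:
    """Lincoln-Petersen dual-system estimate of the true population in an area.

    census_count: people correctly enumerated by the census in the area
                  (erroneous enumerations are assumed to be removed already)
    survey_count: people found by the independent survey in the same area
    matched:      people found by both counts
    """
    if matched <= 0:
        raise ValueError("need at least one person found by both counts")
    return census_count * survey_count / matched


# Hypothetical sample blocks (illustrative numbers, not Bureau data).
estimate = dual_system_estimate(census_count=950, survey_count=900, matched=870)
print(round(estimate))  # 983
```

In this toy case, each count finds only part of the roughly 983 people the two counts together imply.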
The 2020 Post Enumeration Survey showed that New York’s population was significantly smaller than the uncorrected Census count. As a result, the state’s population growth from 2010 to 2020 was much smaller using the corrected data. The smaller growth dropped New York’s rank in numerical population gain from 7th to 34th. In percentage terms, New York’s 0.7% growth ranked 47th. Only Maine (0.3%), Hawaii (a loss of 0.3%), Rhode Island (a loss of 1.0%), and West Virginia (a loss of 4.7%) grew more slowly.
In total, the population estimates produced by the Post Enumeration Survey were similar to the Census values. The Survey estimates were lower than the Census counts in the Northeast, the Midwest, and the coastal Western states. They were higher than the Census counts in the South, the Plains states, and the interior West. The largest Census overestimates were in Hawaii (6.8%), Delaware (6%), Rhode Island (5.1%), the District of Columbia (4.6%), Nevada (4.4%), Minnesota (3.8%), and New York (3.4%). The states with the largest Census underestimates were Arkansas (-5%), Tennessee (-4.8%), Montana (-4.4%), Mississippi (-4.1%), and Louisiana (-3.7%).
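The percentages above compare each Census count with the Survey’s estimate. The sketch below assumes the difference is expressed as a share of the Census count. That assumption reproduces New York’s 3.4% from the figures quoted earlier. Positive values are overcounts and negative values are undercounts.

```python
def net_coverage_error(census_count: int, survey_estimate: int) -> float:
    """Percent by which the Census count exceeds the Survey estimate.

    Assumption: the difference is divided by the Census count. A positive
    result means the Census overcounted; a negative one means it undercounted.
    """
    return 100 * (census_count - survey_estimate) / census_count


ny = net_coverage_error(census_count=20_201_249, survey_estimate=19_506_326)
print(f"New York: {ny:+.1f}%")  # New York: +3.4%
```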
Sampling Error in the Post Enumeration Survey
Although the Post Enumeration Survey improved the quality of the population data, it is based on a sample, so its population estimates may differ from the actual values. Because of natural sampling variability, an estimate can differ from the actual population by an amount described by a confidence interval. A confidence interval is the range around a sample estimate within which the true value is likely to fall. Researchers typically use a 95% confidence level. The Census Bureau uses a slightly more lenient standard of 90%. The Bureau presents a single value for each state’s population, but the actual value could fall anywhere within the confidence interval.
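Under the usual normal approximation, a confidence interval is the estimate plus or minus a critical value times the standard error. The critical value is about 1.645 for 90% confidence and 1.96 for 95%. The sketch below uses the standard library’s `statistics.NormalDist` to obtain those critical values. The estimate and standard error are hypothetical and were chosen only to show that the 90% interval is the narrower of the two.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, std_error: float, level: float) -> tuple[float, float]:
    """Two-sided normal-approximation interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.645 for 90%, 1.960 for 95%
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical state: an estimate of 1,000,000 with a standard error of 20,000.
for level in (0.90, 0.95):
    low, high = confidence_interval(1_000_000, 20_000, level)
    print(f"{level:.0%}: {low:,.0f} to {high:,.0f}")
```

In this example the 90% interval extends roughly 32,900 on either side of the estimate. The 95% interval extends roughly 39,200.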
The classic illustration of this concept is a coin flip. Over many flips, a fair coin comes up heads about as often as tails. But, as gamblers know, if we flip a coin four times, we don’t always get two heads and two tails. Sometimes three heads come up; other times, we might get lucky and get four heads, or we could get none. The more times we flip the coin, however, the closer to a 50-50 split we are likely to get. The sample size largely determines how wide the range around the sample value must be to contain the true value.
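The coin example can be checked exactly with the binomial distribution. The sketch below computes the probability that a fair coin lands heads between 40% and 60% of the time, inclusive, for different numbers of flips. The function name and the 40-60% band are illustrative choices.

```python
from math import comb


def prob_near_even(n: int) -> float:
    """Exact probability that n flips of a fair coin give heads between
    40% and 60% of the time (inclusive)."""
    # 2n <= 5k <= 3n is the integer form of 0.4 <= k/n <= 0.6.
    favourable = sum(comb(n, k) for k in range(n + 1) if 2 * n <= 5 * k <= 3 * n)
    return favourable / 2**n


for n in (4, 20, 100, 1000):
    print(f"{n:>5} flips: {prob_near_even(n):.3f}")
```

With four flips, only an even 2-2 split falls inside the band, and that outcome occurs just 37.5% of the time. With 20 flips, the chance of landing in the band rises to about 74%, and it keeps climbing as the number of flips grows.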
The Post Enumeration Survey is large, with about 150,000 respondents nationally, but the number of participants in each state is much smaller. Consequently, the confidence intervals for the 2020 state populations are wider, relative to population, than the interval for the nation. For larger states, the confidence interval generally spanned two to five percent of the published estimate. For smaller states, however, it was much wider. In West Virginia, the confidence interval spanned 16.8% of the published population, or about one in six of the state’s residents. Six states had confidence intervals spanning 10% or more of their reported populations. In thirty states, the range of possible population values exceeded 5% of the published population.
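Under simple random sampling, the width of a confidence interval shrinks with the square root of the sample size. A state with one-hundredth of the national sample would therefore have an interval roughly ten times as wide, in relative terms. The sketch below illustrates that scaling. The state sample sizes are hypothetical, and the real Survey uses a clustered, weighted design, so actual state intervals will not follow this simple formula exactly.

```python
from math import sqrt

NATIONAL_SAMPLE = 150_000  # approximate national Survey sample cited above


def relative_width(state_sample: int, national_sample: int = NATIONAL_SAMPLE) -> float:
    """How many times wider a state's relative interval is than the national one,
    assuming simple random sampling (width proportional to 1/sqrt(n))."""
    return sqrt(national_sample / state_sample)


# Hypothetical state sample sizes, for illustration only.
for state_sample in (15_000, 1_500, 600):
    print(f"{state_sample:>6} respondents: {relative_width(state_sample):.1f}x the national width")
```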
In New York State, the Survey found that the 2020 population was between 19,178,143 and 19,875,863 at the 95% confidence level, a spread of about 698,000. The state’s actual 2020 population could lie anywhere within that interval. New York might therefore have lost as many as 200,000 residents between 2010 and 2020 or gained as many as 498,000, a range of -1.0% to +2.6%. Neighboring states also had broad confidence intervals, in most cases spanning hundreds of thousands of residents. In some other states, the range was much larger relative to population. Montana may have grown by anywhere from 6.2% to 22.3%. West Virginia’s population might have declined by as much as 12.6% or grown by as much as 3.3%.
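The ranges above come from simple subtraction. The sketch below uses two inputs: the 2010 base implied by the figures at the start of this post (19,378,102) and the 95% bounds quoted above. It computes the width of the interval and the implied 2010-2020 change at each end.

```python
BASE_2010 = 19_378_102                    # 2010 base implied by the earlier figures
PES_2020 = 19_506_326                     # Survey point estimate for 2020
LOW_95, HIGH_95 = 19_178_143, 19_875_863  # 95% bounds quoted above

width = HIGH_95 - LOW_95
print(f"interval width: {width:,} ({100 * width / PES_2020:.1f}% of the estimate)")

# Change from 2010 if the true 2020 population sat at each end of the interval.
for label, bound in (("low end", LOW_95), ("high end", HIGH_95)):
    change = bound - BASE_2010
    print(f"{label}: {change:+,} residents ({100 * change / BASE_2010:+.1f}%)")
```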
Researchers and politicians often focus on ranking state population change, but the limitations of the available census data make the exercise a fruitless one. The confidence intervals allow too wide a range of possible state populations to support precise rankings. With New York’s possible population change of -1.0% to +2.6%, the State’s rank could have been anywhere from 38th to 49th, assuming the other state estimates were accurate. West Virginia’s rank could have been anywhere from 36th to 50th, and Montana’s from first to 24th.
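The ranking exercise works as follows. Every other state’s growth rate is held fixed, and one state’s growth is moved from the low end of its interval to the high end. At each extreme, we count how many states grew faster. The sketch below implements that logic. The function name and the ten growth rates are hypothetical and are not the Survey’s state figures.

```python
def rank_range(own_low: float, own_high: float, others: list[float]) -> tuple[int, int]:
    """Best and worst rank (1 = fastest growth) a state could hold if its own
    growth lies anywhere in [own_low, own_high] and every other state's growth
    is taken as exact. Ties are resolved in the state's favor."""
    best = 1 + sum(g > own_high for g in others)
    worst = 1 + sum(g > own_low for g in others)
    return best, worst


# Hypothetical growth rates (%) for ten other states, for illustration only.
other_states = [15.0, 9.8, 7.4, 5.1, 3.3, 2.0, 1.2, 0.4, -0.5, -4.0]
print(rank_range(-1.0, 2.6, other_states))  # (6, 10)
```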
The confidence intervals for state populations in the Post Enumeration Survey are relatively broad. Even so, the Survey shows that there is less than one chance in ten that the Census population values for New York and several other states were accurate. Using the 90% confidence standard, eight states likely had fewer residents than recorded in the 2020 Census: Delaware, Hawaii, Massachusetts, Minnesota, New York, Ohio, Rhode Island, and Utah. In six states, an undercount was likely: Arkansas, Florida, Illinois, Mississippi, Tennessee, and Texas.
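One rough way to check the New York result is to convert the 95% bounds quoted earlier into a 90% interval and see whether the Census count falls inside it. The sketch assumes a symmetric normal interval centered on the midpoint of those bounds. That is an approximation, because the published point estimate does not sit exactly at the midpoint, and the Bureau’s own calculation may differ.

```python
from statistics import NormalDist

CENSUS_2020 = 20_201_249
LOW_95, HIGH_95 = 19_178_143, 19_875_863  # New York's 95% bounds quoted earlier

z95 = NormalDist().inv_cdf(0.975)  # about 1.960
z90 = NormalDist().inv_cdf(0.95)   # about 1.645

# Assumption: a symmetric normal interval. The centre is the midpoint of the
# bounds, and the standard error is the half-width divided by z95.
centre = (LOW_95 + HIGH_95) / 2
std_error = (HIGH_95 - LOW_95) / 2 / z95

low_90, high_90 = centre - z90 * std_error, centre + z90 * std_error
outside = not (low_90 <= CENSUS_2020 <= high_90)
print(f"90% interval: {low_90:,.0f} to {high_90:,.0f}; Census count outside: {outside}")
```

Under these assumptions, the upper end of the 90% interval is about 19.8 million, well below the Census count of about 20.2 million.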
The 2020 Census Post Enumeration Survey showed that there is less than a 10% chance that New York’s population grew as much from 2010 as the Decennial Census reported. But, because the Survey included a relatively small sample of New Yorkers, it is impossible to know the state’s precise population change. At the 95% confidence level, the range around New York’s 2020 population estimate was 3.6% of the published value, an interval roughly 698,000 residents wide. At the low and high ends of that interval, the 2010-2020 population change ranged from a loss of about 200,000 residents to a gain of about 498,000. Smaller states had even wider ranges, relative to their size, for both their 2020 populations and their 2010-2020 population change.
In my earlier post, I argued that the Census Bureau should publish standard errors and confidence intervals for its Annual Population Estimates. For the Post Enumeration Survey, the Bureau made that information available. It shows that possible sampling errors in the data are relatively large, particularly for less populous states. The Post Enumeration Survey has wide confidence intervals, and the Bureau publishes no information about sampling errors in its Annual Population Estimates. As a result, we cannot know precisely the actual state populations, how much they changed from year to year, or how the states rank.