A statistical method used by the U.S. Census Bureau for the first time in 2020 to protect confidentiality has made people and occupied homes vanish, at least on paper, when they actually exist in the real world.
The three-bedroom colonial-style house where Jessica Stephenson has lived in Milwaukee for the last six years bustles with activity on any given weekday, filled with the chattering of children in the day care center she runs out of her home.
The U.S. Census Bureau says no one lives there.
“They should come and see it for themselves,” Stephenson said.
From her majority-Black neighborhood in Wisconsin to a community of Hasidic Jews in New York’s Catskill Mountains to a park outside Tampa, Florida, a method used by the Census Bureau for the first time to protect confidentiality in the 2020 census has made people and occupied homes vanish, at least on paper, when they actually exist in the real world.
It’s not a magic trick but rather a new statistical method the bureau is using called differential privacy, which involves the intentional addition of errors to data to obscure the identity of any given participant.
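To make the idea of intentionally added error concrete, here is a minimal sketch in Python. It is not the bureau's actual system, which is far more elaborate. It assumes the textbook Laplace mechanism, and the function names, the `epsilon` privacy parameter and the count of 20 homes are hypothetical, chosen only for illustration. Each count is perturbed on its own, then rounded and kept from going below zero.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with the same
    # mean follows a Laplace distribution centered on zero.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Return the count plus Laplace noise, rounded and clamped at zero.

    A smaller epsilon means more noise and stronger privacy.
    """
    scale = 1.0 / epsilon  # adding or removing one person changes a count by 1
    return max(0, round(true_count + laplace_noise(scale, rng)))


# Hypothetical block: the people count and the home count are perturbed
# separately, so the published pair need not be consistent.
rng = random.Random(2020)
true_people, true_homes = 54, 20
for epsilon in (0.1, 1.0, 10.0):
    people = noisy_count(true_people, epsilon, rng)
    homes = noisy_count(true_homes, epsilon, rng)
    print(f"epsilon={epsilon}: people={people}, occupied homes={homes}")
```

Because the two counts receive independent noise in this sketch, nothing forces the published figures to agree with each other, which is one simplified way to picture how a block can show residents but no occupied homes.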
Bureau officials say it’s necessary to protect privacy in a time of increasingly sophisticated data mining, as technological innovations magnify the threat of people being “re-identified” through the use of powerful computers to match census information with other public databases. By law, census answers are supposed to be confidential.
But some city officials and demographers think it veers too far from reality — and could cause errors in the data used for drawing political districts and distributing federal funds.
At least one analysis suggests that differential privacy could penalize minority communities by undercounting areas that are racially and ethnically mixed. Harvard University researchers found that the method made it more difficult to create political districts of equal population and could result in fewer majority-minority districts.
The Census Bureau, for its part, argues that the data is every bit as good as in past censuses and that the low-level inaccuracies don’t present a large-scale problem.
What’s certain is that the method can produce weird, contradictory and false results at the smallest geographic levels, such as neighborhood blocks.
For example, the official 2020 census results say 54 people live in Stephenson’s census block in midtown Milwaukee, but also that there are no occupied homes. In reality, almost two dozen houses occupy the car-lined streets, some dating back more than a century. Forty-eight of the residents living in the block are Black, according to the census, though it’s difficult to know for sure, given the whimsy of differential privacy.
In another case, the census lists no people living in the Flatwoods Conservation Park outside Tampa, even though it says there is an occupied home there. According to Hillsborough County spokesman Todd Pratt, two county employees live there while maintaining security for the park.
And in an enclave of Hasidic Jews located in Kiamesha Lake, New York, 81 people are recorded as residents, but the census officially says there are no occupied homes. Sullivan County property records show almost a dozen homes whose residents have ties to the Vizhnitzer Hasidic community.
The unreliable data has created headaches for city managers and planners of small communities who worry that it may not be valid for decision-making. Eric Guthrie, a senior demographer at the Minnesota State Demographic Center, said he has been contacted by a half-dozen city managers from around the state who were concerned about potential impacts to state and federal funding.
“I explain to them there’s not a method for correcting it, that it’s not an error in the traditional sense,” Guthrie said. “The bug is there by design.”
The scale of the changes becomes clearer when viewed through a broader lens. For Florida, the nation’s third most populous state with more than 21 million residents, the 2020 census listed 15,000 neighborhood blocks as having a total of 200,000 residents but no occupied homes. On the flip side, 1,200 of the state’s 484,000 blocks were listed as having occupied homes but no population, according to Rich Doty, geographic information system coordinator and research demographer at the University of Florida’s Bureau of Economic and Business Research.
“We expected these anomalies, as we were warned about this by the Census Bureau and other states,” Doty said. “We just didn’t expect this many.”
Ahead of the August release of census data used for drawing congressional and legislative districts, acting Census Bureau director Ron Jarmin warned that the application of differential privacy could produce some “fuzzy” figures at the neighborhood block level and urged data users to combine blocks to get accurate results. But the bureau also says that, based on measurements of data quality, the 2020 data isn’t any worse than that of previous censuses despite the implementation of differential privacy.
That claim is hard to evaluate since the raw data without the application of differential privacy is not being made public, said Stefan Rayer, a University of Florida demographer.
“We have to take their word for it,” Rayer said.
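Jarmin's advice to combine blocks can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not a reproduction of the bureau's method: every block gets independent Laplace noise, no rounding or other cleanup is applied, and the block population of 50, the `epsilon` value and the function names are all hypothetical. Under those assumptions the noise in a total grows more slowly than the total itself, so the relative error shrinks as more blocks are added together.

```python
import random


def laplace_noise(scale, rng):
    # Difference of two exponential draws gives zero-centered Laplace noise.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def mean_relative_error(n_blocks, epsilon, rng, trials=500, block_pop=50):
    """Average relative error of a noisy total over n_blocks equal blocks."""
    true_total = n_blocks * block_pop
    scale = 1.0 / epsilon
    errors = []
    for _ in range(trials):
        # Each block's count is perturbed independently, then summed.
        noisy_total = sum(
            block_pop + laplace_noise(scale, rng) for _ in range(n_blocks)
        )
        errors.append(abs(noisy_total - true_total) / true_total)
    return sum(errors) / len(errors)


rng = random.Random(7)
for n_blocks in (1, 10, 100):
    err = mean_relative_error(n_blocks, epsilon=0.5, rng=rng)
    print(f"{n_blocks:>3} blocks combined: mean relative error {err:.4f}")
```

In this toy setting a single block can be off by a noticeable share of its population, while a group of 100 blocks is off by a much smaller share, which matches the article's point that the figures are fuzziest at the smallest geographic levels.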
Using test data, the Harvard researchers found that differential privacy was more likely to undercount mixed-race and mixed-partisan precincts, “yielding unpredictable racial and partisan biases,” because it prioritizes the accuracy of the population count for the largest racial group in a given area.
“Our findings underscore the difficulty of balancing accuracy and respondent privacy in the Census,” they said in a report.
The Census Bureau disagrees, and so far the courts have found no reason to stop the bureau from using the method.
Differential privacy was unsuccessfully challenged by the state of Alabama earlier this year. In a declaration for that lawsuit, the Census Bureau’s chief scientist, John Abowd, called the data “extremely accurate” and said the use of differential privacy showed no bias regarding racial or ethnic minorities.
“Redistricters can remain confident in the accuracy of the population counts and demographic characteristics of the voting districts they draw, despite the noise in the individual building blocks,” Abowd said.
Not everyone believes the technique is the right way to protect confidentiality.
Two University of Minnesota researchers wrote in a recent paper that a Census Bureau experiment failed to show genuine threats to confidentiality and that any risks of re-identification were similar to random guessing of households’ characteristics.
One of them, demographer Steven Ruggles, said during a presentation this month that the Census Bureau’s fear of re-identification and the resulting justification for using differential privacy could undermine confidence in the census data.
“It should not justify the degradation of the statistical infrastructure of our country,” Ruggles said. “The whole thing is likely to backfire.”
Follow Mike Schneider on Twitter at http://twitter.com/MikeSchneiderAP