Harvard Researchers Identify Accuracy Issues in New Census Bureau Privacy System
Harvard government and statistics researchers found in a study published last month that a new method used by the United States Census Bureau to strengthen confidentiality could skew the data used for redistricting.
The Census Bureau introduced a new Disclosure Avoidance System (DAS) for the 2020 census, designed to increase privacy protection by adding “noise” to census microdata.
The Harvard researchers ran computer simulations using the proposed DAS parameters, which were released in late April, to generate many potential redistricting maps from available 2010 census data. Before the 2020 census, the Census Bureau swapped data between some households to protect privacy.
Professor of government and statistics Kosuke Imai, the study’s corresponding author, said the DAS uses a “very complicated post-processing method” to make the data easier to use for redistricting.
“But the problem is, the added noise is no longer symmetrical, so it adds some bias, but it’s hard to know exactly how those biases are created,” Imai said.
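The effect Imai describes can be illustrated with a toy simulation. The sketch below is not the Census Bureau’s algorithm: it assumes zero-mean Gaussian noise and a single, deliberately simple post-processing rule (published counts may not be negative), and every function name in it is hypothetical. Under those assumptions, the clamped counts for a very small block are biased upward on average, while a large block is essentially unaffected.

```python
import random
import statistics


def noisy_count(true_count, scale, rng):
    """Add symmetric, zero-mean noise to a count (illustrative assumption)."""
    return true_count + round(rng.gauss(0, scale))


def post_process(value):
    """Toy post-processing rule: a published count cannot be negative."""
    return max(0, value)


def mean_error(true_count, scale, trials=20000, seed=0, clamp=True):
    """Average (released - true) over many simulated releases."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    errors = []
    for _ in range(trials):
        released = noisy_count(true_count, scale, rng)
        if clamp:
            released = post_process(released)
        errors.append(released - true_count)
    return statistics.mean(errors)


# A tiny block (2 people) versus a large one (500 people), same noise scale.
raw_small = mean_error(2, 5.0, clamp=False)      # symmetric noise only
clamped_small = mean_error(2, 5.0, clamp=True)   # noise plus clamping at zero
clamped_large = mean_error(500, 5.0, clamp=True)

print(f"small block, raw noise:      {raw_small:+.2f}")
print(f"small block, post-processed: {clamped_small:+.2f}")
print(f"large block, post-processed: {clamped_large:+.2f}")
```

The real DAS post-processing enforces many more constraints at once, which is part of why, as Imai notes, the resulting biases are hard to trace.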
Examining the effects of the DAS on redistricting and elections, the study found that the system would make it “impossible” for map drawers to create districts of exactly equal population at the block level in accordance with the One Person, One Vote principle, which guarantees that each person’s vote is equally represented in all districts.
“Under the privacy protections of the old censuses, the block-level populations were accurate – meaning that whatever the Census Bureau counted and estimated to be the most likely number was exactly what was published,” said co-author and government Ph.D. student Christopher T. Kenny.
“Now we’re under this new system, which will have different populations at the block level than what the census actually believes is the total number of people in that block,” Kenny said. “That kind of gives a new twist to 54, 55 years of Supreme Court precedent here.”
According to the study, under the DAS parameters proposed in April, deviations from truly equal district populations would be underreported several times over.
The researchers also found that under the then-proposed DAS, racially and/or politically heterogeneous areas are underrepresented, which could lead to an overestimate of the degree of racial and political segregation across the country.
“The DAS tends to introduce more errors for minority groups than for white voters, and even more errors for voters who belong to a minority group in their census block, which is also more common for minority voters,” the study says.
According to the researchers, the underrepresentation of racially and politically heterogeneous areas would make it more difficult to identify partisan gerrymandering, allocate federal funds appropriately, and conduct meaningful academic research.
The researchers also showed that the DAS does not prevent algorithms from inferring voters’ race from their names and addresses. Rather, the researchers were “able to predict the individual race of registered voters at least as accurately using DAS-protected data as when using original census data.”
“So when you start to have a system that sometimes doubles or halves the population of small towns and villages – all in the name of preventing people from knowing a respondent’s race – I think it’s very valid to ask, ‘Okay, is that the right cost-benefit trade-off?’” said co-author and statistics Ph.D. candidate Cory W. McCartan.
Last Wednesday, the Census Bureau announced the finalized parameters that the DAS will use in August for the 2020 census redistricting data. In a press release, the Bureau thanked research groups for providing valuable feedback during the development of the DAS algorithm.
“The decisions strike the best balance between the need to publish detailed and usable 2020 census statistics and our legal responsibility to protect the privacy of individuals’ data,” said Ron Jarmin, acting director of the U.S. Census Bureau, in the press release. “They were made after many years of research and sincere feedback from data users and external experts – whom we thank for their invaluable contribution.”
The press release noted that the DAS development team had responded to concerns about bias against racially or ethnically homogeneous areas, and that the resulting changes were incorporated into the new settings.
Kenny said he was disappointed that the Census Bureau did not adopt the study’s recommendation to keep block populations invariant.
“In our report, we recommend that they should – if they’re going to use the algorithm they’re currently trying to use – that they should try to keep the block populations invariant,” Kenny said. “They will not improve the accuracy of the block populations, which for me is a very disappointing result.”
The Census Bureau wrote in the press release that it was unable to incorporate all of the feedback on the metrics it put forward in April.
“For example, some data users have recommended near-perfect precision in block-level data, which we cannot achieve because it would compromise the ability to implement a functional disclosure avoidance system,” the Census Bureau wrote. “We are both legally and ethically obligated to protect the confidentiality of data provided by and on behalf of our respondents.”
Imai praised the Census Bureau’s transparency, even though it did not release the data parameters until late in the process.
“I think what the census did right is to publish these demo datasets and have people like us analyze them,” Imai said. “In a way, I wish they had done it sooner – because we [were] given about a month, so it was pretty hectic for us to put things together. As for the process, I think it was a good, very transparent way to make an important public policy decision.”
– Editor-in-Chief Kate N. Guerin can be contacted at [email protected]