Researchers question Census Bureau’s new approach to privacy

PROVIDENCE, R.I. (AP) — In an age of rapidly advancing computer power, the U.S. Census Bureau recently
undertook an experiment to see if census answers could threaten the privacy of the people who fill out
the questionnaires.
The agency went back to the last national headcount, in 2010, and reconstructed individual profiles from
thousands of publicly available tables. It then matched those records against other public population
data. The result: Officials were able to infer the identities of 52 million Americans.
Confronted with that discovery, the bureau announced that it would add statistical "noise" to
the 2020 data, essentially tinkering with its own numbers to preserve privacy. But that idea creates its
own problems, and social scientists, redistricting experts and others worry that it will make next
year’s census less accurate. They say the bureau’s response is overkill.
"This is a brand new, radically more conservative definition of privacy," University of
Minnesota demographer Steven Ruggles said.
Federal law bars census officials from disclosing any individual’s responses. But data-crunching
computers can tease out likely identities from the broader census results when combined with other
personal information.
Some critics fear the agency’s changes could make it harder to draw new congressional and legislative
districts accurately. Others worry that research on immigration, demographics, the opioid epidemic and
declining life expectancy will be hindered, particularly when it involves less populated areas.
If the change had been in place four years ago, Ruggles said, he would not have been able to conduct a
2015 study on the impact of declines in young men’s incomes on marriage.
With more and more data sets available for quick download, it has become easier than
ever to match information with real names. That means aggregated answers to census questions involving
race, housing and relationships could be traced back to individuals.
The fear is that advertisers, market researchers or anybody with know-how and curiosity could use data to
reconstruct the identities of census respondents.
When the bureau went back to the 2010 census, it matched the census data with commercial databases. More
than 1 in 6 respondents were identified by name and neighborhood as well as by information about their
race, ethnicity, sex and age.
Since the last census, "the data world has changed dramatically," Ron Jarmin, deputy director
of the census agency, wrote earlier this year. "Much more personal information is available online
and from commercial providers, and the technology to manipulate that data is more powerful than
ever."
The Trump administration’s unsuccessful effort to add a citizenship question to the 2020 questionnaire
heightened fears about how census information would be used. But privacy concerns are nothing new for
the bureau.
Historians have found evidence that census data helped identify Japanese Americans who were rounded up
and confined to camps during World War II. That revelation led to an apology from then-Census Bureau
Director Kenneth Prewitt in 2000.
Jewish groups and some liberal organizations had concerns about privacy when the bureau was lobbied to
ask about religion for the 1960 census. Some noted that Nazis had used government and church records to
identify and round up Jews. The idea never went anywhere.
During the legal battle over the citizenship question, advocates worried that the information could be
used to target residents in the country illegally. Some say lingering concerns could have a chilling
effect on the 2020 census.
To address those worries, the bureau has adopted a technique called "differential privacy,"
which alters the numbers to protect the identities of individual respondents without changing
the core findings.
It’s analogous to pixelating the data, a technique commonly used to blur certain images on television,
said Michael Hawes, senior adviser for data access and privacy at the Census Bureau.
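For readers curious how such noise works, here is a minimal sketch of the textbook Laplace mechanism, one standard way to achieve differential privacy. It is an illustration only, not the bureau's own 2020 algorithm. The function names, the block population of 57 and the privacy-loss settings (epsilon) are hypothetical values chosen for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws.

    If X and Y are independent exponentials with mean `scale`,
    X - Y follows a Laplace distribution centered at 0 with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Return a headcount with Laplace noise for epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1
    (sensitivity = 1), so the noise scale is 1 / epsilon: a smaller
    epsilon means stronger privacy and blurrier numbers.
    """
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the run is repeatable
    true_count = 57  # hypothetical population of one census block
    for epsilon in (0.1, 1.0, 10.0):  # hypothetical privacy settings
        draws = [noisy_count(true_count, epsilon, rng) for _ in range(10_000)]
        mae = sum(abs(d - true_count) for d in draws) / len(draws)
        print(f"epsilon={epsilon}: average error about {mae:.2f} people")
```

In this sketch the average error per count works out to roughly 1/epsilon people, which illustrates the trade-off: the stronger the privacy guarantee, the larger the typical error in a small area's total.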
Redistricting experts say the mathematical blurring could cause problems because they rely on precise
numbers to draw congressional, state and local legislative districts. They also worry that it could
dilute minority voting power and violate the Voting Rights Act.
"The numbers might be off by five, 10, 20 people, and if you’re dealing with exact percentages, that
could mean something. That could mean a lot," said Jeffrey M. Wice, a national redistricting
attorney. "That’s why we care about it so much."
In the past, the bureau has used "swapping" and other methods to protect confidentiality.
Swapping involves taking similar households in different geographic areas and exchanging demographic
characteristics.
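Swapping can be illustrated with a toy example. The sketch below assumes a hypothetical record layout (a block identifier, a household size and a race field) and treats households of the same size as "similar." The bureau's actual matching rules and swap rates are not described here, so this shows only the general idea.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Household:
    block: str  # hypothetical geographic identifier (e.g., a census block)
    size: int   # number of people; households of equal size count as "similar"
    race: str   # demographic characteristic that gets exchanged


def swap_households(households: list[Household], swap_rate: float,
                    rng: random.Random) -> list[Household]:
    """Return a copy in which some pairs of similar households located in
    different blocks have exchanged their demographic characteristics."""
    result = list(households)
    by_size: dict[int, list[int]] = {}
    for i, h in enumerate(result):
        by_size.setdefault(h.size, []).append(i)
    for indices in by_size.values():
        rng.shuffle(indices)
        # Pair the shuffled households two at a time; an odd one out is left alone.
        for a, b in zip(indices[::2], indices[1::2]):
            ha, hb = result[a], result[b]
            if ha.block != hb.block and rng.random() < swap_rate:
                result[a] = replace(ha, race=hb.race)
                result[b] = replace(hb, race=ha.race)
    return result


def block_totals(households: list[Household]) -> dict[str, int]:
    """Count the people living in each block."""
    totals: dict[str, int] = {}
    for h in households:
        totals[h.block] = totals.get(h.block, 0) + h.size
    return totals


if __name__ == "__main__":
    households = [
        Household("block-1", 3, "White"), Household("block-1", 2, "Black"),
        Household("block-2", 3, "Asian"), Household("block-2", 2, "White"),
    ]
    swapped = swap_households(households, swap_rate=1.0, rng=random.Random(1))
    print(block_totals(households) == block_totals(swapped))  # prints True
    for before, after in zip(households, swapped):
        print(before.block, before.race, "->", after.race)
```

Because this sketch only swaps households of the same size, each block's headcount stays the same; what gets blurred is the demographic detail attached to the people in it.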
Census data does not need to be exact for most purposes, "as long as we know it’s really pretty
close," said Justin Levitt, an election law professor at Loyola Law School in Los Angeles. But
"there’s certainly a point where blurry becomes too blurry."
The bureau has not decided precisely how much blurring will take place, but researchers have already
delivered academic papers and organized a petition signed by more than 4,000 scholars, planners and
journalists. The petition asked the bureau to include the research community in its discussions.
Michael McDonald, a University of Florida redistricting expert, said people must be assured their data
will be kept confidential or they may not respond at all. If respondents do not answer questions for the
once-a-decade census in a timely manner, census workers must try to interview them in person.
"We need high response rates to the census," McDonald said. "If we don’t get them,
whatever noise will be moot because we won’t have good data to start with."
___
Follow Jennifer McDermott on Twitter at https://twitter.com/JenMcDermottAP . Follow Mike Schneider on
Twitter at https://twitter.com/MikeSchneiderAP .