How Differential Privacy Works

For years Apple has had a long commitment towards privacy not shared by many of its competitors. While Google and Microsoft are happy to suck up personalized data which hackers and the government could exploit, Apple has refused to do so. As an example, Apple announced at its Worldwide Developers Conference that all iOS apps must encrypt web communications by the end of the year.

But Apple needs data in order to personalize its services and know what adjustments their customers want, so on Tuesday Apple senior vice president of software engineering Craig Federighi discussed a concept called differential privacy which will be in iOS 10 software.

According to Apple differential privacy will “help discover the usage patterns of a large number of users without compromising individual privacy.” The idea is that while Apple can see user data in the aggregate to improve its services, it will be impossible for anyone to find data about any one user. This includes Apple itself, as well as hackers and governments.

The problems with privacy

iPhoneSecurity-cropped

How is it possible to get data in the aggregate but not at the individual level? In order to understand that we need to start with the challenges behind protecting user privacy.

Most companies do make some effort to protect your privacy, and they will often anonymize your data and refuse to publish personal information. But people can use what data is revealed to figure out your personal data.

It is comparable to finding out an Internet forum user’s real-life identity. You won’t have their real name or phone number, but you can note that the forum user lives in New York and went on a date at this restaurant. By using facts like these you can narrow it down until you can discover their true identity. And as Wired pointed out, researchers were able to do something like this in 2007 when Netflix published a list of “anonymous” customers.

This shows that even if a company tries to hide personal information, hackers can use the information they do have to glean personal data. And if the company tries to hide all the information they have, then they cannot use it on their end.

But what if all the information is obscured?

The idea behind Differential Privacy

change-bitlocker-encryption-method-check-current-encryption-method1

That is what differential privacy sets out to do. It works by algorithmically obscuring the data with noise so that hackers can never truly figure out what any one person said.

A lot of the ideas behind differential privacy are theoretical, worked out by tech scientists and cryptologists. But Cynthia Dwork, the co-inventor of differential privacy according to Engadget, gives an example of how it could work, using a surveyor who asks someone whether they have cheated on an exam:

Before responding, the person is asked to flip a coin. If it’s heads, the response should be honest but the outcome of the coin shouldn’t be shared. If the coin comes up tails, the person needs to flip a second coin; if that one is heads, the response should be “yes.” If the second is tails, it’s “no.”

Since a coin over the long run should come up head or tails about fifty percent of the time, the surveyor can roughly guess how many people actually did cheat on their exam over the aggregate. But if a malicious agency finds out that one particular individual answered “yes,” he has no idea if that is because the individual cheated on the test or because he said so after getting a tails and then heads on his coin flip.

Actual differential privacy algorithms are much more complicated but would be similar to the coin flip example. By creating mathematical “noise” to obscure individual data, it is impossible for anyone to know any one data point even if he knew the algorithm.

Potential concerns

ios-jailbreak

Differential privacy could mean that Apple and other companies could get data which helps them while protecting their customers’ privacy. But the fact is that much of the work surrounding differential privacy has been largely theoretical, and there have been no small-scale tests of how it might work.

Implementing it on a large scale, like Apple plans to do with iOS, without small-scale trials is risky.

However, differential privacy is not nearly as useful on a small scale. The mathematical noise will more heavily obscure the data in a small sample size, increasing the chances of entirely inaccurate data. Think about the above coin example. If the surveyor only surveyed 10 people, it is possible that eight people could have flipped “tails,” and his survey would be worthless. But if he surveyed 10,000, it is far less likely that 8,000 people flipped “tails”, and thus he can better trust his data.

Differential privacy is a hard-to-understand concept. But if Apple is successful, it could seriously change how companies acquire data. While there will be companies happy to take user data, the fact that there is a way to collect data without affecting individual privacy could have huge effects between company and customer.

Subscribe to our newsletter!

Our latest tutorials delivered straight to your inbox

Nathan Chandler Avatar

Read next

When Sony shipped the first Walkman in 1979, chairman Akio Morita insisted on a second headphone jack and a “hotline” talk button, convinced it would be rude for one person to listen to music alone — and within a few years buyers had ignored the sociable features so completely that Sony quietly dropped them
Russia still custom-builds the Soyuz return seats for ISS crew members using plaster casts taken weeks before launch, because astronauts grow as much as five centimetres taller during a long-duration stay and a seat moulded to their Earth-shaped spine would no longer fit the body that comes home
The “CrackBerry” nickname stuck for a reason — and the variable-reward psychology that hooked early-2000s executives on their BlackBerrys is the exact same machinery now running every push notification on every smartphone in your pocket
In 1843, Ada Lovelace described a brass-and-punched-card engine that could act on symbols as well as numbers, even composing music if harmony could be reduced to rules, inside seven translator’s notes three times longer than the paper itself
ARPANET sent its first message on 29 October 1969 from a lab at UCLA to a machine at Stanford, and the message was supposed to read ‘LOGIN’ — but the system crashed after the L and the O, meaning the first word ever transmitted over the network that became the internet was, by accident, ‘LO’.
In 1995, Microsoft shipped a cartoon-house interface called Bob, led by Melinda French, who married Bill Gates while it was in development — it demanded twice the memory of a typical home PC, sold roughly 30,000 copies, and was dead within a year, leaving behind the font Comic Sans and the animated assistant that became Clippy.
The Greenland shark grows about one centimetre a year, does not reach sexual maturity until around age 150, and a specimen carbon-dated by Danish researchers in 2016 was estimated to be at least 272 years old, meaning it was already swimming the North Atlantic when Mozart was composing symphonies.
When Apple shipped iOS 12 in June 2018, a small feature called Screen Time slipped onto every iPhone with a counter nobody had quite prepared for — a tally of pickups — and within a day Tim Cook was telling CNN the number of times he picked up his own phone was simply too many