Cristian raised the following difference attack:
I was under the impression this attack was already described in another discussion. But to be more explicit:
We have 2 tables: purchases, with columns product_id and client_id, and clients, with columns id and ssn.
Column ssn is labeled as the AID. Assume the following data is present:
Clients:
| id |
ssn |
| 1 |
a |
| 2 |
b |
| 3 |
c |
| 4 |
d |
| 5 |
e |
| 6 |
f |
Purchases:
| product_id |
client_id |
| 1 |
1 |
| 2 |
2 |
| 3 |
3 |
| 4 |
4 |
| 5 |
5 |
| 1 |
6 |
| 2 |
6 |
| 3 |
6 |
| 4 |
6 |
| 5 |
6 |
If we issue a query like:
select count(*) from purchases join clients on client_id = clients.id
we get the following input to the count aggregator:
| count |
ssn |
| 1 |
a |
| 1 |
b |
| 1 |
c |
| 1 |
d |
| 1 |
e |
| 5 |
f |
If Ne=1 and Nt = 2, this results in a flattening of 4. The output will be 10 - 4 + noise(1) = 6 + noise(1).
For the query:
select count(*) from purchases left join clients on client_id = clients.id and ssn <> 'f'
we get the following input to the count aggregator:
| count |
ssn |
| 1 |
a |
| 1 |
b |
| 1 |
c |
| 1 |
d |
| 1 |
e |
| 5 |
NULL |
The last row is put aside as unaccounted.
If Ne=1 and Nt = 2, this results in a flattening of 0 for accounted rows. The output will be 5 - 0 + 5 - 0 + noise(1) = 10 + noise(1).
The difference in the results is large enough to dominate the noise, allowing us to know the entity with ssn = 'f' is an outlier and to detect other attributes of it by conditional inclusion/exclusion in queries.