LLMs can unmask pseudonymous users at scale with surprising accuracy

Source: Biz & IT - Ars Technica

In a third experiment, the researchers took 5,000 users from the Netflix dataset and added another 5,000 “distraction” identities of people not in the data, yielding a candidate pool of 10,000 profiles. To the query set they then added 5,000 query distractors: users who appear only in the query set and have no true match in the candidate pool.
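This setup can be sketched in a few lines; the function name and toy identities below are illustrative, not from the researchers' code, and the real experiment used 5,000 users in each group:

```python
import random

def build_experiment(true_users, candidate_distractors, query_distractors):
    """Assemble the candidate pool and query set described in the article.

    true_users: users present in both the query set and the candidate pool.
    candidate_distractors: "distraction" identities added to the pool only.
    query_distractors: users who appear only in the query set, with no
                       true match in the candidate pool.
    """
    candidate_pool = list(true_users) + list(candidate_distractors)
    query_set = list(true_users) + list(query_distractors)
    random.shuffle(candidate_pool)
    random.shuffle(query_set)
    return candidate_pool, query_set

# Toy-scale illustration (the actual experiment used 5,000 per group).
pool, queries = build_experiment(
    true_users=[f"user{i}" for i in range(5)],
    candidate_distractors=[f"pool_only{i}" for i in range(5)],
    query_distractors=[f"query_only{i}" for i in range(5)],
)
```

The query distractors matter because they force the attacker to decide not just *which* candidate matches a query, but *whether* any candidate matches at all.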

When the researchers pitted a classical baseline mimicking the Netflix Prize attack against LLM-based deanonymization, the LLM approach far outperformed it. The researchers wrote: (a) The precision of classical attacks drops very fast, explaining its low recall. In contrast, the precision of LLM-based attacks decays more gracefully as the attacker makes more guesses.

(b) The classical attack almost fails completely even at moderately low precision. In contrast, even the simplest LLM attack (Search) achieves non-trivial recall at low precision, and extending it with Reason and Calibrate steps doubles Recall @99% Precision.
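The Recall @ 99% Precision metric mentioned above can be computed by sweeping a confidence threshold over the attacker's ranked guesses. The sketch below uses the standard definition of the metric; the function name and example data are illustrative, not taken from the paper:

```python
def recall_at_precision(guesses, n_positives, min_precision=0.99):
    """Highest recall achievable while keeping precision >= min_precision.

    guesses: (confidence_score, is_correct) pairs, one per attacker guess.
    n_positives: number of query users that truly have a match in the pool.
    """
    # Consider guesses from most to least confident, as an attacker would.
    ranked = sorted(guesses, key=lambda g: g[0], reverse=True)
    tp = fp = 0
    best_recall = 0.0
    for _score, correct in ranked:
        if correct:
            tp += 1
        else:
            fp += 1
        # Precision over the guesses made so far; record recall only when
        # the precision constraint is still satisfied.
        if tp / (tp + fp) >= min_precision:
            best_recall = max(best_recall, tp / n_positives)
    return best_recall
```

Under this metric, an attack whose confidence scores are well calibrated can keep making guesses without its precision collapsing, which is why the Reason and Calibrate steps lift recall at a fixed 99% precision bar.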

llms, deanonymization, pseudonymous users, netflix dataset, classical baseline, query distractors, candidate profiles, precision, recall, calibrate step