Economist Florian Ederer linked a new preprint describing the creation of an automated LLM-based pipeline for linking anonymous users across datasets based on unstructured text written by or about them. Prof Ederer is himself famous for unmasking the IP addresses of users of the infamous (but influential) Economics Job Market Rumors message board, exploiting a flaw in how usernames were assigned to anonymous posters. For platforms not encoding a user’s IP address in their “anonymous” username, the LLM-based approach involves:
- Extracting structured features from free text
- Encoding extracted features to embeddings to compare to candidate profiles
- Reasoning using all available context to identify the most likely match among top candidates
- Calibrate the quality of match by asking the LLM to report confidence
I guess it’s only a matter of time before someone uses this strategy to unmask Reviewer 2. (Currently this is only possible if Reviewer 2 insists you cite all of the work of the brilliant Dr. X.)