Will the real <user> please stand up? · ↗ simonwillison.net

Prompt injection as role confusion

Jun 25, 2026 · 1 min read

Ye, Cui & Hadfield-Menell wrote up their recent preprint on the mechanisms behind prompt injections as a highly readable blog post. Their argument is that prompt injection is best understood as a kind of role confusion. LLMs are supposed to distinguish between the different types of input using role tags (<user>, <system>, <think>, <tool>, and the like). But role tags can also be inserted maliciously, so LLMs seem to rely partly on the style of the text to infer what role it belongs to.

Unfortunately, this means that a webpage that sounds enough like an instruction can start to get treated like one. The conclusion of the paper is not very reassuring. If the model is always partly guessing who is speaking, then prompt injection may be less a problem we solve than a problem we learn to manage badly.

Hat tip to Simon Willison.