Claude sometimes sends messages to itself and then thinks those messages came from the user. This is the worst bug I’ve seen from an LLM provider, but people always misunderstand what’s happening and blame LLMs, hallucinations, or lack of permission boundaries. Those are related issues, but this ‘who said what’ bug is categorically distinct.
I wrote about this in detail in The worst bug I’ve seen so far in Claude Code, where I showed two examples of Claude giving itself instructions and then believing those instructions came from me.

Claude told itself my typos were intentional and deployed anyway, then insisted I was the one who said it.
Here’s a Reddit thread where Claude said “Tear down the H100 too”, and then claimed that the user had given that instruction.

From r/Anthropic — Claude gives itself a destructive instruction and blames the user.
Comments on my previous post were things like “It should help you use more discipline in your DevOps.” And on the Reddit thread, many in the class of “don’t give it nearly this much access to a production environment, especially if there’s data you want to keep.”
This isn’t the point. Yes, of course AI has risks and can behave unpredictably, but after using it for months you get a ‘feel’ for what kind of mistakes it makes, when to watch it more closely, when to give it more permissions or a longer leash.
This class of bug seems to be in the harness, not in the model itself. It’s somehow labelling internal reasoning messages as coming from the user, which is why the model is so confident that “No, you said that.”
Before, I thought it was a temporary thing — I saw it a few times in a single day, and then not again for months. But either they have a regression or it was a coincidence and it just pops up every so often, and people only notice when it gives itself permission to do something bad.
This article reached #1 on Hacker News, and it seems that this is definitely a widespread issue. Here’s another super clear example shared by nathell (full transcript).

From nathell — Claude asks itself “Shall I commit this progress?” and treats it as user approval.
Several people questioned whether this is actually a harness bug like I assumed, as people have reported similar issues using other interfaces and models, including chatgpt.com. One pattern does seem to be that it happens in the so-called “Dumb Zone” once a conversation starts approaching the limits of the context window.