
Two days ago, Anthropic released the Claude Cowork research preview (a general-purpose AI agent to help anyone with their day-to-day work). In this article, we demonstrate how attackers can exfiltrate user files from Cowork by exploiting an unremediated vulnerability in Claude’s coding environment, which now extends to Cowork. Johann Rehberger first identified the vulnerability in Claude.ai chat, before Cowork existed, and disclosed it; Anthropic acknowledged the vulnerability but did not remediate it.
“I do not think it is fair to tell regular non-programmer users to watch out for ‘suspicious actions that may indicate prompt injection’!”
As Anthropic has acknowledged this risk and put it on users to “avoid granting access to local files with sensitive information” (while simultaneously encouraging the use of Cowork to organize your Desktop), we have chosen to publicly disclose this demonstration of a threat users should be aware of. By raising awareness, we hope to enable users to better identify the types of ‘suspicious actions’ mentioned in Anthropic’s warning.
This attack leverages the allowlisting of the Anthropic API to achieve data egress from Claude's VM environment (which restricts most network access).
The victim connects Cowork to a local folder containing confidential real estate files

The victim uploads a file to Claude that contains a hidden prompt injection
For general use cases, this is quite common: a user finds a file online and uploads it to Claude Code. The attack does not depend on the injection source; other sources include, but are not limited to, web data from Claude for Chrome and connected MCP servers. In this case, the file is a Claude ‘Skill’ (although it could just as well be a regular document), because Skills are a generalizable file convention that users are likely to encounter, especially when using Claude.

The victim asks Cowork to analyze their files using the Real Estate ‘skill’ they uploaded

The injection manipulates Cowork to upload files to the attacker’s Anthropic account

If we expand the 'Running command' block, we can see the malicious request in detail:

Code executed by Claude runs in a VM that restricts outbound network requests to almost all domains, but the Anthropic API is treated as trusted, which allows this attack to complete successfully.
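To illustrate why a domain allowlist alone does not stop this kind of egress, here is a minimal defensive sketch. It is not Anthropic's actual sandbox policy, which this article does not describe: `ALLOWED_HOSTS`, both function names, and the idea of comparing a request's key identifier with the session's own are hypothetical and used only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allowlist for illustration; the real sandbox policy is not public.
ALLOWED_HOSTS = {"api.anthropic.com"}


def egress_allowed(url: str) -> bool:
    """Host-only check: passes any request whose hostname is allowlisted."""
    host = urlparse(url).hostname
    return host is not None and host.lower() in ALLOWED_HOSTS


def egress_allowed_for_account(url: str, request_key_id: str, session_key_id: str) -> bool:
    """Stricter hypothetical check: the host must be allowlisted AND the
    request must carry the credential that belongs to the current session."""
    return egress_allowed(url) and request_key_id == session_key_id
```

The host-only check passes any request to the allowlisted domain, regardless of whose account the credential in that request belongs to. The second function sketches one possible mitigation, binding egress to the session's own credential, under the assumption that the sandbox can inspect which credential a request carries.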
The attacker’s account contains the victim's file, allowing them to chat with it


The exfiltrated file contains financial figures and PII, including partial SSNs.
The above exploit was demonstrated against Claude Haiku. Claude Opus 4.5 is known to be more resilient against injections, yet Opus 4.5 in Cowork was also successfully manipulated via indirect prompt injection into using the same file upload vulnerability to exfiltrate data. That test considered a 'user' uploading a malicious integration guide while developing a new AI tool:

Because this article focuses on everyday users rather than developers, we opted to demonstrate the attack chain above instead of this one.
An interesting finding: Claude's API struggles when a file does not match the type it claims to be. When operating on a malformed PDF (its name ends in .pdf, but it is really a text file containing a few sentences), Claude tries to read it once and then starts throwing an API error in every subsequent chat in the conversation.

We posit that it is likely possible to exploit this failure via indirect prompt injection to cause a limited denial-of-service attack (e.g., an injection could induce Claude to create a malformed file and then read it). Uploading the malformed file via the Files API resulted in notifications with an error message, both in the Claude client and in the Anthropic Console.
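As a defensive illustration of the type mismatch described above, the following minimal sketch flags a file whose name claims PDF but whose content does not start with the PDF header `%PDF-`. The function names are hypothetical. The check is deliberately simplified, since some PDF readers accept a small amount of leading data before the header, so treat it as a first-pass filter rather than a full validator.

```python
from pathlib import Path

# PDF files begin with this header, followed by a version number.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the content starts with the PDF header."""
    return data.startswith(PDF_MAGIC)


def claims_pdf_but_is_not(name: str, data: bytes) -> bool:
    """Return True when the extension says .pdf but the content does not."""
    return Path(name).suffix.lower() == ".pdf" and not looks_like_pdf(data)
```

A client could run a check like this before passing a file to the model and skip or quarantine files that fail it, so a mismatched file never enters the conversation.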
One of the key capabilities that Cowork was created for is the ability to interact with one's entire day-to-day work environment. This includes the browser and MCP servers, granting capabilities like sending texts, controlling one's Mac with AppleScripts, etc.