This repo contains a version of Anthropic's original performance take-home, before Claude Opus 4.5 started doing better than humans given only 2 hours.
The original take-home was a 4-hour exercise that started close to the contents of this repo. After Claude Opus 4 beat most humans at it, it was updated to a 2-hour version whose starter code achieved 18532 cycles (7.97x faster than this repo's starting point). This repo is based on the newer take-home, which has a few more instructions and comes with better debugging tools, but the starter code has been reverted to the slowest baseline. After Claude Opus 4.5, we started using a different base for our time-limited take-homes.
Now you can try to beat Claude Opus 4.5 given unlimited time!
Performance is measured in clock cycles on the simulated machine. All of these numbers are for models doing the 2-hour version, which started at 18532 cycles:
- 2164 cycles: Claude Opus 4 after many hours in the test-time compute harness
- 1790 cycles: Claude Opus 4.5 in a casual Claude Code session, approximately matching the best human performance in 2 hours
- 1579 cycles: Claude Opus 4.5 after 2 hours in our test-time compute harness
- 1548 cycles: Claude Sonnet 4.5 after many more than 2 hours of test-time compute
- 1487 cycles: Claude Opus 4.5 after 11.5 hours in the harness
- 1363 cycles: Claude Opus 4.5 in an improved test-time compute harness
- ??? cycles: Best human performance ever is substantially better than the above, but we won't say how much.
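To put the cycle counts above in perspective, here is a minimal sketch that converts each one into a speedup over the 18532-cycle starting point of the 2-hour version. It also estimates this repo's own starting cycle count from the 7.97x figure mentioned earlier. Because that factor is rounded, the estimate is approximate and not an official number. The names `START_CYCLES`, `results`, and `speedup` are made up for this illustration and are not part of the repo.

```python
# Cycle counts copied from the list above (2-hour version, lower is better).
START_CYCLES = 18532  # starting point of the 2-hour take-home

results = {
    "Claude Opus 4, many hours in harness": 2164,
    "Claude Opus 4.5, casual Claude Code session": 1790,
    "Claude Opus 4.5, 2 hours in harness": 1579,
    "Claude Sonnet 4.5, many more than 2 hours": 1548,
    "Claude Opus 4.5, 11.5 hours in harness": 1487,
    "Claude Opus 4.5, improved harness": 1363,
}


def speedup(baseline_cycles: int, cycles: int) -> float:
    """Return how many times faster `cycles` is than `baseline_cycles`."""
    return baseline_cycles / cycles


for label, cycles in results.items():
    print(f"{cycles:>5} cycles  {speedup(START_CYCLES, cycles):5.2f}x  {label}")

# Assumption: the 7.97x factor is rounded, so this is only an estimate
# of where this repo's starter code begins.
implied_repo_baseline = START_CYCLES * 7.97
print(f"Implied starting point of this repo: ~{implied_repo_baseline:,.0f} cycles")
```

The speedups are relative to the 2-hour starting point, so gains measured from this repo's slower starter code would be roughly 7.97 times larger.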
While it's no longer a good time-limited test, you can still use this test to get us excited about hiring you! If you optimize below 1487 cycles, beating Claude Opus 4.5's best performance at launch, email us at performance-recruiting@anthropic.com with your code (and ideally a resume) so we can be appropriately impressed, especially if you get near the best solution we've seen. New model releases may change what threshold impresses us, though, and there are no guarantees that we'll keep this readme updated with the latest on that.
Run `python tests/submission_tests.py` to see which thresholds you pass.