May 26, 2026 • Max Trivedi
TL;DR: This essay tries to answer at what point it becomes more economical to hire an engineer in a lower-cost country and give them a DeepSeek/local-AI API key than to use frontier closed-source LLMs. It concludes that, at the very least, this dynamic puts a price ceiling on frontier lab offerings. We use DeepSeek as a proxy for local AI costs.
We keep hearing that inference costs are on a downward trajectory, but they evidently are not, at least not for the frontier US labs.
GPT-5.5 ($5/$30 per 1M input/output tokens), released less than two months after GPT-5.4, doubled API pricing across the board. It costs at least 3x what GPT-5 cost eight months ago ($1.25/$10): 4x on input and 3x on output.
Gemini 3.5 Flash ($1.50/$9.00) tripled API pricing across the board over its predecessor, Gemini-3-flash-preview ($0.50/$3.00), which had itself already raised prices over its own predecessor, Gemini 2.5 Flash ($0.30/$2.50).
Anthropic released Opus-4.7 with a new tokenizer that effectively increases token consumption by 32% to 47% over its immediate predecessor, Opus-4.6.
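A quick way to sanity-check these multipliers is to recompute them from the quoted list prices. The sketch below does that; the dictionary keys and helper names are illustrative, and the Opus line assumes per-token prices stayed the same between versions, which is not stated above.

```python
# List prices quoted above: ($ per 1M input tokens, $ per 1M output tokens).
PRICES = {
    "GPT-5": (1.25, 10.00),
    "GPT-5.5": (5.00, 30.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini-3-flash-preview": (0.50, 3.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
}


def multiplier(new: str, old: str) -> tuple[float, float]:
    """Return the (input, output) price ratios of model `new` over model `old`."""
    (new_in, new_out), (old_in, old_out) = PRICES[new], PRICES[old]
    return new_in / old_in, new_out / old_out


def tokenizer_cost_factor(token_increase: float) -> float:
    """Cost multiplier for the same text when only the token count changes.

    ASSUMPTION: per-token prices are unchanged between model versions.
    """
    return 1.0 + token_increase


if __name__ == "__main__":
    pairs = [
        ("GPT-5.5", "GPT-5"),
        ("Gemini 3.5 Flash", "Gemini-3-flash-preview"),
        ("Gemini-3-flash-preview", "Gemini 2.5 Flash"),
    ]
    for new, old in pairs:
        m_in, m_out = multiplier(new, old)
        print(f"{new} vs {old}: input x{m_in:.2f}, output x{m_out:.2f}")
    low, high = tokenizer_cost_factor(0.32), tokenizer_cost_factor(0.47)
    print(f"Opus-4.7 tokenizer: x{low:.2f} to x{high:.2f} on the same text")
```

Running it prints 4.00x input and 3.00x output for GPT-5.5 over GPT-5, and 3.00x on both for Gemini 3.5 Flash over Gemini-3-flash-preview, matching the claims above.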
How do frontier OSS and closed-source models compare?
For this comparison, we use a ‘blended token consumption ratio’ that assumes 50k output tokens for every 1M input (including cached) tokens, so output is just under 5% of all tokens. If anything, this is a conservative estimate, since long agentic loops are dominated by reads across their many turns.
We then account for each provider's caching (source: openrouter.ai) and compare the average blended price per million agentic tokens.
| Provider | Input Price ($/1M) | Output Price ($/1M) | Cache Hit Rate |
|---|---|---|---|
| Anthropic | $1.57 | $25.00 | 79.6% |
| OpenAI | $1.30 | $30.22 | 84.8% |
| DeepSeek | $0.055 | $0.870 | 88.1% |
Anthropic: $1.57 + (0.05 × $25.00) = $1.57 + $1.25 = $2.82 per million agentic tokens
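The same calculation for all three providers can be sketched as below. It mirrors the Anthropic line, which adds the table's input price directly, so it assumes the input figures already reflect each provider's cache hit rate; the constant and function names are illustrative.

```python
# Blended price: 1M input (incl. cached) tokens plus the matching 50k output tokens.
OUTPUT_PER_MILLION_INPUT = 50_000 / 1_000_000  # 0.05

# (input $/1M, output $/1M) from the table. ASSUMPTION: the input prices
# are used as-is, i.e. they already account for caching.
TABLE = {
    "Anthropic": (1.57, 25.00),
    "OpenAI": (1.30, 30.22),
    "DeepSeek": (0.055, 0.870),
}


def blended_price(input_price: float, output_price: float,
                  output_ratio: float = OUTPUT_PER_MILLION_INPUT) -> float:
    """Cost of 1M input tokens plus the corresponding share of output tokens."""
    return input_price + output_ratio * output_price


if __name__ == "__main__":
    blended = {name: blended_price(*prices) for name, prices in TABLE.items()}
    for name, price in blended.items():
        print(f"{name}: ${price:.4f}")
    for name in ("Anthropic", "OpenAI"):
        ratio = blended[name] / blended["DeepSeek"]
        print(f"{name} / DeepSeek: {ratio:.1f}x")
```

This gives about $2.82 for Anthropic, $2.81 for OpenAI and $0.0985 for DeepSeek, so the closed-source options come out roughly 28x to 29x more expensive per blended million.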
Current closed-source frontier models are more capable than DeepSeek's latest. But is the capability gap large enough to justify a roughly 30x price difference?
Paired with a decent human engineer, OSS LLMs don't need to be frontier; they just need to be good enough for coding use cases, which they already are.
Token Consumption Trend
Precise data is hard to find, but the tokenmaxxing trend has only accelerated in recent months and years (https://blog.pragmaticengineer.com/the-pulse-tokenmaxxing-as-a-weird-new-trend/). Every good engineer I know agrees that setting goals around tokenmaxxing is stupid, but that's a conversation for another essay. For better or worse, token consumption has gone up massively, as is also evident from the persistent shortage of GPUs.
So we have rising token consumption combined with rising per-token pricing, as the US frontier labs push to capture more value.
(Human + an almost frontier LLM) vs Frontier LLM
We wrote a very long essay comparing human engineers and AI agents on 12 different axes (https://www.signalbloom.ai/posts/why-task-proficiency-doesnt-equal-ai-autonomy/). Its conclusion was that AI agents have already overtaken humans at coding and will soon overtake them at scoped debugging, but that on the other skills good engineering requires (or that being a good independent agent at anything requires), AI is still well behind, and the current statistical architecture will need to be augmented or replaced by some other breakthrough to close that gap. Examples of those skills include long-term memory, meta-memory (being able to tell with certainty what you know and what you don't), Evidential Sufficiency Assessment (judging whether there is enough evidence to act), and so on.
The present generation of frontier LLMs is exceptionally good at handling tasks, but task proficiency does not equal AI autonomy.
Possible future directions
Getting to the main point of this essay, the chart below projects the point at which an engineer in a lower-cost country plus a capable-enough model becomes better value for money than the top frontier model.
[Chart: Frontier inference vs. cheap engineer + DeepSeek. Monthly cost over time, as token consumption, salaries, and model prices shift. Series: Frontier model (inference only); Engineer + DeepSeek.]
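A minimal sketch of how a projection like this could be computed, assuming compound monthly growth in token consumption, frontier prices and salaries, and that both setups consume the same number of tokens. Every parameter except the two blended prices from the table is a hypothetical placeholder, and the names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    # HYPOTHETICAL placeholders, except the two blended prices,
    # which are the Anthropic and DeepSeek figures from the table above.
    tokens_m_per_month: float = 500.0     # millions of agentic tokens per month
    token_growth: float = 0.10            # monthly growth in token consumption
    frontier_price: float = 2.82          # $ per 1M blended tokens (Anthropic)
    frontier_price_growth: float = 0.03   # monthly growth in frontier pricing
    deepseek_price: float = 0.0985        # $ per 1M blended tokens (DeepSeek)
    engineer_salary: float = 3_000.0      # $ per month, lower-cost country
    salary_growth: float = 0.005          # monthly salary growth


def monthly_costs(s: Scenario, month: int) -> tuple[float, float]:
    """Return (frontier-only cost, engineer + DeepSeek cost) for `month`.

    Simplification: both setups consume the same token volume.
    """
    tokens = s.tokens_m_per_month * (1 + s.token_growth) ** month
    frontier = tokens * s.frontier_price * (1 + s.frontier_price_growth) ** month
    salary = s.engineer_salary * (1 + s.salary_growth) ** month
    return frontier, salary + tokens * s.deepseek_price


def break_even_month(s: Scenario, horizon: int = 60) -> Optional[int]:
    """First month in which frontier inference alone costs at least as much
    as the engineer + DeepSeek setup, or None within the horizon."""
    for month in range(horizon + 1):
        frontier, human = monthly_costs(s, month)
        if frontier >= human:
            return month
    return None


if __name__ == "__main__":
    s = Scenario()
    for month in range(0, 13, 3):
        frontier, human = monthly_costs(s, month)
        print(f"month {month:2d}: frontier ${frontier:,.0f} "
              f"vs engineer+DeepSeek ${human:,.0f}")
    print("break-even month:", break_even_month(s))
```

With these placeholder numbers the frontier-only line crosses above the engineer + DeepSeek line in month 7. The specific month means little; the point is that compounding token volume and per-token price eventually outrun a slowly growing salary.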
Opinion
The chart rests on obviously simplistic assumptions, such as the future price of inference and token consumption trends. There is also reflexivity: actors in any market change their behavior based on what they observe in that market. All of this is hard to factor in.
We have also ignored two factors that would make the comparison even more favorable to local models: local models are improving at a dizzying pace, and more and more inference hardware is coming online in the coming months and years.
However, the deeper point is that AI's rising costs can only go so far before they become a worrying cash burn for enterprises and a significant share of overall spend. This puts a ceiling on how much, and how fast, the frontier labs can raise prices.
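One way to make that ceiling concrete, under the simplifying assumption that the frontier setup would replace the engineer + DeepSeek setup outright: let S be the engineer's monthly cost, T_F and T_D the monthly token volumes (in millions) of the frontier and engineer + DeepSeek setups, and p_F and p_D their blended prices per 1M tokens. Frontier inference stays the cheaper option only while

```latex
p_F \, T_F \le S + p_D \, T_D
\quad\Longrightarrow\quad
p_F \le \frac{S + p_D \, T_D}{T_F}
```

With equal token volumes (T_F = T_D = T) this reduces to p_F ≤ p_D + S/T, so the more tokens a team burns, the closer the frontier price ceiling gets to the cheap model's own price.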