James Routley

Early on it was all about prompting. GitHub Copilot had been around for a bit, but using language models mostly meant using web chat, and I don’t think Cursor had arrived yet. The idea was that we had to choose our words wisely. Then it was all about context, at least in my experience, and looking back this must have been around the time RAG was being talked about (and, curiously, Cursor). The idea was that, in addition to our choice of words, we had to pick content wisely too: point a model in the right direction, and at the input and output artifacts that matter. Then came practices like using different models to their strengths (and paying for several subscriptions), building intermediate artifacts (like a plan or a log, or doing compaction), MCP, tool use, etc.

Recently, in a professional context, my work has centered on engineering the confidence to make changes in an existing codebase. How can I get a language model not just to do a task, not just to do it well, but to do it in a way where I have real confidence, without it costing me a lot of time, money, and angst? If that means being in the loop a lot of the time, so be it; I’m obviously looking at the overall trade-off over time. I seek to understand rigorously, robustly, and reproducibly what’s already there (e.g. links to actual code fragments, or results from running shell snippets, both possibly obtained using a language model), understand what’s desired (e.g. by supplementing my domain knowledge using a language model and other sources), and then work on a test suite (or use other tooling, such as static analysis via the command line) to build a safety net I can count on before requesting and refining an implementation. The key is working hard to do this up front, before an implementation, not during it or in tandem. It is a “plan”, but one that uses executable tooling, not text, and one where I have intermediate factual results I can count on and point to: an audit trail of facts that conclusions were drawn from (though this says nothing about the scope and validity of the questions asked in the first place).
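To make the "safety net before implementation" idea concrete, here is a minimal sketch of a characterization test in Python. Everything in it is an assumption for illustration: `legacy_shipping_cost` is a made-up stand-in for existing code, and `RECORDED_FACTS` stands in for results observed by actually running that code before any change is requested. It is not a description of any particular codebase or tool.

```python
import unittest


def legacy_shipping_cost(weight_kg: float, express: bool = False) -> float:
    """Hypothetical stand-in for existing code whose behaviour must not change."""
    base = 5.0 + 1.5 * weight_kg
    if express:
        base *= 2
    return round(base, 2)


# Facts observed by running the current code, recorded BEFORE any change.
# Each entry is (arguments, observed result): a small audit trail.
RECORDED_FACTS = [
    ((1.0, False), 6.5),
    ((1.0, True), 13.0),
    ((0.0, False), 5.0),
    ((2.5, True), 17.5),
]


class ShippingCostCharacterization(unittest.TestCase):
    """Pins current behaviour so a later rewrite can be checked against it."""

    def test_behaviour_matches_recorded_facts(self):
        for args, expected in RECORDED_FACTS:
            with self.subTest(args=args):
                self.assertEqual(legacy_shipping_cost(*args), expected)


def run_safety_net() -> bool:
    """Run the characterization tests and report whether they all passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        ShippingCostCharacterization
    )
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    raise SystemExit(0 if run_safety_net() else 1)
```

The point of the sketch is the ordering: the recorded facts and the test exist first, and only then is an implementation requested and refined against them. As noted above, a suite like this only covers the questions that were asked; it says nothing about inputs nobody thought to record.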

On the other hand, for work where I did not have the time and the risk of a mistake was nonexistent (an experiment or a proof-of-concept), a test-bed with data (a DB, a schema, and a decent data-set to work with) plus interface-centric iteration and prompting has worked really well to get a result, but only the end result or artifact, not the code. The code is awful with a hands-off approach, and Claude Code & Cursor hit a wall, or I hit one using them. It’s not as simple as language models building a god-object: even when there are distinct entities, they’re intertwined, and there is lots of code that simply need not have been there. Chiseling away at working but bad code is not something I enjoy, with or without a language model. But over three attempts, and with a lot of determination not just to get the code I wanted but to see whether it was possible to obtain it through refactoring and refinement, one of my attempts succeeded: I got Cursor & Claude to delete two big needless chunks of code and refine, block by block, about as much again, which works out to roughly a 4:1 ratio of cruft to decent code, and one success in three attempts.

As of early 2026, the best way for me to use language models varies with the situation:

In a professional context, in an existing code-base where the risk of a mistake is high but the work is not particularly novel, and where the code-base is set up with all those .md files documenting good practice, TDD/Test-First has worked well as of three months ago.
In an experimental context, a test-bed and result-driven iteration has worked really well. It’s gotten me a black box, or something I can validate outside-in, along with a workable insight in code, but I have to make sure to move quickly and decisively to hand-written or hands-on code from scratch.
Using the web chat is a good place to start exploratory work. My guess is that an existing code-base includes all sorts of assumptions that steer a model away from what you really want for exploratory work: “baggage”. A clean slate, just a language model, can be the best thing for the more novel, ambitious, and compartmentalisable work. On reading the above, I seem to be stating the obvious, but what I really mean is that even the most subtle things might have an effect. The mere fact that you’re using a certain language might impose criteria you don’t want at first.
Moving forward, if the web chat fails me, I will be subverting my IDE use: I’ll chart a course with a language model, put the language model in the passenger seat, and get in the driver’s seat to write the code myself. This might seem counterintuitive, or even pointless, but it could work well because reading code is much harder and much more taxing than writing it, so it is better to drive than to review and revise endlessly. (And don’t tell me quality is down to prompting up front, getting the right thing in an MD file, setting up linters, formatters, and similar, getting a framework in place, or pitting two bots against each other or something like that. Those might help, but there is more to quality than I can accomplish with them, and I could end up spending so much time and money that it doesn’t make sense.) I hope this will give me a chance to avoid the verbosity before it is produced and before it intertwines with itself and with existing code. I want to stay receptive to the code-base as well. Plus, I think this is a good way to make sure these products don’t get a hold on us and our code that keeps us captive. Bad code needs a tool to deal with it, maybe the one that produced it (though I’d hope not), because no-one wants to fix it by hand: (1) that entails all the work of understanding it in the first place, before even starting to untangle a mess so difficult it makes a slinky look easy, and (2) more and more of us do not code “by hand”.

These are personal techniques and personal preferences (I would not say they are choices). As language models and related products embed themselves into my day-to-day work and life, when I step back to think through what really works for me and my work, it has to be that they’re a good retrieval system for dealing with the challenges of software development, documentation, and content, or with whatever is lacking in them. I can only say that language models, whether Chat, IDE, or CLI products, have off-loaded the mundane part of my work (and I wish that part were automated away more deterministically). I started using these because I didn’t want to become complacent. I worry that, in an ironic twist, (heavy) use will lead to just that. I am still looking forward to mastering my day-to-day tooling and to fixing problems, not circumventing them. The question is how to balance results in the short term with study in the medium term.

My Use of Language Models Past, Present, and Future