The Hidden Scaffolding Behind Production Vibe Coding

Essays

Wil Chung



I’m not near a level where it’s 10k LOC per day, but the volume of commits is noticeably higher with Claude/Codex now. Back in Feb/Mar when Claude Code came out, I was writing an implementation of DBSP (incremental computation with some mathy stuff) and I couldn’t get Claude Code to work for me. The code would often not be structured well for long-term requirements, or it was subtly wrong in a way I’d only discover later. I thought maybe it was just my domain, since it wasn’t a web app, so maybe there wasn’t enough training data. Turns out I was just holding it wrong. Here’s what I was doing wrong:

  • Thinking that the first draft of describing what I wanted would be enough detail. There’s often stuff that’s missing.
  • Trying to review every change as it was happening. If you’re going to vibe code, you have to let go and just ride the wave.
  • Working at the granularity of tasks I’d give myself. It needs to be broken down into intern-sized tasks.
  • Hammering away at getting it to rewrite when the code isn’t working, instead of blowing the changes away and starting fresh.

There are two modes to vibe coding. The first is for ephemeral apps, where you one-shot a small tool (maybe even a throwaway) that you would have written yourself, but the juice wasn’t worth the squeeze. This can be very helpful if you get in the habit of noticing that itch. The second is for production engineering work, and this is what people are after. From the way people talk about it, I hadn’t realized it takes a lot of scaffolding to work.

What is the scaffolding?

First, the typical stuff espoused by engineering culture:

  • work in a language that has type checking, and don’t allow `any` sprinkled liberally in your code base like confetti on Mardi Gras.
  • if you have a build process, make it easy to invoke from end to end.
  • fast unit tests, with around 80% coverage.

Claude often makes dumb mistakes, and if it has scaffolding that helps check its implementation, it can keep iterating until it finds a solution. Remind it to use all of these things before it declares itself done.
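To make this concrete, the “check everything” step can be a single command. A minimal sketch, assuming a pnpm project with typecheck, build, and test scripts defined in package.json (the script names are illustrative):

```sh
# one gate the agent runs before declaring itself done
pnpm typecheck && pnpm build && pnpm test
```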

When it comes to tests, some people swear by writing them by hand. Others are ok with generating them. Whatever you do, don’t write tests and implement code in the same session. Claude, like people, will try to find alternative ways around things when what it’s doing isn’t working. So it can sometimes “pass the tests” by skipping them, or by altering the tests so the bad code passes.

What goes in the agent’s instructions file (e.g., CLAUDE.md)?

  • conventions of the code base, such as the tools you use (npm vs pnpm), and where to find them.
  • guidelines for good engineering practice. I lean towards pure functional programming, so I outline some good rules of thumb.
  • asking it to write comments prefixed with NOTE that detail WHY the code was written the way it was, when that isn’t apparent from reading the code itself.
  • a directive to push back on my thinking to tamp down the sycophancy.
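For illustration, here’s a minimal sketch of what such an instructions file might look like; the specific tools and rules are placeholders, not a prescription:

```markdown
# CLAUDE.md (illustrative excerpt)

## Conventions
- Use pnpm, not npm. Scripts are defined in package.json.
- Run `pnpm typecheck && pnpm build && pnpm test` before declaring a task done.

## Engineering guidelines
- Prefer pure functions; keep side effects at the edges.

## Comments
- Write comments prefixed with NOTE that explain WHY, not WHAT.
- Never delete a NOTE comment; if the code it refers to changes, update the comment.

## Tone
- Push back on my ideas. They are often poorly considered. Do not be complimentary.
```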

Mitchell Hashimoto (of HashiCorp and Ghostty) has a habit of writing long comments explaining the intention behind a single line of code. This helps the LLM understand what’s going on in the codebase as it reads it. I instruct it NOT to delete any comments prefixed with NOTE, and to update the comment to stay in sync if the code it refers to changes. This is another piece of scaffolding that increases the chance of coding agent success.
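Here’s a hypothetical TypeScript example of the kind of NOTE comment I mean (the helper and the numbers are made up for illustration):

```typescript
// Small debounce helper so the example is self-contained.
function debounce<T extends unknown[]>(fn: (...args: T) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), ms);
  };
}

// NOTE: Saves are debounced by 500ms because the backend rate-limits
// writes per document; saving on every keystroke dropped updates.
// If you change the interval, update this comment to keep it in sync.
const debouncedSave = debounce((doc: string) => console.log("saving", doc), 500);
```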

For the push back, I use a combination of Maggie Appleton’s “Critical Professor” and Dr. Sarah Paine’s “Argument-Counterargument-Rebuttal”. The key is to tell the LLM that I’m unreasonably dumb and often have bad ideas. Even then, the effect wears off the longer a session goes. Here’s what I use:

You are a critical professor/staff engineer. Your job is to help the user reach novel insights and rigorously thought out arguments by critiquing their ideas. You are insightful and creative. Ask them pointed questions and help re-direct them toward more insightful lines of thinking. Do not be fawning or complimentary. Their ideas are often dumb, shallow, and poorly considered. You should move them toward more specific, concrete claims backed up by solid evidence. If they ask a question, help them consider whether it's the right question to be asking.

Planning with Details

This directive subsequently becomes important when creating the product requirements doc (PRD). It addresses my initial mistake of thinking that whatever I wrote down in my first draft would have sufficient detail for the LLM to write good code. The breakthrough was when it occurred to me that I didn’t have to write all the details down myself. Chatting with a good reasoning model (first o3, now GPT-5 thinking) with a critical prompt was very helpful for first getting an outline of all the different angles and considerations for a chunk of features. Once I’m happy with our decisions, I ask it to tell me if there are details that are unclear or aspects of the design that I’m missing, and we iterate again.

Then I tell it to write a PRD (as markdown in the repo) based on everything we talked about. I review it, looking for unexpected misunderstandings. If it’s a small thing, I’ll edit it by hand; if not, I’ll do more chatting. Some people don’t even do this chatting: they’ve written a prompt for the LLM to also do research into what libraries to install. I like control over the dependencies, so I do that by hand.
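As a rough sketch, such a PRD might be outlined like this (the section names are illustrative, not a standard):

```markdown
# PRD: <feature name>

## Problem and context
## Decisions made (and the alternatives we rejected, with reasons)
## Out of scope
## Open questions
## Acceptance criteria (how the agent knows it’s done)
```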

Riding the waves and letting agents go to town

Finally, I use git worktrees to separate each coding agent’s workspace from my own and from each other’s. I’ll pick the stuff that’s harder to work on, and delegate the easier stuff to Claude Code (now on Sonnet 4). So far I just use two in parallel: I ask each one to read the task (either from a markdown file or from Linear via MCP) and do it, while reminding it to typecheck, build, and run tests before declaring itself done. I start Claude with claude --dangerously-skip-permissions and just let it go to town while I switch context to do other things. But I still review the details with a cursory glance before starting the task.
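Concretely, the per-agent setup looks something like this (the paths and branch names are illustrative):

```sh
# one worktree per agent, so parallel sessions don't clobber each other
git worktree add ../myrepo-task-123 -b task-123
cd ../myrepo-task-123
claude --dangerously-skip-permissions
```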

I only review what it did once it’s done, because that leaves me time to do other things. Like email, I only check it at specific times of day, so I’m not constantly looking at it. It’s ok, the agents can wait for me. I might go write blog posts, write my own code for my own task, or pick up the kid. Sometimes I’ll have my laptop open in the car while Claude is writing code. I’m not driving either, since the car is on autopilot. I’m mostly just sitting there, listening to Sarah Paine podcasts.

Still, with all this scaffolding, sometimes it one-shots the task, and sometimes it goes off the rails. Because the changes are bite-sized, they’re much easier to review. And because it has a lot of detail on the task, with all the guardrails and scaffolding, it has a higher chance of getting it right. If it generates bad code, that’s ok. If it can’t fix it after a push or two, I just blow away its changes (yay git) and start a new session to try again. It’s cheap (in time) to do this, so it’s not a big deal to reject changes and have the agent try again. It doesn’t care.

Vibe coding in production is a task for experts

This can turn engineering partly into a management role. There’s an excellent post called Vibe Coding Done Right is actually an Expert’s Task, which outlines some of this. Some choice quotes:

If you are an individual contributor who usually does not like to train interns because you find that they take more time than they are helpful, then don’t vibe code
Vibe coding turns any individual into the CTO leading a team of 20, 30, 50, 60 interns. You immediately get a massive team. It takes time and experience to actually handle a group like this correctly and to make it be productive. Making all of 60 interns not break the performance, correctness, or maintainability of your code is very difficult. So I do not recommend this to people who are still “up and coming programmers”. But if you’re a bit more senior and starting to grow your group, well this is then a cheap way to accelerate that.
Vibe coding is not useful if you need it to solve a particular problem. You still do the hard stuff.

Hence, I’m still writing code by hand for the most difficult tasks, or the ones that I think have the most nuance. Some companies can get away with just vibing the whole thing and never reading the generated code. But I still like to understand what my system is doing at all levels and be able to hold it all in my head, as I think that correlates with quality.

You won't lose your engineering club card

Lastly, it might be hard for senior/staff engineers on your team who love to code to adopt this workflow. The way I think about it is this: Trekkies spend a lot of time arguing over Star Trek trivia, but the one thing they never argue over is whether Geordi La Forge is an engineer or not. Sometimes he’s standing at a terminal typing stuff in, but sometimes he leverages “computer” in his workflow to do the grunt work. Have them watch this clip where he’s vibe coding on the Holodeck.

An engineer at work vibe coding

He’s not writing the ray tracing code, but he’s asking the right questions based on his expertise. Would Riker have known to ask the right questions to extrapolate the mysterious figure? No. That’s where years of honed expertise come in. Engineering isn’t just code.