Natural language is a wonderful interface, but just because we suddenly can doesn't mean we always should. LLM inference is slow and expensive, often taking tens of seconds to complete. Natural language interfaces have orders of magnitude more latency than ordinary graphical user interfaces. This doesn't mean we shouldn't use LLMs; it just means we need to be smart about how we build interfaces around them.
There's a classic CS diagram visualizing latency numbers for various compute operations: nanoseconds to lock a mutex or reference main memory, microseconds to read 1 MB sequentially from memory, milliseconds to read 1 MB from disk. LLM inference usually takes tens of seconds to complete. Streaming responses help compensate, but it's still slow.
Compare interacting with an LLM over multiple turns to filling in a checklist, selecting items from a pulldown menu, setting a value on a slider, stepping through a series of such interactions as you fill out a multi-field dialog. Graphical user interfaces are fast, with responses taking milliseconds, not seconds. But. But: they're not smart, they don't adapt, they don't shape themselves to the conversation with the full benefit of semantic understanding.
This is a post about how to provide the best of both worlds: the clean affordances of structured user interfaces with the flexibility of natural language. Every part of the above interface was generated on the fly by an LLM.
This is a post about a tool I made called popup-mcp (MCP is a standardized tool-use interface for LLMs). I built it about 6 months ago and have been experimenting with it as a core part of my LLM interaction modality ever since; it's a big part of what has made me so fond of LLMs from such an early stage. Popup provides a single tool that, when invoked, spawns a popup with an arbitrary collection of GUI elements.
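To give a sense of the shape, here's a minimal sketch of what a popup-style tool looks like when exposed over stdio with the official Python MCP SDK. The element schema and the `render_popup` stub are illustrative assumptions rather than popup-mcp's actual internals; a real renderer draws native widgets and blocks until you hit Submit (and it can't use stdin/stdout, since those belong to the MCP transport).

```python
# Minimal sketch of a popup-style MCP tool over stdio. The element schema
# and render_popup are illustrative assumptions, not popup-mcp's actual API.
from typing import Any

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("popup")


def render_popup(elements: list[dict[str, Any]]) -> dict[str, Any]:
    # Stand-in renderer: a real implementation would draw native GUI widgets
    # (checkboxes, dropdowns, sliders, text boxes) and block until the user
    # submits. It must not touch stdin/stdout, which carry the MCP protocol.
    return {el["id"]: el.get("default") for el in elements}


@mcp.tool()
def popup(elements: list[dict[str, Any]]) -> dict[str, Any]:
    """Render a popup built from the given element specs and return the
    user's answers keyed by element id."""
    return render_popup(elements)


if __name__ == "__main__":
    mcp.run()  # defaults to stdio, so it runs alongside your LLM client
```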
You can find popup here, along with instructions on how to use it. It's a local MCP tool that uses stdio, which means the process needs to run on the same computer as your LLM client. Popup supports structured GUIs made up of elements including multiple-choice checkboxes, dropdowns, sliders, and text boxes. These let LLMs render popups like the following:
The popup tool supports conditional visibility to allow for context-specific followup questions. Some elements start hidden, only becoming visible when conditions like 'checkbox clicked', 'slider value > 7', or 'checkbox A clicked && slider B < 7 && slider C > 8' become true. This lets LLMs construct complex, nuanced structures that capture not just the next stage of the conversation but where they think the conversation might go from there. Think of these as being a bit like conditional dialogue trees in CRPGs like Baldur's Gate, or interview trees as used in consulting. The previous dialog, for example, expands as follows:
Because constructing this tree requires registering nested hypotheticals about how a conversation might progress, it provides a useful window into an LLM's internal cognitive state. You don't just see the question it wants to ask you, you see the followup questions it would ask based on various answer combinations. This is incredibly useful and often shows where the LLM is making incorrect assumptions. More importantly, this is fast. You can quickly explore counterfactuals without having to waste minutes on back-and-forth conversational turns and restarting conversations from checkpoints.
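To make that concrete, here's roughly how a conditionally visible follow-up might be specified and evaluated. The field names (`visible_if`, `equals`) are my own illustration, not popup-mcp's exact schema; the point is that the condition check runs locally, so hidden elements appear the instant their trigger fires.

```python
# Illustrative spec with conditionally visible follow-ups; "visible_if" and
# its condition format are assumptions for this sketch.
from typing import Any

elements = [
    {"id": "deploy_target", "type": "dropdown", "label": "Where will this run?",
     "options": ["Laptop", "Home server", "Cloud"]},
    # Hidden until the parent answer makes them relevant.
    {"id": "cloud_region", "type": "dropdown", "label": "Which region?",
     "options": ["us-east", "eu-west"],
     "visible_if": {"element": "deploy_target", "equals": "Cloud"}},
    {"id": "budget", "type": "slider", "label": "Monthly budget ($)",
     "min": 0, "max": 500,
     "visible_if": {"element": "deploy_target", "equals": "Cloud"}},
]


def is_visible(element: dict[str, Any], answers: dict[str, Any]) -> bool:
    # Re-evaluated locally on every interaction, so follow-up questions appear
    # in milliseconds with no LLM round-trip. Handles single values and
    # multiselect (list) answers.
    cond = element.get("visible_if")
    if cond is None:
        return True
    answer = answers.get(cond["element"])
    if isinstance(answer, list):
        return cond["equals"] in answer
    return answer == cond["equals"]
```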
Speaking of incorrect LLM assumptions: every multiselect or dropdown automatically includes an 'Other' option, which, when selected, renders a textbox for the user to elaborate on what the LLM missed. This escape hatch started as an emergent pattern, but I recently modified the tool to _always_ auto-include an escape hatch option on all multiselects and dropdown menus.
This means that you can always intervene to steer the LLM when it has the wrong idea about where a conversation should go.
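Sketched out, that pass is just a rewrite over the element list before rendering (same illustrative schema as in the earlier sketch, not popup-mcp's exact internals):

```python
# Always-on escape hatch: every multiselect/dropdown gains an "Other" option
# plus a textbox that only appears when "Other" is selected.
from typing import Any


def add_escape_hatches(elements: list[dict[str, Any]]) -> list[dict[str, Any]]:
    out: list[dict[str, Any]] = []
    for el in elements:
        opts = el.get("options", [])
        if el.get("type") in ("multiselect", "dropdown") and "Other" not in opts:
            el = {**el, "options": opts + ["Other"]}
            out.append(el)
            out.append({
                "id": f"{el['id']}_other",
                "type": "textbox",
                "label": "What did I miss?",
                "visible_if": {"element": el["id"], "equals": "Other"},
            })
        else:
            out.append(el)
    return out
```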
Remember how I started by talking about latency, about how long a single LLM response takes? This combination of nested dialogue trees and escape hatches cuts that by ~25-75%, depending on how well the LLM anticipates where the conversation is going. It's surprising how often a series of dropdowns, each holding the LLM's top 3-5 predictions, will contain your next answer, especially when defining technical specs, and when it doesn't, there's always the natural-language escape hatch offered by 'Other'.
Imagine generating a new RPG setting. Your LLM spawns a popup with options for the 5 most common patterns, with focused followup questions for each.
This isn't a generic GUI; it's fully specialized using everything the LLM knows about you, your project, and the interaction style you prefer. This captures 90% of what you're trying to do, so you select the relevant options and use 'Other' escape hatches to clarify as necessary.
These interactions have latency measured in milliseconds: when you check the 'Other' checkbox, a text box instantly appears, without even a network round-trip's worth of latency. When you're done, your answers are returned to the LLM as a JSON tool response.
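For the RPG example, the result might look something like this (purely illustrative; the keys just mirror hypothetical element ids):

```python
# Purely illustrative tool result for the RPG-setting popup; keys mirror
# hypothetical element ids, values are whatever the user selected.
popup_result = {
    "setting_pattern": "Post-apocalyptic",      # dropdown
    "setting_pattern_other": None,              # escape-hatch textbox, unused
    "tech_level": 3,                            # slider
    "tone": ["Gritty", "Hopeful", "Other"],     # multiselect
    "tone_other": "Occasional dark comedy",     # textbox revealed by 'Other'
}
```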
You should think of this pattern as providing a reduction in amortized interaction latency: it'll still take tens of seconds to produce a followup response when you submit a popup dialog, but if your average popup replaces more than one round of chat you're still spending less time per unit of information exchanged. That's what I mean by amortized latency: that single expensive LLM invocation is amortized over multiple cheap interactions with a deterministically rendered GUI running on your local machine.
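A back-of-the-envelope example (every number here is an assumption, not a measurement):

```python
# Back-of-the-envelope amortization; all numbers are assumptions.
llm_turn_s = 20           # one model response, popup generation included
answers_per_popup = 4     # chat questions a single popup replaces
local_interaction_s = 1   # total time spent clicking through the GUI

plain_chat_s = llm_turn_s * answers_per_popup      # 80 s for four answers
with_popup_s = llm_turn_s + local_interaction_s    # 21 s for the same four

print(plain_chat_s / answers_per_popup, "s per answer via plain chat")  # 20.0
print(with_popup_s / answers_per_popup, "s per answer via one popup")   # 5.25
```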
I started hacking on this a few months before Claude Code released their AskUserQuestion tool (as used in planning mode). AskUserQuestion provides a limited selection of TUI (terminal user interface) elements: multiple-choice and single-choice selections (with an always-included 'Other' option) and single-choice drop-downs. I originally chose not to publicize my library because of this, but I believe the addition of conditional elements is worth talking about.
Further, I have some feature requests for Claude Code. If anyone at Anthropic happens to be reading this, these would all be pretty easy to implement:
- Make the TUI interface used by the AskUserQuestion tool open and scriptable, such that plugins and user code can directly modify LLM-generated TUI interfaces, or directly generate their own without requiring a round-trip through the LLM to invoke the tool.
- Provide pre- and post-AskUserQuestion tool hooks so users can directly invoke code using TUI responses (e.g. filling templated prompts with the user's answers in certain contexts).
- Extend the AskUserQuestion tool to support conditionally rendered elements.
If you have an LLM chat app, you should add inline structured GUI elements with conditionally visible followup questions to reduce amortized interaction latency. If you'd like to build on my library or tool definition, or just to talk shop, please reach out; I'd be happy to help. This technique is equally applicable to OS-native popups, terminal user interfaces, and web UIs.
I'll be writing more here. Publishing what I build is one of my core resolutions for 2026, and I have one hell of a backlog. Watch this space.