
Show HN: Statewright – Visual state machines that make AI agents reliable

Agents are suggestions, states are laws.

State machine guardrails that control which tools your AI agent can use in each phase. Define a workflow once, enforce it across Claude Code, Codex, Cursor, opencode, and Pi. Full docs →

[Screenshot: Statewright workflow editor]

Try it out in Claude Code on the free tier by running the following:

/plugin marketplace add statewright/statewright

/plugin install statewright

/reload-plugins

Then start the bugfix workflow, or run /statewright start bugfix. You'll need to paste your API key when prompted. The latest versions of Claude may complain; paste the API key again and say you really mean it. Claude is just being cautious here.

AI agents are powerful but brittle. Give a model 40+ tools and an open-ended problem and it barely gets out of the gate. The common fix is bigger models and longer prompts, which helps only sometimes. Observability tells you what went wrong after the fact; it doesn't prevent it.

Instead of making the model bigger, make the problem smaller.

State machines constrain the tool and solution spaces so the model reasons in a focused context at each step. A planning state gets read-only tools. When the agent transitions to implementation, edit tools unlock with limited shell access (write-via-redirect and destructive ops are blocked even when Bash is allowed). Testing only permits designated test commands. If you call a tool that's not in the current phase, you get rejected with a message telling you what IS available and how to transition.
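The rejection path can be sketched as a simple lookup: is the tool in the current state's allow-list, and if not, what should the error say? This is a minimal illustrative sketch; the struct and function names are hypothetical, not Statewright's actual API.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical per-state tool gate. The field names echo the workflow
// schema (allowed_tools, events); the Rust shapes are illustrative.
struct StateSpec {
    allowed_tools: BTreeSet<&'static str>,
    transitions: BTreeMap<&'static str, &'static str>, // event -> target state
}

fn check_tool(state: &str, spec: &StateSpec, tool: &str) -> Result<(), String> {
    if spec.allowed_tools.contains(tool) {
        return Ok(());
    }
    // The rejection names what IS available and how to transition.
    let available: Vec<_> = spec.allowed_tools.iter().copied().collect();
    let events: Vec<_> = spec.transitions.keys().copied().collect();
    Err(format!(
        "Tool '{tool}' is not allowed in state '{state}'. \
         Available tools: {available:?}. Transition events: {events:?}."
    ))
}
```

The key property is that the gate runs before the tool call executes, so a disallowed call costs one rejection message rather than a wasted action.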

The same constraints work on frontier models (fewer tokens to completion) and on local models, where 13B+ models start solving tasks they'd otherwise fail.

| Model | Size | Bug Fix (26 lines) | SWE-bench (5 tasks) |
|---|---|---|---|
| gemma3 | 3.3GB | FAIL | FAIL |
| gemma4:e2b | 7.2GB | PASS* | FAIL |
| gpt-oss:20b | 13.8GB | PASS | PASS (5/5) |
| gemma4:31b | 19.9GB | PASS | PASS (5/5) |
| llama3.3 | 42.5GB | PASS | PASS (2/2)† |

\*with specialized edit_line tool adaptation
†tested on 2 of the 5 tasks (added after the initial experiment run)

We validated on local models where the effect is most measurable. In our 5-task SWE-bench subset, two models (13.8GB and 19.9GB) went from 2/10 to 10/10 with Statewright constraints. Same tasks, same hardware. Below 13GB, models can produce tool calls but can't retain enough file content to produce accurate edits — that's the floor, not a Statewright limitation.

Frontier models with default system prompts handle the obvious catastrophic actions (database deletion, credential leaks)... most of the time. The structural win is bigger: breaking read-loop death spirals where models re-read the same file 5+ times without ever editing, and keeping the tool space small enough that the model actually reasons instead of flailing. Research brief →

Install into Claude Code:

/plugin marketplace add statewright/statewright
/plugin install statewright

Your browser opens → sign up at statewright.ai → generate a key → paste it → done.

Then start a workflow:

❯ start the bugfix workflow — fix the failing tests in calc.py

◆ statewright — statewright_start (workflow: bugfix)
◆ [statewright] Workflow activated: bugfix

◆ statewright — statewright_get_state (MCP)

◆ Current phase: planning. Let me read the code first.

  Read 2 files

  [statewright] planning => implementing

◆ statewright — statewright_transition (READY)

  Edit calc.py: 1 line changed

  [statewright] implementing => testing

◆ statewright — statewright_transition (DONE)

  Bash: pytest -x — 7 passed

  [statewright] testing => completed
◆ [statewright] Workflow complete. 46 seconds.

You can also use the slash command directly: /statewright start bugfix.

The core is a Rust engine that evaluates state machine definitions: states, transitions, guards, tool restrictions. It's deterministic. No LLM in the loop.
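In spirit, resolution is a pure lookup over the definition: given the current state and an event, either a target state exists or it doesn't. This is an illustrative sketch of that determinism, not the actual statewright_engine API.

```rust
use std::collections::BTreeMap;

// Illustrative only: transition resolution as a pure, deterministic
// function of (definition, current state, event). No LLM, no I/O.
struct Machine {
    // state name -> (event name -> target state)
    states: BTreeMap<String, BTreeMap<String, String>>,
}

fn resolve_transition<'a>(machine: &'a Machine, state: &str, event: &str) -> Option<&'a str> {
    machine.states.get(state)?.get(event).map(String::as_str)
}
```

Because the same inputs always produce the same output, enforcement decisions are reproducible and testable in isolation from any model.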

On top of that sits a plugin layer that integrates with your coding agent via MCP. When you activate a workflow, hooks enforce tool restrictions per state automatically. The model sees 5 tools instead of 30, gets clear instructions for the current phase, and transitions when conditions are met.

| Guardrail | What it does |
|---|---|
| Per-state tool enforcement | Tools invisible to the agent when not in allowed_tools |
| Bash discernment | Redirects (>>), destructive ops (rm, shred), and scripting interpreters blocked in non-write states |
| Edit guards | Rejects diffs exceeding max_edit_lines; caps files edited per state |
| Command allow-lists | Prefix-matched allowed_commands per state |
| Conditional transitions | Guards with programmatic predicates (eq, gt, exists, etc.) on context data |
| Approval gates | requires_approval pauses for human review before high-risk transitions |
| Environment scoping | blocked_env + env_overrides per state |
| Session isolation | Per-session state via CLAUDE_SESSION_ID |

Full guardrail reference in the docs.
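Conditional transitions evaluate predicates against context data. The predicate names (eq, gt, exists) come from the guardrail list above; the Rust types below are a hypothetical sketch of how such guards could be evaluated, not the engine's real representation.

```rust
use std::collections::BTreeMap;

// Hypothetical guard predicates over string-keyed context data.
enum Guard {
    Eq { field: String, value: String },
    Gt { field: String, value: i64 },
    Exists { field: String },
}

fn eval_guard(guard: &Guard, ctx: &BTreeMap<String, String>) -> bool {
    match guard {
        // eq: field present and exactly equal to the expected value
        Guard::Eq { field, value } => ctx.get(field).map_or(false, |v| v == value),
        // gt: field parses as an integer and exceeds the threshold
        Guard::Gt { field, value } => ctx
            .get(field)
            .and_then(|v| v.parse::<i64>().ok())
            .map_or(false, |n| n > *value),
        // exists: field is present at all
        Guard::Exists { field } => ctx.contains_key(field),
    }
}
```

A guard like `{"field": "test_result", "op": "eq", "value": "pass"}` would map to `Guard::Eq` here: the PASS transition only fires once the context records a passing test run.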

Define your own workflows

{
  "id": "bugfix",
  "initial": "planning",
  "states": {
    "planning": {
      "allowed_tools": ["Read", "Grep", "Glob"],
      "max_iterations": 8,
      "on": { "READY": "implementing" }
    },
    "implementing": {
      "allowed_tools": ["Read", "Edit", "Write"],
      "max_edit_lines": 20,
      "max_files_per_state": 3,
      "on": { "DONE": "testing" }
    },
    "testing": {
      "allowed_tools": ["Read", "Bash"],
      "allowed_commands": ["pytest", "cargo test", "npm test"],
      "on": {
        "PASS": { "target": "completed", "guard": "tests_passed" },
        "FAIL_TEST": "implementing"
      }
    },
    "completed": { "type": "final" }
  },
  "guards": {
    "tests_passed": { "field": "test_result", "op": "eq", "value": "pass" }
  }
}

State machines aren't DAGs — they loop and retry, which is what agentic work actually needs. Point your agent at the JSON schema and it generates a workflow via statewright_create_workflow. Tweak tools, commands, and environment blocks in the visual editor.
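The allowed_commands list in the testing state is described as prefix-matched. A minimal sketch of that check, under the assumption (ours, not documented here) that a match must end at a word boundary so "pytest -x" passes but "pytest2" does not:

```rust
// Illustrative prefix match for allowed_commands. Assumption: a command
// passes only if it equals an allowed prefix or extends it after a space.
fn command_allowed(command: &str, allowed: &[&str]) -> bool {
    allowed
        .iter()
        .any(|prefix| command == *prefix || command.starts_with(&format!("{prefix} ")))
}
```

Without the word-boundary check, an allow-list entry like "pytest" would also admit an unrelated binary named "pytest2", which is why naive starts_with matching is a footgun.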

| Agent | Integration | Enforcement |
|---|---|---|
| Claude Code | Hooks + MCP | Hard (protocol layer) |
| Codex | Hooks | Hard (alpha) |
| opencode | TypeScript plugin | Hard (alpha) |
| Pi | Skills extension | Hard (alpha) |
| Cursor | MCP + rules | Advisory (alpha) |

Hard = tool calls blocked at the protocol layer before the model sees them. Advisory = rules injected into context but not enforced.

Free for individual developers. The managed cloud at statewright.ai handles workflow storage, run history, and the MCP gateway.

(Tiers are in flux: prices will not increase, and tier grants can only grow.)

| Plan | Workflows | Transitions/mo | Run History | Price |
|---|---|---|---|---|
| Free | 3 | 200 | 72 hours | $0 |
| Pro | 10 | 2500 | 7 days | $29/mo |
| Team | 30 | 10000 | 90 days | $99/mo |
| Enterprise | Unlimited | Unlimited | To specification | Contact us |

The engine (crates/engine) is Apache 2.0 and embeddable with no runtime dependencies. Single-developer and single-team self-hosting of the full stack is permitted under the FSL license.

use statewright_engine::{MachineDefinition, resolve_transition, validate_definition};

Known limitations:

  • Requires MCP support in the agent (or hooks for non-MCP agents like Codex)
  • Workflow definitions are authored by hand, though agents can generate them via statewright_create_workflow
  • Cursor enforcement is advisory, not hard; MCP alone can't gate tool calls in Cursor's architecture
  • Research results are from a 5-task SWE-bench subset, not the full 2294-instance benchmark
  • If a workflow is too restrictive, the agent gets stuck; statewright_deactivate is the escape hatch

statewright.ai/docs — install guide, workflow authoring, schema reference, MCP tool reference, and agent-generated workflows.

Workflow definitions, templates, and bug reports welcome. See Create Your Own for how to write workflows.

Apache 2.0 — portions FSL-1.1-ALv2 (converts to Apache 2.0 on May 3, 2029). Managed cloud at statewright.ai.

One hook to rule them all.