Governor — a control layer for agent swarms

The problem

A multi-agent task can burn a million tokens. Some of them on work that's already failing.

A single chat call is cheap. A swarm whose agents chain calls, build on each other's output, and run to completion before anyone checks the result is where cost, and silent failure, compounds. One agent goes subtly wrong early; the rest confidently build on its garbage; you pay full price for all of it, plus the rework.

5–30×

— agentic systems consume 5–30× more tokens per task than a standard chat interaction; complex multi-agent tasks run from 200K to over 1M tokens each (industry token-usage analysis, 2026)

How it works

Watch every agent. Catch the cascade. Throttle the waste.

Three layers, one lightweight wrapper around your existing agent calls. No retraining, no new model.

01 · WATCH

A health signal per agent

Governor tracks a one-sided health score on every agent — it falls when an agent's output degrades, and isn't fooled by an agent doing better than usual or by a slow, sustained drift.

P(t) · degradation-from-peak
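
One way to picture a degradation-from-peak signal is the sketch below. It is not Governor's actual P(t) formula, which this page does not specify. The `PeakHealth` class, the `peak_decay` parameter, and the assumption that each agent emits a quality score in [0, 1] are all illustrative. The score drops only when quality falls below the running peak, and the peak relaxes slowly so that gradual drift causes only a mild dip.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeakHealth:
    """Illustrative one-sided health score (hypothetical, not Governor's P(t))."""

    peak_decay: float = 0.9        # assumed: how slowly the peak relaxes (closer to 1 = slower)
    peak: Optional[float] = None   # best quality seen so far, after relaxation

    def update(self, quality: float) -> float:
        """Return health in [0, 1]: 1.0 at or above the peak, lower below it."""
        quality = max(0.0, quality)
        if self.peak is None or quality >= self.peak:
            # Doing better than usual just raises the peak; health stays at 1.0.
            self.peak = quality
            return 1.0
        # Below the peak: health is the fraction of peak quality retained.
        health = quality / self.peak
        # Relax the peak toward current quality so slow drift is mostly absorbed.
        self.peak = self.peak_decay * self.peak + (1.0 - self.peak_decay) * quality
        return health


health = PeakHealth()
for q in (0.8, 0.9, 0.45):
    print(q, round(health.update(q), 3))
```

A sharp drop (0.9 to 0.45) halves the score immediately, while an improvement (0.8 to 0.9) leaves it at 1.0.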

02 · DETECT

Self-calibrating divergence

It learns the normal "togetherness" of coupled agents from healthy traffic, then fires when that coupling breaks — no threshold to tune, and it catches slow drift a fixed alarm would miss.

divergence z-score · learns the baseline
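
A self-calibrating detector of this kind can be sketched with a running mean and variance. The sketch assumes a scalar "coupling" measure per step (for example, an agreement score between two agents in [0, 1]) and a baseline that is learned during a warmup window of healthy traffic and then frozen. The class name, the `warmup` parameter, and the freeze-after-warmup choice are assumptions, not Governor's documented behavior.

```python
import math


class DivergenceDetector:
    """Illustrative baseline learner: z-score of coupling against healthy traffic."""

    def __init__(self, warmup: int = 20):
        self.warmup = warmup  # assumed: number of healthy samples used as baseline
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations (Welford's method)

    def _learn(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, coupling: float) -> float:
        """Standard deviations below baseline; 0.0 during warmup or when coupling is above it."""
        if self.n < self.warmup:
            self._learn(coupling)
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0
        std = max(std, 1e-9)  # avoid division by zero on a perfectly flat baseline
        return max(0.0, (self.mean - coupling) / std)


detector = DivergenceDetector(warmup=4)
for c in (0.9, 0.8, 0.9, 0.8, 0.85, 0.5):
    print(c, round(detector.zscore(c), 2))
```

Freezing the baseline after warmup is one simple way to keep slow drift visible: a baseline that kept adapting would gradually absorb the drift it is meant to flag.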

03 · ACT

Throttle, don't kill

The moment a failure starts to spread, Governor throttles the agents building on the bad work — a continuous dial, not an on/off switch — and routes the freed budget where you point it.

continuous throttle · attributable
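
The "dial, not a switch" idea can be sketched as a function that maps a divergence z-score to a budget multiplier. The `sensitivity` and `floor` defaults mirror the values in the integration snippet on this page, but how Governor actually combines them is not stated here. The linear ramp and the `ramp` parameter are assumptions made for illustration.

```python
def throttle(z: float, sensitivity: float = 3.0, floor: float = 0.3, ramp: float = 3.0) -> float:
    """Illustrative budget multiplier in [floor, 1.0] for a downstream agent.

    z           -- divergence z-score of the upstream coupling
    sensitivity -- z at which throttling starts (assumed meaning)
    floor       -- lowest multiplier ever applied (assumed meaning)
    ramp        -- hypothetical: z range over which the dial goes from 1.0 to floor
    """
    if z <= sensitivity:
        return 1.0  # within normal variation: no throttling
    excess = min(1.0, (z - sensitivity) / ramp)
    return 1.0 - excess * (1.0 - floor)


for z in (2.0, 4.5, 10.0):
    print(z, round(throttle(z), 2))
```

Because the multiplier never reaches zero, a throttled branch keeps running at reduced spend and can recover if the upstream agent does.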

One switch

Spend less — or spend the same on a better answer.

Same mechanism, one toggle: what happens to the budget Governor frees up when it throttles a failing branch.

↓

Savings

Bank the freed tokens. Throttle the work that's going wrong, hold quality, and cut the bill.

−20–30%

typical spend reduction on tightly coupled swarms*

↑

Quality

Hold the budget flat and reallocate the freed spend to the recovering work and the healthy agents — a better final answer for the same money.

same budget

measurably higher final quality*

* Figures from in-simulation control-logic testing across coupled-swarm scenarios. Live-system numbers are exactly what the design-partner program is built to establish — we'd rather show you a real number on your workload than quote you ours.
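
The difference between the two modes comes down to what happens to the freed tokens. A minimal sketch follows, assuming per-agent token budgets and per-agent throttle multipliers. The function name, the dictionary shapes, and the proportional split across unthrottled agents are all hypothetical, and the sketch does not model how "recovering" work would be prioritized.

```python
def reallocate(budgets: dict, throttles: dict, mode: str = "savings") -> dict:
    """Illustrative budget update after throttling.

    budgets   -- tokens planned per agent, e.g. {"planner": 1000}
    throttles -- multiplier per throttled agent in (0, 1]; missing agents count as 1.0
    mode      -- "savings" banks the freed tokens; "quality" gives them to healthy agents
    """
    if mode not in ("savings", "quality"):
        raise ValueError(f"unknown mode: {mode!r}")
    new = {agent: tokens * throttles.get(agent, 1.0) for agent, tokens in budgets.items()}
    if mode == "savings":
        return new
    freed = sum(budgets.values()) - sum(new.values())
    healthy = [a for a in budgets if throttles.get(a, 1.0) >= 1.0]
    healthy_total = sum(budgets[a] for a in healthy)
    if healthy_total <= 0:
        return new  # nobody healthy to receive the freed tokens
    for agent in healthy:
        # Split freed tokens in proportion to each healthy agent's original budget.
        new[agent] += freed * budgets[agent] / healthy_total
    return new


budgets = {"a": 1000, "b": 1000, "c": 2000}
print(reallocate(budgets, {"b": 0.5}, mode="savings"))
print(reallocate(budgets, {"b": 0.5}, mode="quality"))
```

In savings mode the total falls from 4,000 to 3,500 tokens. In quality mode the total stays at 4,000 and the 500 freed tokens move to agents "a" and "c".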

The difference

Observability tells you what it cost. Governor changes what it costs.

Agent-monitoring tools watch and report: they hand you a dashboard and a bill after the run. Governor is a control layer. It acts on the run while it's happening, throttling the waste before the tokens are spent. You can't cut a bill you can only look at.

Observability tools | Governor

Watches & logs | Acts mid-run

Reports cost after | Cuts cost during

Static waste (prompts, caching) | Dynamic waste (cascades)

You read the dashboard | It moves the spend

Where it comes from

Not a new idea — a proven one, productized.

Governor isn't a fresh bet. It's the productized form of Reasoning Chain Selection, a method from our working-paper series that ranks reasoning chains by a P(t) health signal at r=0.9994 against ground truth. Governing a swarm of coupled agents is the same signal generalized: from "which chain is healthiest" to "which agent is dragging the rest down." The portfolio is the proof; Governor is the first product built on it.

r=0.9994

chain-health correlation to ground truth (PRM800K, 30,500 chains) — the method Governor generalizes

100%

of the achievable oracle score on best-of-N selection

21+

working papers, all code & benchmarks public

patent

pending on the core P(t) framework

See the full methods portfolio → every result Governor is built on, with public code

Integration

A wrapper, not a rewrite.

Governor sits between your agent framework and your model provider. Point it at your swarm, pick a mode, set a throttle floor. Your agents don't change.

# pip install governor (design-partner preview)
from governor import Governor

gov = Governor(
    mode="savings",      # or "quality": hold budget, reallocate
    throttle_floor=0.3,  # how far to cut a failing branch
    sensitivity=3.0,     # divergence σ to act on
)

# wrap your existing multi-agent run (`swarm` and `task` are your own objects)
with gov.watch(swarm):
    result = swarm.run(task)

# gov.report() → tokens saved / quality Δ / what it throttled and when

Try the live sandbox → the full interactive harness: every knob, every scenario

We're taking on a small number of design partners.

If you run a multi-agent system in production and you're feeling the token bill, we'll instrument it, show you a real number on your workload, and you keep whatever we save. No "buy now" — this is early, and we'd rather earn the number than quote one.

Become a design partner

Honest status: Governor's control logic is validated in simulation across coupled-swarm scenarios; the live middleware is what the design-partner program builds, with you, on your stack. You'll know exactly what's measured vs. modeled at every step.