forum, an accountable orchestrator for AI agents

It routes the work, runs it, and records every move.

forum is a daemon-hostable engine that coordinates a team of AI agents and records each step in a ledger you can verify. You give it a task. It picks the right specialist lane, turns the request into a validated plan, runs the work under witness, judges each result against its instruction, and combines everything into one answer. Every one of those steps is written to the ledger with a hash that chains to the one before it.

Around that core sit the parts that make a run trustworthy and lean: a bounded budget that stops a runaway loop, witnessed escalation up a ladder of stronger models, a check on whether the answer still covers the request, a delivery pass that tightens a verbose answer without dropping its terms, and clean seams for a peer brain to supply context and an outside verifier to check the work. Each is opt-in, each is witnessed, none is trusted on its word.

None of those roles communicate by convention or trust. Each receives a witnessed input, produces a witnessed output, and hands it forward. There is no back-channel.

The pains it answers.

These are real complaints from people running agent fleets. Each one drove a feature, and each maps to something forum does and witnesses.

a dollar becomes a hundred a RunBudget caps the run by model calls and wall-clock, stops it gracefully, and witnesses where it stopped

it drifts for hours after the answer is formed, forum checks how much of the request it still covers and escalates to a model judge when that floor flags, so drift lands on the record, not at hour three

which model even ran? every result records the model that produced it, and a failed task escalates up a ladder of stronger models on the auditable verdict

not another SaaS zero-dependency and self-hosted, model-agnostic across any command, any OpenAI-compatible server, or the Anthropic API; the daemon runs on your box

the loop has no contract a run carries a witnessed contract, organized context in and a bounded budget, and the ledger is the standing record of what was agreed and what happened

long loops blow the context window a run checkpoints at wave boundaries and resumes from the ledger, reusing what already succeeded, and injected context is bounded and pulled fresh per task

output is a wall of words a deterministic floor flags a verbose answer and an opt-in reviser tightens it, accepted only if the shorter version still covers the request

stale memory poisons the work forum plans and runs on organized context pulled from a brain through a clean seam; building and pruning that graph is a peer flagship's job, and forum consumes it, witnessed

The ledger is the differentiator.

A hash-chain ledger means each entry carries the hash of the entry before it. Alter any past record and every hash from that point forward no longer matches. You can replay the chain, recompute the hashes, and check the whole sequence yourself without asking forum whether it ran honestly. The bulky parts, prompts and outputs, are stored by the hash of their own bytes, so you can redact a sensitive body to its fingerprint and the chain still verifies; when the bodies are present, a deep check re-hashes each one and catches a swapped result.

A witnessed failure is more useful than an unwitnessed success.

When a task fails validation, the ledger records ok=false with the reason. Nothing is silently retried or promoted. The same record holds the budget stop, the escalation, the intent check, and the delivery revision, each chained to what it acted on, so a postmortem follows the causal links back to why a thing happened. A run that crashed picks up from that record, reusing what already succeeded.

Standard library only. Zero external dependencies.

The engine runs on Python’s asyncio using nothing outside the standard library. Four model-backed control roles sit above a layer of pluggable, model-agnostic executors; an executor can drive a model CLI, call an API, talk to a local OpenAI-compatible server, or run a stub for testing. The cheap, deterministic checks run first and the model is spent only when one of them earns it, the same discipline in routing, escalation, the intent check, and delivery.

Classifier reads the incoming task and picks the specialist lane that handles it

Coordinator turns the routed request into a validated plan before any work starts

Validator judges each output against the original instruction and records the verdict

Synthesizer combines all witnessed results into one answer for the caller

Current state. How to run it.

Version 1.12.0, on PyPI as forum-engine. The full engine is here: durable ledger, routing, the control loop, model-agnostic executors, an HTTP and MCP daemon, the forum CLI, witnessed budgets and escalation, the intent and delivery ladders, typed data-flow between tasks, resumable runs, and the context and verification seams. 249 tests plus two gated real-model tests, ruff and mypy clean, no third-party runtime dependency. The license is fair-source: source-available, rights reserved to fund the research; the smaller utility tools that came out of this work stay permissive.

$ pip install forum-engine
$ python examples/demo.py   # routes, plans, witnesses, then tamper-checks the ledger
$ python -m pytest
249 passed

Python, standard library only. Install it, run the demo, route a real task through it. Each example under examples/ is a short offline demonstration of one capability: the intent ladder, the delivery pass, a resumed run, an A/B of two runs. If something breaks or the ledger produces a result you cannot verify, that is exactly the kind of report I want.

GitHub → HarperZ9/forum · PyPI → forum-engine · ← all flagships · the index

Run a team of agents. Witness every step.

It routes the work, runs it, and records every move.

The pains it answers.

The ledger is the differentiator.

Standard library only. Zero external dependencies.

Current state. How to run it.