Proof Surface — the accountability contract

Accountability built in, not just promised.

Here is the heart of it. A tool that claims to keep things accountable, but quietly hands itself power, has already broken its own promise — so I built this one to do the opposite. It will not accept anything shaped like a grab for power. Every answer it gives comes from a short, fixed list it cannot wander outside of — allow, deny, needs-human (meaning: stop and ask a person), deploy, block — and you can always check which one you got. It writes down what happened; it leaves the actual stopping and starting to the program around it, or to you. It is the record, not the guard. I did not want it to be the guard.

There are two kinds of receipt, kept deliberately apart. The work-record receipt faces outward — a checkable note of what the agent did, so it can be reviewed. The authorization receipt faces inward — proof that a real person really granted permission, and it must say when that permission runs out and spell out exactly which actions are allowed. An empty list allows nothing at all. Permission has to be said out loud, kept narrow, and given an end. And the same simple rule — reject anything shaped like a grab for power — is checked at every level, even deep inside nested parts, so nothing slips through unnoticed.
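To make "checked at every level" concrete, here is a minimal sketch of a recursive scan. It is an illustration only, not the project's code: the function name find_authority_fields and the three forbidden key names are invented for the example, since this page does not list the real ones.

```python
from typing import Any, List

# Hypothetical key names standing in for "shaped like a grab for power".
# The real list is not given on this page.
FORBIDDEN_KEYS = frozenset({"grant_authority", "self_authorize", "elevate"})


def find_authority_fields(value: Any, path: str = "$") -> List[str]:
    """Return the path of every forbidden key, at any nesting depth."""
    hits: List[str] = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                hits.append(child_path)
            # Keep walking even after a hit, so every offence is reported.
            hits.extend(find_authority_fields(child, child_path))
    elif isinstance(value, (list, tuple)):
        for index, child in enumerate(value):
            hits.extend(find_authority_fields(child, f"{path}[{index}]"))
    return hits


packet = {
    "summary": "rotated logs",
    "steps": [{"name": "rotate", "meta": {"self_authorize": True}}],
}
print(find_authority_fields(packet))  # ['$.steps[0].meta.self_authorize']
```

A check that only looked at the top-level keys would have passed this packet; the walk is what catches the field buried two levels down.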

The gate — the doorway every action passes through — holds back its yes until permission, budget, and the actual observed situation each clearly check out. It says no by default; if anything is unclear it still says no; and even its yes is only ever a recommendation.

The contract family.

Proof-surface packet — think of it as a plain envelope of evidence. One part fills it; another part reads it. It is the shared backbone the rest of the pieces clip onto.

Work-record receipt — a checkable note of what an agent did, handed outward so someone can look it over. It only goes one way: the tool writes it and sends it out. It is never read back in and trusted as if it were the agent’s own memory.

Authorization receipt — proof that a real person said yes: a permission that is small on purpose, that runs out, and that can be taken back. It is something to check against, never something fed back into the agent as if it were settled fact. The check_action step asks one plain question — was this exact action allowed? — and answers it; it never quietly writes a “you are trusted” note into the agent’s own memory.
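The "one plain question" can be sketched in a few lines. This is a stand-in under stated assumptions, not the real check_action: the name is_action_allowed, the ISO-8601 string format for expires_at, and the revoked flag are all hypothetical. Only allowed_actions and expires_at appear on this page.

```python
from datetime import datetime, timezone


def is_action_allowed(receipt: dict, action: str, now: datetime) -> bool:
    """Answer only: was this exact action allowed, right now? Never mutates anything."""
    allowed = receipt.get("allowed_actions")
    expires_raw = receipt.get("expires_at")
    if not isinstance(allowed, list) or not isinstance(expires_raw, str):
        return False  # missing or malformed fields: fail closed
    if receipt.get("revoked", False):  # hypothetical revocation flag
        return False
    try:
        expires_at = datetime.fromisoformat(expires_raw)
    except ValueError:
        return False
    if expires_at.tzinfo is None:
        return False  # an expiry with no timezone is ambiguous: fail closed
    if now >= expires_at:
        return False
    # Exact, case-sensitive match; an empty list matches nothing.
    return action in allowed


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
receipt = {"allowed_actions": ["write_file"], "expires_at": "2030-01-01T00:00:00+00:00"}
print(is_action_allowed(receipt, "write_file", now))  # True
print(is_action_allowed({**receipt, "allowed_actions": []}, "write_file", now))  # False
```

The function returns a bare yes or no and writes nothing back, which is the point: the receipt is checked against, not absorbed.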

Pre-execution gate — the doorway every action passes through before it happens. It says no by default; if anything is unclear it still says no; and its answer is a recommendation, not a command. You hand it a planned action, the permission slip behind it, a budget, and (if you have it) a look at the real situation, and it gives back a GateDecision — allow, deny, or needs-human (stop and ask a person) — and shows its reasoning for each one. Allow is the rarest answer. Anything it cannot clearly confirm becomes needs-human instead of quietly slipping through.
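Here is a rough sketch of how three separate checks might fold into one answer. The names sketch_gate, SketchDecision, and the parameters are made up for the example and are not the real gate's interface. The combining rule (any deny wins, then any needs-human, and allow only if every check passes) is my reading of the description above, stated as an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Verdict(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_HUMAN = "needs-human"


@dataclass(frozen=True)
class SketchDecision:
    decision: Verdict
    reasons: Dict[str, str]


def sketch_gate(
    action: str,
    allowed_actions: Optional[List[str]],  # None means no receipt at all
    budget_left: Optional[float],          # None means budget unknown
    cost: float,
    observed_ok: Optional[bool],           # None means no observation
) -> SketchDecision:
    checks = {}
    if allowed_actions is None:
        checks["authorization"] = (Verdict.DENY, "no receipt")
    elif action not in allowed_actions:
        checks["authorization"] = (Verdict.DENY, "action not on allowlist")
    else:
        checks["authorization"] = (Verdict.ALLOW, "action on allowlist")

    if budget_left is None:
        checks["budget"] = (Verdict.NEEDS_HUMAN, "unknown")
    elif cost > budget_left:
        checks["budget"] = (Verdict.DENY, "over budget")
    else:
        checks["budget"] = (Verdict.ALLOW, "within budget")

    if observed_ok is None:
        checks["state"] = (Verdict.NEEDS_HUMAN, "no observation")
    elif not observed_ok:
        checks["state"] = (Verdict.DENY, "observed state does not match")
    else:
        checks["state"] = (Verdict.ALLOW, "observed state matches")

    verdicts = [verdict for verdict, _ in checks.values()]
    if Verdict.DENY in verdicts:
        decision = Verdict.DENY
    elif Verdict.NEEDS_HUMAN in verdicts:
        decision = Verdict.NEEDS_HUMAN
    else:
        decision = Verdict.ALLOW  # reached only when every check said allow
    reasons = {name: f"{why}: {v.value}" for name, (v, why) in checks.items()}
    return SketchDecision(decision, reasons)


print(sketch_gate("write_file", None, None, 1.0, None).decision.value)  # deny
```

Notice that allow is the last branch, never the fallback. The returned value is still only advice; nothing in the sketch stops or starts anything.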

Evaluation contract — a final check before something ships, not a feel-good grade. Each thing it measures has a bar to clear and a note for whether it’s required. The evaluate step answers deploy, block, or needs-human, and it takes uncertainty seriously. If a measurement lands too close to call — partly above the bar, partly below — it stops and asks a person. If something required was never measured at all, that too is needs-human. It never ships on a maybe.
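The "too close to call" rule is easiest to see with intervals. The sketch below is an assumption-laden illustration, not the real evaluate step: the name sketch_evaluate, the (low, high) interval shape, and the choices that block outranks needs-human and that an optional metric never blocks are all mine.

```python
from typing import Dict, Tuple


def sketch_evaluate(
    bars: Dict[str, Tuple[float, bool]],       # name -> (threshold, required)
    measured: Dict[str, Tuple[float, float]],  # name -> (low, high) interval
) -> str:
    verdict = "deploy"
    for name, (threshold, required) in bars.items():
        interval = measured.get(name)
        if interval is None:
            # A required metric that was never measured cannot be waved through.
            outcome = "needs-human" if required else "deploy"
        else:
            low, high = interval
            if low >= threshold:
                outcome = "deploy"       # whole interval clears the bar
            elif high < threshold:
                outcome = "block" if required else "deploy"  # whole interval misses
            else:
                outcome = "needs-human"  # interval straddles the bar
        if outcome == "block":
            return "block"
        if outcome == "needs-human":
            verdict = "needs-human"
    return verdict


bars = {"accuracy": (0.90, True), "style_score": (0.70, False)}
print(sketch_evaluate(bars, {"accuracy": (0.92, 0.96)}))  # deploy
print(sketch_evaluate(bars, {"accuracy": (0.88, 0.93)}))  # needs-human
print(sketch_evaluate(bars, {}))                          # needs-human
```

The middle case is the one that matters: 0.88 to 0.93 around a bar of 0.90 is a maybe, and a maybe does not ship.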

Claim ledger — a shared, traceable memory for when several agents are working together. Every claim carries how sure its source was, plus honest links: what it leans on, and what it clashes with — and those links have to point at real things. The ledger quietly raises its hand at shaky claims, at clashes someone flagged, and at everything downstream that a doubtful claim might have tainted. It tells you where things came from and how sure they are; it does not pretend to be the judge of what’s true.
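One way to picture the hand-raising is a small graph walk. This is a sketch under my own assumptions, not the ledger's real code: the field names depends_on and conflicts_with, the function name sketch_flags, and the 0.5 cutoff for "shaky" are all invented for the example.

```python
from collections import deque
from typing import Dict, Set


def sketch_flags(claims: Dict[str, dict], shaky_below: float = 0.5) -> Dict[str, Set[str]]:
    # Links have to point at real things: refuse dangling references.
    for claim_id, claim in claims.items():
        for link in claim["depends_on"] + claim["conflicts_with"]:
            if link not in claims:
                raise ValueError(f"{claim_id} links to unknown claim {link!r}")

    shaky = {cid for cid, c in claims.items() if c["confidence"] < shaky_below}

    contested: Set[str] = set()
    for cid, c in claims.items():
        if c["conflicts_with"]:
            contested.add(cid)
            contested.update(c["conflicts_with"])

    # Reverse the depends_on edges so we can walk downstream.
    dependents = {cid: [] for cid in claims}
    for cid, c in claims.items():
        for dep in c["depends_on"]:
            dependents[dep].append(cid)

    tainted: Set[str] = set()
    queue = deque(sorted(shaky))
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in tainted and child not in shaky:
                tainted.add(child)
                queue.append(child)
    return {"shaky": shaky, "contested": contested, "tainted": tainted}


claims = {
    "c1": {"confidence": 0.0, "depends_on": [], "conflicts_with": []},
    "c2": {"confidence": 0.9, "depends_on": ["c1"], "conflicts_with": []},
    "c3": {"confidence": 0.9, "depends_on": ["c2"], "conflicts_with": []},
    "c4": {"confidence": 0.8, "depends_on": [], "conflicts_with": ["c5"]},
    "c5": {"confidence": 0.9, "depends_on": [], "conflicts_with": []},
}
flags = sketch_flags(claims)
print(sorted(flags["tainted"]))  # ['c2', 'c3']
```

Nothing here decides which claim is true. The confidence-0.0 claim stays in the ledger; it is flagged, and so is everything that leans on it.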

Delegation chain — the trail when permission is handed down, person to helper to helper. The very first handoff has to come from a real human — an agent can never be the original source of authority. Each handoff can only narrow what it received, never widen it; a helper quietly granting itself more actions or more targets is exactly the kind of grab this catches, and it is DENIED. Each step is sealed to the one before it with a fingerprint (SHA-256), and the whole trail is locked into a single chain_binding, so quietly snipping a step off or tacking one on gets caught. The verify_delegation step gives one of three plain answers — VALID, DENIED, or UNVERIFIABLE — and it only fills in effective_scope (what the permission actually amounts to) when the answer is VALID.
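The narrowing rule and the sealing can be sketched with hashlib and json from the standard library. This is not the real verify_delegation: the hop layout, the "human:" prefix used to mark a person, the prev_hash field, and the way chain_binding is computed here are assumptions made for the example. The sketch has no signature checker, so any hop that carries a signature gets UNVERIFIABLE.

```python
import hashlib
import json
from typing import List, Optional, Tuple


def hop_hash(hop: dict) -> str:
    canonical = json.dumps(hop, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def binding(hashes: List[str]) -> str:
    return hashlib.sha256("|".join(hashes).encode("utf-8")).hexdigest()


def build_chain(hops: List[dict]) -> dict:
    sealed, hashes, prev = [], [], ""
    for hop in hops:
        entry = {**hop, "prev_hash": prev}  # seal each hop to the one before
        prev = hop_hash(entry)
        sealed.append(entry)
        hashes.append(prev)
    return {"hops": sealed, "chain_binding": binding(hashes)}


def sketch_verify(chain: dict) -> Tuple[str, Optional[List[str]]]:
    hops = chain.get("hops") or []
    if not hops:
        return ("DENIED", None)
    if any("signature" in hop for hop in hops):
        return ("UNVERIFIABLE", None)  # nothing here can check a signature
    if not str(hops[0].get("from", "")).startswith("human:"):
        return ("DENIED", None)  # authority cannot originate with an agent
    prev, hashes, scope = "", [], None
    for hop in hops:
        if hop.get("prev_hash") != prev:
            return ("DENIED", None)  # broken seal
        actions = set(hop.get("actions", []))
        if scope is not None and not actions <= scope:
            return ("DENIED", None)  # a hop may narrow, never widen
        scope = actions
        prev = hop_hash(hop)
        hashes.append(prev)
    if binding(hashes) != chain.get("chain_binding"):
        return ("DENIED", None)  # a hop was snipped off or tacked on
    return ("VALID", sorted(scope))


valid = build_chain([
    {"from": "human:alice", "to": "agent-A", "actions": ["read_log"]},
    {"from": "agent-A", "to": "agent-B", "actions": ["read_log"]},
])
print(sketch_verify(valid))  # ('VALID', ['read_log'])
```

The scope is returned only on the VALID path; every other return carries None. The same limit applies to this sketch as to the real thing: anyone who can rewrite a hop can also recompute every hash after it.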

Don’t take the contract. Watch it refuse.

The gate and the delegation chain are the two places permission actually gets checked, so let’s watch them work. Each example below is real — something that ought to be refused, and the plain answer it gets back.

Exhibit I · Proof Surface · the gate is default-deny and fail-closed

# a gate request (dict) whose authorization receipt is absent
>>> evaluate_gate(request)
GateDecision(decision="deny")
  authorization: no receipt — deny
  budget:        unknown — needs-human
  state:         no observation — needs-human

# an authorization receipt whose allowed_actions list is empty
>>> check_action(receipt, "write_file")
False   # empty allowlist authorizes nothing

# a receipt dict missing expires_at — validation rejects it
>>> validate_authorization_receipt({"allowed_actions": ["write_file"]})
invalid: "expires_at" required — authority must expire

$ pytest
258 passed in 0.26s

No permission slip, the answer is no. An empty list of allowed actions, the answer is no. No end date, and the slip is thrown out before the gate even looks at it. A yes has to be earned by passing every check — it is never just what’s left when nothing says no.

The gate lives in proof_surface/pre_execution_gate.py, the permission slip in proof_surface/authorization_receipt.py. The output above was captured live from pytest (258 passed in 0.26s). The code is stdlib-only, with zero dependencies, at github.com/HarperZ9/proof-surface.

Honest limit: the gate only gives a recommendation; the program around it, or you, does the actual enforcing. The fingerprint chain that seals the handoffs proves each step connects to the one before it; it is not proof against a determined faker who rewrites a step and re-seals the whole thing to match. Real protection from forgery needs an outside anchor (keep a copy of chain_binding somewhere separate, or check a proper signature). Ask it for a signature it has no way to check, and it answers UNVERIFIABLE rather than fake a pass.
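As one possible shape for an outside anchor, here is a sketch using a keyed tag from the standard library's hmac module. It is a stand-in, not part of the project and not a proper signature: the function names and the idea of a secret held outside the agent's reach are assumptions for illustration.

```python
import hashlib
import hmac


def anchor_tag(chain_binding: str, secret: bytes) -> str:
    """Keyed tag over chain_binding; store it somewhere the agent cannot write."""
    return hmac.new(secret, chain_binding.encode("utf-8"), hashlib.sha256).hexdigest()


def matches_anchor(chain_binding: str, stored_tag: str, secret: bytes) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(anchor_tag(chain_binding, secret), stored_tag)


secret = b"kept-outside-the-agent"  # hypothetical key, for the example only
stored = anchor_tag("abc123", secret)
print(matches_anchor("abc123", stored, secret))     # True
print(matches_anchor("re-sealed", stored, secret))  # False
```

A faker who re-seals the whole trail produces a new chain_binding, and without the secret cannot produce a tag that matches it.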

Exhibit II · Proof Surface · delegation chain refuses privilege escalation

# root hop: human grants agent-A a scope of ["read_log"]
# second hop: agent-A tries to delegate ["read_log", "write_config"] — wider
>>> verify_delegation(chain)
DENIED   # hop 1 scope is not a subset of hop 0 — privilege escalation

# a chain whose root from-field is an agent, not a human
>>> verify_delegation(chain_rooted_in_agent)
DENIED   # root hop must be a human — authority cannot originate with an agent

# a valid chain: human → agent-A ["read_log"] → agent-B ["read_log"]
>>> verify_delegation(valid_chain)
VALID    effective_scope = ["read_log"]

A helper can never widen what it was handed. The very first link has to be a person. And effective_scope — what the permission actually adds up to — is filled in only when the answer is VALID; a DENIED or UNVERIFIABLE answer carries none.

The handoff logic lives in proof_surface/delegation_chain.py, and any unexpected extra data is refused at every level. Action and target names have to match exactly, capital letters and all, because the real things they point to do too. Honest limit: same as before — the seal proves each step connects to the one before it, not that any particular person truly signed it. 258 passed in 0.26s — the full proof-surface test set.

What it is, and what it is not.

Proof Surface is a small set of checkers and decision-helpers — plain Python, nothing extra to install, 258 passing tests. Every answer comes from a short, fixed list; none is invented; and not one of these pieces ever hands itself power. The rule against power-grabs is checked all the way down, even deep inside nested parts, so nothing sneaks past a check that only looked at the surface.

Plainly: the gate only recommends. This tool advises; the program around it does the enforcing. A gate that called itself the guard and ran right inside the very thing it was supposed to be guarding wouldn’t really be a gate at all. So I kept this to the promise and the record, and left the actual enforcing in your hands — where you can look at it, and swap it out, including for your own version if you don’t trust mine.

This is one person’s first working version, built from plain Python — the parts I can honestly stand behind, not a finished, hardened guard for real-world use. I would rather tell you where that line is than paint over it.

The claim ledger keeps things honest about doubt: a claim nobody believes at all (confidence 0.0) is still written down, not quietly dropped or nudged upward. The evaluation contract won’t ship on a maybe: a measurement that lands too close to call stops and asks; it never just passes. And the delegation chain’s seal will catch a step that’s been damaged, or quietly snipped off, or tacked on, but it can’t stop a determined faker who rewrites a step and re-seals the whole trail to match. That edge is named out loud, not glossed over.

Sample reports. These pieces produce real review packets you can hold and hand off — here are a few, from an early run in an earlier format: a release-readiness report, an EMET witness sample, a proof-index, and a public-surface-sweeper review.

GitHub: HarperZ9/proof-surface · Email: zaindharper@gmail.com

A grant of authority should expire, be traced, and say no.
