Project Telos field guide: where to test checkable AI work

Start with the public surfaces.

The whole body of work is public enough to test without a meeting. The useful first reader is not someone who applauds the thesis. It is someone who runs one piece on a workflow they understand and reports where the evidence chain breaks.

The ask is practical: verification, testing against real workflows, early traction from people willing to inspect receipts, and possibly modest grassroots research funding, or pointers to it, so the work can keep hardening before it looks like a normal company.

Main site · GitHub profile · gather · index · forum · crucible · the telos engine

Five places to test it.

Use de-identified or synthetic material first. Do not upload private records, patient data, credentials, or source material you cannot share. The point is to test the shape of the receipt, not to make the first run high-risk.

Doctor / clinical admin

Can a summary keep its uncertainty attached?

Demo: take a de-identified intake note, a clinic policy excerpt, and a reviewer note. Use gather to preserve source fragments and receipts, then use forum to route the review steps into a causal ledger.

Proof to inspect: every sentence in the summary should point back to an intake fragment, policy line, reviewer/tool handoff, or an explicit UNVERIFIABLE limit.

Break test: remove one source fragment and see whether the digest or ledger still catches the missing support.
Do not claim: clinical validation, diagnosis, or care advice.
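
The proof and break test above can be pictured as a small coverage check. This is a minimal sketch under stated assumptions, not the actual gather or forum format: the receipt shape, the fragment ids, and the `unsupported_sentences` helper are hypothetical names invented for illustration, and the note text is synthetic.

```python
# Hypothetical receipt shape for illustration; NOT the real gather/forum format.
UNVERIFIABLE = "UNVERIFIABLE"

def unsupported_sentences(summary, fragments):
    """Return summary sentences whose cited support is missing.

    summary: list of (sentence, [fragment_id, ...]) pairs.
    fragments: dict mapping fragment_id -> preserved source text.
    A sentence passes if every cited id is still present in `fragments`,
    or if it is explicitly marked UNVERIFIABLE.
    """
    broken = []
    for sentence, cited in summary:
        if cited == [UNVERIFIABLE]:
            continue  # an explicit limit is an acceptable answer
        if not cited or any(fid not in fragments for fid in cited):
            broken.append(sentence)
    return broken

# Synthetic material only.
fragments = {
    "intake-3": "Patient reports intermittent headaches for two weeks.",
    "policy-7": "Headache complaints over 10 days require nurse triage.",
}
summary = [
    ("Headaches reported for two weeks.", ["intake-3"]),
    ("Policy requires nurse triage.", ["policy-7"]),
    ("Cause of headaches is not established.", [UNVERIFIABLE]),
]

assert unsupported_sentences(summary, fragments) == []

# Break test: remove one source fragment and re-check.
del fragments["policy-7"]
print(unsupported_sentences(summary, fragments))  # ['Policy requires nurse triage.']
```

If the real receipt lets the second sentence through after the fragment is gone, that is the break in the evidence chain worth reporting.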

Artist / studio

Can a creative branch stay inspectable?

Demo: start in the studio with a public or self-owned source image. Keep the source asset, prompt, seed, transform branch, critique note, and export gate together instead of treating the final render as a detached image.

Proof to inspect: a later viewer should be able to tell what was made, what was transformed, what was selected, and why the export was allowed.

Break test: change the source or seed and check whether the artifact still falsely reads as the same work.
Best fit: artists, creative technologists, provenance researchers, gallery/workflow people.
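
One way to make the break test above concrete is to derive the artifact's identity from everything that shaped it, so a changed source or seed cannot keep the old identity. This is a sketch under assumptions, not how the studio actually records a branch: `branch_id` and the record fields are hypothetical, and the source bytes are a stand-in for a real image file.

```python
import hashlib
import json

def branch_id(source_sha256, prompt, seed, transforms):
    """Hash the source digest, prompt, seed, and transform list together.

    Hypothetical helper for illustration; not the studio's real schema.
    """
    record = {
        "source_sha256": source_sha256,
        "prompt": prompt,
        "seed": seed,
        "transforms": transforms,
    }
    # Canonical JSON so the same inputs always hash to the same id.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Placeholder bytes standing in for a public or self-owned source image.
source_hash = hashlib.sha256(b"placeholder image bytes").hexdigest()

original = branch_id(source_hash, "ink wash, low contrast", 1234, ["crop", "upscale-2x"])
repeat = branch_id(source_hash, "ink wash, low contrast", 1234, ["crop", "upscale-2x"])
reseeded = branch_id(source_hash, "ink wash, low contrast", 1235, ["crop", "upscale-2x"])

print(original == repeat)    # True
print(original == reseeded)  # False
```

The same inputs reproduce the id; a one-digit seed change does not. An export gate keyed to such an id would refuse to treat the reseeded render as the same work.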

Media / newsroom

Can every public sentence find its source?

Demo: gather a small public-source packet for a story: original article, transcript, image source, correction, and conflict note. Then use a ledgered review pass to decide which public claims survive.

Proof to inspect: each published sentence should map to a witnessed source, a conflict note, an editorial decision, or UNVERIFIABLE. The model should not be the only witness to its own source check.

Break test: swap one quote or image source after the review and see whether the check catches drift.
Repos: gather, forum, and crucible.
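
The swap in the break test above is detectable only if the review recorded something the model cannot quietly rewrite. A minimal sketch, assuming a plain SHA-256 digest per source taken at review time; `witness` and `drifted` are hypothetical helpers, not gather or forum calls, and the packet text is invented.

```python
import hashlib

def digest(text):
    """SHA-256 of the source text as witnessed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def witness(sources):
    """Record one digest per source at review time (hypothetical helper)."""
    return {name: digest(text) for name, text in sources.items()}

def drifted(recorded, sources):
    """Names of sources that changed or vanished since the review."""
    return sorted(
        name
        for name, seen in recorded.items()
        if name not in sources or digest(sources[name]) != seen
    )

# Invented packet contents, for illustration only.
packet = {
    "transcript": 'Speaker: "We expect delays through March."',
    "correction": "An earlier version misstated the month.",
}
recorded = witness(packet)
assert drifted(recorded, packet) == []

# Break test: swap one quote after the review.
packet["transcript"] = 'Speaker: "We expect delays through May."'
print(drifted(recorded, packet))  # ['transcript']
```

The digests are computed outside the model, which is the point of the proof line above: the source check has a second witness.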

Token economy / routing

Can the ledger show where model calls bought evidence?

Demo: route a multi-step agent task through forum. Keep the deterministic routing, budget stop, escalation, and delivery pass in the ledger. Use index to map the workspace before asking a stronger model to reason over it.

Proof to inspect: expensive model calls should have a reason: evidence coverage, route uncertainty, intent verification, or a specific unresolved question. Confident prose without a ledger reason is waste.

Break test: force a cheap route where escalation is needed, or escalate every step, then compare ledger quality and token spend.
Useful report: the first place where the ledger is too bulky, too thin, or too hard to replay.
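
The proof line above amounts to an audit over ledger entries: every expensive call carries one of the named reasons, or it is flagged. The sketch below assumes a ledger shape that is not forum's real one; the `Call` fields, the "cheap"/"strong" labels, and the token counts are all made up for illustration.

```python
from dataclasses import dataclass

# The four reasons named in the text; the exact strings are an assumption.
ALLOWED_REASONS = {
    "evidence coverage",
    "route uncertainty",
    "intent verification",
    "unresolved question",
}

@dataclass
class Call:
    """One model call in a hypothetical ledger (not forum's real schema)."""
    step: str
    model: str      # "cheap" or "strong"
    tokens: int
    reason: str = ""

def audit(ledger):
    """Sum token spend and list strong-model calls lacking a ledger reason."""
    strong = [c for c in ledger if c.model == "strong"]
    return {
        "total_tokens": sum(c.tokens for c in ledger),
        "strong_tokens": sum(c.tokens for c in strong),
        "unjustified": [c.step for c in strong if c.reason not in ALLOWED_REASONS],
    }

# Made-up numbers, for illustration only.
ledger = [
    Call("map workspace", "cheap", 800),
    Call("resolve conflicting specs", "strong", 6000, "route uncertainty"),
    Call("polish summary", "strong", 4000),
]
print(audit(ledger))
# {'total_tokens': 10800, 'strong_tokens': 10000, 'unjustified': ['polish summary']}
```

Running the same audit over a forced-cheap run and an escalate-everything run gives two spend figures to set beside the ledger quality of each.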

Reasoning / public claims

Can the final authority live outside the model?

Demo: write a thesis with measurable claims, falsification conditions, and source substrate. Use crucible as the release-candidate/dev-parallel judgment organ: steelman the claim, measure what holds, report MATCH, DRIFT, or UNVERIFIABLE, then refine the weakest axis.

Proof to inspect: the answer should not be "the model thinks this is right." The answer should be a record a person can inspect: what claim was tested, what evidence supported it, what refuted it, what drifted, and what remained unverified.

Break test: make one claim too vague to measure and require the system to say UNVERIFIABLE rather than rescue it.
Public state: crucible is public, lowercase, release-candidate/dev-parallel, and not claimed stable or PyPI-published from this page.
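
To make the three verdicts concrete, here is a minimal sketch of a judgment that lives outside the model. It is not crucible's implementation or API. The `Claim` fields, the `judge` function, and the reading of DRIFT as "measured, but outside the claimed tolerance" are assumptions for illustration, and the measurements are invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    """A claim plus what would confirm it (hypothetical shape)."""
    text: str
    metric: Optional[str] = None      # None means nothing measurable was named
    expected: Optional[float] = None
    tolerance: float = 0.0

def judge(claim, measurements):
    """Return MATCH, DRIFT, or UNVERIFIABLE for one claim."""
    if claim.metric is None or claim.expected is None:
        return "UNVERIFIABLE"         # too vague to measure; do not rescue it
    observed = measurements.get(claim.metric)
    if observed is None:
        return "UNVERIFIABLE"         # measurable in principle, but no evidence
    if abs(observed - claim.expected) <= claim.tolerance:
        return "MATCH"
    return "DRIFT"

# Invented measurements, for illustration only.
measurements = {"replay_success_rate": 0.97, "median_receipt_kb": 48.0}
claims = [
    Claim("Replays succeed about 95% of the time.", "replay_success_rate", 0.95, 0.03),
    Claim("Receipts stay near 20 KB.", "median_receipt_kb", 20.0, 5.0),
    Claim("The system is trustworthy."),
]
for claim in claims:
    print(judge(claim, measurements), "-", claim.text)
# MATCH - Replays succeed about 95% of the time.
# DRIFT - Receipts stay near 20 KB.
# UNVERIFIABLE - The system is trustworthy.
```

The third claim is the break test: it names no metric, so the only acceptable verdict is UNVERIFIABLE.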

What to send back.

A useful field report is small and concrete: the workflow you tried, which repo you used, what evidence the receipt preserved, what it failed to preserve, and whether the result was easier to inspect than a normal model answer. Screenshots are less useful than the exact receipt, ledger, command, or missing link.

field: newsroom claim review
repos: gather + forum
worked: source fragments and editorial handoff stayed visible
broke: conflict note was not tied tightly enough to the final sentence
ask: make the unresolved/conflict state easier to inspect before publish

Contact → zaindharper@gmail.com · GitHub → github.com/HarperZ9 · all flagships

Bring it real work. Then try to re-check it.