Zain Dana Harper · Aleph · adversarial evaluation

ADVERSARIAL EVALUATION · ALEPH

Break your model on purpose — and keep the evidence.

Before a model ships, someone has to find what it does wrong: the prompt that slips past the rules, the case it fails quietly, the behaviour no one thought to test. Most teams do that by hand, once, and can’t run it again. Aleph takes the other approach. It is a platform that keeps a model under steady, bounded pressure: a self-red-team that attacks the model on purpose and writes down exactly what broke and how. What you get back isn’t a screenshot and a hunch. It’s evidence you can re-run yourself to get the same result, plus the training signal to close the gap. It is the same backbone the rest of this site is built on, turned into a tool for hardening models before the world gets to them.

The red-team is held to the same standard as the model it tests: bounded, witnessed, and re-runnable for the same result. A test you can’t run again isn’t a test.

a private platform · offered as a capability, by inquiry · the one page here I can’t hand you the receipts for

The honest part, before anything else.

Everything else on this site, you can run yourself — that is the entire point of it. This page is the exception, and I would rather say so plainly than dress it up. Aleph is private. I can’t hand you the code or a test log here, because what it does and how it does it don’t belong in public. So read this as a capability and a claim, not a proof. On every other page I ask you to check me instead of trust me. On this one I can’t, yet — so I’m asking for a conversation instead, where I can actually show you, under terms that fit what it is.

If “proof before trust” is the rule, then the honest thing to do with the one tool I can’t prove in public is to tell you that, not hide it.

A safety result you can’t reproduce isn’t a result.

Every capable model ships carrying ways it can fail that no one has mapped — not because teams are careless, but because the number of ways a model can go wrong is enormous, and one person can only test so much of it by hand. So this kind of testing happens in bursts: clever, valuable, and almost impossible to run the same way twice. A finding lives in a chat log. A fix is checked against a gut feeling. The next model breaks again on the thing the last one fixed, and no one notices until a user does.

That is the same blind spot this whole project exists to address, in a place where more is at stake. You should not have to trust that a model was tested; you should be able to run the test again yourself and watch it land in the same place. Aleph is built so that the pressure is steady instead of occasional, wide instead of hand-sized, and (the part that matters) written down instead of remembered.

The point isn’t only to find the failure. It’s to find it again tomorrow, on purpose, and prove you closed it.

An adversary that is itself accountable.

Aleph runs on its own against a model you own: it pushes on the model from many angles, watches what the model actually does, and keeps the cases where it broke — turned into evidence you can re-run yourself, and, where you want it, into training material so the next version is strong where this one was weak. That much, a number of tools attempt. What makes this one mine is the part underneath: the red-team is bounded and witnessed like everything else on this site.

Bounded: it works inside a permission it cannot widen. The same gate used everywhere else here, the one that says no until it’s sure, decides what it may and may not touch, and an empty permission does nothing. Witnessed: every finding carries a re-runnable record of what was done and what happened, so a safety team checks evidence, not stories, and can hand that evidence to someone else who can check it from scratch. A test tool that is itself a sealed box just moves the trust problem one step back. This one doesn’t move it; it closes it.
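The “bounded” half can be pictured as a default-deny gate. What follows is a minimal Python sketch, assuming a permission is nothing more than a set of named actions; `Permission`, `permits`, `narrow`, and `run_probe` are hypothetical names for illustration only, not Aleph’s code, which stays private.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    """Hypothetical scope object: the set of actions a run may take."""

    allowed: frozenset[str] = frozenset()

    def permits(self, action: str) -> bool:
        # Default deny: anything not explicitly listed is refused,
        # so an empty permission permits nothing at all.
        return action in self.allowed

    def narrow(self, requested: set[str]) -> "Permission":
        # A derived permission is the intersection of what is already
        # allowed and what is requested: it can shrink, never grow.
        return Permission(self.allowed & frozenset(requested))


def run_probe(perm: Permission, action: str) -> str:
    """Hypothetical gate that sits in front of every probe."""
    if not perm.permits(action):
        return f"refused: {action}"
    return f"ran: {action}"


if __name__ == "__main__":
    scope = Permission(frozenset({"query_model", "log_result"}))
    print(run_probe(scope, "query_model"))
    print(run_probe(Permission(), "query_model"))
    print(sorted(scope.narrow({"query_model", "write_file"}).allowed))
```

The design choice that carries the guarantee is the intersection in `narrow`: a request for something outside the parent scope is silently dropped, so no sequence of requests can produce a wider permission than the one the run started with.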

Pressure that is steady, wide, and on the record — and that can’t do anything it wasn’t given permission to do.

What makes it different, running live.

The findings streaming below are synthetic — a stand-in for a real run, so not one real technique ever touches this page. But the part that makes Aleph different isn’t fake, and it’s running right now, in your browser: every case is kept inside an allowed boundary, and every one carries a real SHA-256 fingerprint you could re-derive yourself — the same kind of fingerprint the rest of this site is built on. That’s the part a safety team most needs: not just what broke, but a record of every probe that still holds up when someone else checks it from scratch.
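Re-deriving a fingerprint only works if everyone hashes exactly the same bytes. Here is a minimal Python sketch, assuming a finding is a JSON-serialisable record that is canonicalised with sorted keys and compact separators before hashing; the `fingerprint` helper and the record’s field names and values are hypothetical placeholders, and the page’s own in-browser code may encode records differently.

```python
import hashlib
import json


def fingerprint(finding: dict) -> str:
    """Hypothetical: SHA-256 over a canonical JSON encoding of one finding."""
    # Canonicalise first: sorted keys and fixed separators mean the same
    # record always serialises to the same bytes, whoever computes it.
    canonical = json.dumps(
        finding, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder record; not a real category or technique.
    record = {"category": "example-category", "boundary": "example-scope", "outcome": "held"}
    print(fingerprint(record))  # 64 hex characters
    # Key order does not matter after canonicalisation.
    reordered = {"outcome": "held", "boundary": "example-scope", "category": "example-category"}
    print(fingerprint(record) == fingerprint(reordered))
```

Anyone holding the same record can run the same two steps and compare digests; a single changed field produces a different fingerprint.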

Synthetic findings, real accountability. The fingerprints are genuine — computed in your browser, not pasted in — and the categories are the kinds of things a safety team actually checks; the techniques, the targets, and the platform itself stay private. What you’re watching is the tool holding itself to account. The capability that produces the findings lives in a conversation, under terms — not on a page.

So — what’s all of it actually for?

If someone asked me straight what all of these tools could be used for, here is the honest answer, and it isn’t a list of attacks. Every tool here makes one quiet move, in a different place: perceive what’s really there and write it down; gate what is allowed to happen by saying no until it’s sure; and witness it, so anyone can re-run the proof and get the same result. So the real answer is a list of places where security stops being something you take on trust and becomes something you can check. Aleph is the sharpest end of it; here is the rest of the reach.

AI red-team & evaluation

Make a model tougher before it ships — steady pressure where every probe is kept inside its allowed boundary and witnessed, so the test runs again instead of living in a chat log.

Authorized penetration testing

Run the job so the scope, the actions, and the findings are all bounded and re-runnable — a client gets a report they can check and reproduce, not one they take on trust.

Supply-chain & release integrity

Check what you’re about to ship for secrets, and check where it came from — catch the leak or the swapped-in part before it’s public, and keep the receipt that says you did.

Tamper-evidence & integrity

Prove a file, a log, a model, or a drawn frame hasn’t drifted from what it was — worked out again from the thing itself, answered MATCH / DRIFT / UNVERIFIABLE, never just vouched-for.

Agentic-AI security

Give an AI agent only the power it needs, only for as long as it needs it, off by default — plus a witnessed record of every action it actually took. Knowing about something is not permission to do it, and nothing it does is off the record.

Pre-disclosure & handoff review

Before you publish, hand off, or sign off that it’s “done and safe,” get a receipt of what’s actually there, a flag on anything shaped like a secret, and a hard stop at the human sign-off.

Audit, compliance & incident evidence

Turn security work into bounded records, re-runnable for the same result, that an auditor or incident reviewer can check from scratch — a chain of evidence that holds up, instead of a story they have to trust.

Software & runtime integrity

Spot tampering, evasion, and manipulation by understanding exactly how they’re done: the anti-cheat line of work, turned inward, so the accountability platform is itself attacked on purpose as it is built and held to account that way.

C++23 integrity framework · private
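The MATCH / DRIFT / UNVERIFIABLE answer from the tamper-evidence work comes down to “recompute from the thing itself, then compare.” This is a minimal Python sketch, assuming the recorded value is a hex SHA-256 digest of the whole file; `sha256_file` and `verify` are hypothetical helpers, not the private C++23 framework, and returning UNVERIFIABLE when either the record or the file is missing is an assumption about where that verdict applies.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_file(path: Path) -> str:
    """Stream the file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, recorded: Optional[str]) -> str:
    """Recompute from the artifact itself; never trust a stored verdict."""
    if recorded is None or not path.is_file():
        # Nothing to compare against, or nothing to recompute from.
        return "UNVERIFIABLE"
    return "MATCH" if sha256_file(path) == recorded.lower() else "DRIFT"
```

The verdict is never read from a cache or a previous report: every call rehashes the current bytes, which is what lets a third party reach the same answer independently.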

Not more power — security you can re-run.

Underneath all eight it is one engine, pointed at different jobs: the depth that finds the problem, kept on a leash that proves it stayed inside its limits. That is what the depth is for here — not a bigger hammer, but security whose every swing is witnessed and bounded. The attacking skill is real; the accountability is what makes it safe to sell, and safe to buy.

What it is for, and what it is built to refuse.

A tool that finds how a model fails can be used both ways, and I won’t pretend otherwise — pretending is how these things go wrong. So it is built to face one way on purpose. It is pointed at making your own models tougher: the output is the evidence a team needs to make a system safer, under a permission that names what is allowed. It is held inside a permission it cannot widen, and witnessed so that nothing it does is off the record — including, on purpose, to hold me to that line as much as anyone using it.

Built to be used well, and built to resist the opposite — the limit lives in how it’s made, not in a promise from me.

That is the same line the rest of this site draws, drawn hardest here because here it counts most. The depth that lets it find a failure is real; it is contained by how it is built and kept to the defensive use it was made for, and I would pull it back from any other use before I let it become the thing it was built to guard against.

Who it’s for — and the next step.

Aleph is for teams making their own models tougher: labs and organisations that need this kind of testing in a form they can actually check, re-run, and trust — and that would rather catch a failure in a test than in the wild. It is offered as a capability, through a conversation, not as a public download; what it does is too sharp to leave lying around, and the honest way to share it is under terms, with a real person on the other side.

If that is you — or if you just want to understand it before deciding it’s real — the next step is a conversation, and I’ll show you what a page can’t.

Inquiries: zaindharper@gmail.com