This blind spot is built in, not a small bug to patch.
The blind spot isn’t a slip-up: an AI with no way to look simply has no line to the real world, so it pictures the world in its head instead. This tool doesn’t try to fix that; it gives the AI the missing line. It’s a small kit of readers, each one built for a single kind of thing. A reader opens a file (a PNG image, a screen frame, a JSON document, a WAV sound, a line of captions) and hands back a plain record of what it actually saw. That record holds the file’s exact fingerprint (its SHA-256), a few measurements that fit that kind of file, and, where it makes sense, a second, rougher “how it looks” fingerprint. That record is the evidence. The AI leans on it, not on a picture in its head, and the record is yours to re-check too.
Here’s the part I find quietly satisfying: keeping it safe and making it more capable turn out to be the same thing. An AI that can see what’s truly there can’t act on a daydream about what it imagines is there — that’s the safety. And it can finally finish the loop it was always working blind in — make something, look at it, compare it to what you wanted, fix it — that’s the capability.
These readers only look and tell you what they saw. They never change the thing they’re looking at, or how it was made, or anything else.
The readers, one per kind of thing.
There’s a separate reader for each kind of file, because a sound and a screenshot need to be looked at differently. Here is what each one does.
VisualArtifactOrgan — looks at a PNG image and writes down its exact fingerprint (the SHA-256), the width and height it actually measured, and a small “how it looks” fingerprint built from the picture’s pixels. That second fingerprint is what lets the tool sense change gently: instead of only asking “did anything change?”, it can tell roughly how much the image moved by counting how many spots the two fingerprints differ in.
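To make the “how it looks” fingerprint concrete, here is a minimal sketch of a 64-bit difference hash (dHash) and the bit-count comparison described above. It is an illustration under assumptions, not the library’s code: the names `resample`, `dhash64`, and `hamming` are hypothetical, the input is an already-decoded grayscale grid (PNG decoding is left out), and nearest-neighbour resampling is my stand-in for whatever downscaling the real reader uses.

```python
def resample(gray, out_w, out_h):
    """Nearest-neighbour resample of a 2D grayscale grid (a list of rows).

    Assumption: the real reader may downscale differently.
    """
    in_h, in_w = len(gray), len(gray[0])
    return [
        [gray[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def dhash64(gray):
    """64-bit difference hash: one bit per left/right neighbour comparison."""
    small = resample(gray, 9, 8)  # 9 columns give 8 comparisons in each of 8 rows
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# A horizontal gradient, and the same gradient with its top-left corner whited out.
base = [[x * 4 for x in range(64)] for _ in range(64)]
changed = [row[:] for row in base]
for y in range(8):
    for x in range(8):
        changed[y][x] = 255

print(hamming(dhash64(base), dhash64(base)))     # identical images
print(hamming(dhash64(base), dhash64(changed)))  # a small local change
```

A small distance means the picture barely moved; a large one means it changed a lot. Where to draw the line between the two is a policy choice this sketch does not make.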
RawScreenCaptureSource / RawFrameOrgan — takes a picture of the screen the same way the operating system already draws it (on Windows, through its built-in screen tools via ctypes — nothing extra installed). To stay quick, it skips turning each frame into a PNG: every moment it takes the exact fingerprint straight from the raw screen data, and only works out the slower “how it looks” fingerprint when something genuinely changed. It gives the exact same answer as the slower PNG path — the tool proves that to itself with a built-in self-check — just for less work.
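The “exact fingerprint on every frame, slow fingerprint only on change” idea can be sketched in a few lines. This is a sketch under stated assumptions: `FrameWatcher` and its `slow_hash` parameter are hypothetical names, the frames are plain byte strings rather than real screen captures, and the lambda in the demo is a placeholder, not a perceptual hash.

```python
import hashlib


class FrameWatcher:
    """Hash every raw frame exactly; recompute the slow hash only on change."""

    def __init__(self, slow_hash):
        self.slow_hash = slow_hash  # hypothetical perceptual-hash callable
        self.last_sha = None
        self.last_phash = None
        self.slow_calls = 0         # counts how often the slow path really ran

    def observe(self, raw: bytes):
        sha = hashlib.sha256(raw).hexdigest()
        if sha != self.last_sha:
            # The bytes differ from the previous frame, so the cached
            # perceptual hash is stale and has to be recomputed.
            self.last_phash = self.slow_hash(raw)
            self.slow_calls += 1
            self.last_sha = sha
        return sha, self.last_phash


# Placeholder "slow" hash for the demo only (an assumption, not a real dHash).
watcher = FrameWatcher(slow_hash=lambda raw: sum(raw) % 256)
frames = [b"\x00" * 16, b"\x00" * 16, b"\x01" + b"\x00" * 15]
results = [watcher.observe(frame) for frame in frames]
print(watcher.slow_calls)  # slow path runs once per distinct frame, not once per frame
```

Because the exact hash is taken on every frame, skipping the slow hash never hides a change: any byte difference changes the SHA-256 and triggers the recompute.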
RegionArtifactOrgan — the same whole-image facts, but it also cuts the picture into a grid of tiles and gives each tile its own little “how it looks” fingerprint. So compare_region_drift can point to exactly which tiles moved — “tile 5 changed” instead of the much vaguer “the whole screen changed.”
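Here is a minimal sketch of per-tile fingerprints and a “which tiles moved” comparison. It is not the library’s `compare_region_drift`: the helpers `tile_hash`, `tile_hashes`, and `changed_tiles` are hypothetical, the row-major, zero-based tile numbering is an assumption, and the input is a pre-decoded grayscale grid.

```python
def tile_hash(gray, x0, y0, w, h):
    """Small dHash over one tile: sample a 9x8 grid and compare neighbours."""
    bits = 0
    for r in range(8):
        y = y0 + r * h // 8
        row = [gray[y][x0 + c * w // 9] for c in range(9)]
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def tile_hashes(gray, cols, rows):
    """Per-tile hashes in row-major order (assumption: tile 0 is top-left)."""
    height, width = len(gray), len(gray[0])
    tw, th = width // cols, height // rows
    return [
        tile_hash(gray, c * tw, r * th, tw, th)
        for r in range(rows)
        for c in range(cols)
    ]


def changed_tiles(before, after, threshold=0):
    """Indices of tiles whose hashes differ in more than `threshold` bits."""
    return [
        i
        for i, (a, b) in enumerate(zip(before, after))
        if bin(a ^ b).count("1") > threshold
    ]


# A 96x96 gradient cut into a 3x3 grid; only the tile at row 1, column 2 is altered.
base = [[x * 2 for x in range(96)] for _ in range(96)]
after = [row[:] for row in base]
for y in range(32, 64):
    for x in range(64, 96):
        after[y][x] = 255 - x * 2  # reverse the gradient inside that one tile

print(changed_tiles(tile_hashes(base, 3, 3), tile_hashes(after, 3, 3)))
```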
StructuredDataOrgan — looks at a JSON document (a common, tidy way of storing data) and takes two fingerprints of it. One is of the file exactly as written. The other first rewrites the document into a single neat, standard form — same keys in the same order, spacing cleaned up — and then fingerprints that. So it can ask, in order: are these byte-for-byte the same? if not, are they the same once tidied? and if not, how far apart are they? The upshot is kind and exact at once: a document that was only reformatted reads as a MATCH, while a document where a real value changed reads as a true DRIFT.
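The two-fingerprint comparison can be sketched with nothing but the standard library. Treat this as an illustration under assumptions: `raw_sha`, `canonical_sha`, and `compare_json` are hypothetical names, the canonical form used here (sorted keys, compact separators) is one reasonable choice rather than the tool’s confirmed one, returning UNVERIFIABLE for unparsable input is my assumption, and the third step (“how far apart?”) is left out.

```python
import hashlib
import json


def raw_sha(text: str) -> str:
    """Fingerprint of the document exactly as written."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonical_sha(text: str) -> str:
    """Fingerprint after rewriting into one standard form.

    Assumption: sorted keys and compact separators define the canonical form.
    """
    canon = json.dumps(
        json.loads(text), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


def compare_json(baseline: str, current: str) -> str:
    # Step 1: byte-for-byte identical?
    if raw_sha(baseline) == raw_sha(current):
        return "MATCH"
    # Step 2: identical once tidied?
    try:
        same = canonical_sha(baseline) == canonical_sha(current)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "UNVERIFIABLE"  # assumption: unparsable input is never guessed at
    return "MATCH" if same else "DRIFT"


print(compare_json('{"a": 1, "b": 2}', '{"b":2,"a":1}'))  # reformatted only
print(compare_json('{"a": 1}', '{"a": 2}'))               # a real value changed
```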
AudioArtifactOrgan — listens to a WAV sound file (using the wave reader that already ships with Python, no outside audio software) and writes down its exact fingerprint, its format, and a small “how it sounds” fingerprint that follows how the loudness rises and falls over time. If it ever meets a sound it can’t make sense of, it doesn’t guess — it quietly falls back to the exact fingerprint alone.
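A rough sketch of the “how it sounds” idea, using only Python’s `wave` and `struct` modules. Everything specific here is an assumption made for illustration: the function names `observe_wav` and `make_wav`, the 33-window split, the restriction to 16-bit PCM, and the rule “one bit per window, set when it is louder than the one before.” The fallback behaviour mirrors the description above: if the sound can’t be read, only the exact fingerprint is returned.

```python
import hashlib
import io
import struct
import wave


def observe_wav(data: bytes, windows: int = 33) -> dict:
    """Exact fingerprint always; loudness-envelope bits only when readable."""
    record = {
        "identity_sha256": hashlib.sha256(data).hexdigest(),
        "format": None,
        "envelope_hash": None,
    }
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            fmt = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
            raw = wav.readframes(wav.getnframes())
        if fmt[1] != 2:  # assumption: this sketch only handles 16-bit PCM
            return record
        samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    except (wave.Error, EOFError, struct.error):
        return record  # can't make sense of it: exact fingerprint alone
    if len(samples) < windows:
        return record
    step = len(samples) // windows
    loud = [
        sum(abs(s) for s in samples[i * step:(i + 1) * step])
        for i in range(windows)
    ]
    bits = 0
    for prev, cur in zip(loud, loud[1:]):
        bits = (bits << 1) | (1 if cur > prev else 0)
    record["format"] = fmt
    record["envelope_hash"] = bits
    return record


def make_wav(samples, rate=8000) -> bytes:
    """Build a small mono 16-bit WAV in memory (demo helper only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


# A sound that gets steadily louder, and some bytes that are not a WAV at all.
ramp = [i if i % 2 else -i for i in range(3300)]
good = observe_wav(make_wav(ramp))
bad = observe_wav(b"not a wav file")
print(good["format"], bad["envelope_hash"])
```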
CaptionOrgan — looks at a line of subtitles or a line of transcript and takes two fingerprints: one of the text exactly as written, and one after gently tidying it (smoothing out the ways the same characters can be stored, and squeezing any run of spaces down to one). So a caption that only differs by extra spaces still reads as a MATCH. Paired up moment by moment with a frame reader, the tool can record what was on the screen and what was being said as a single instant.
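The caption tidy-up can be sketched with `unicodedata` and `re`. The names `tidy` and `caption_fingerprints` are hypothetical, and two details are assumptions: I use NFC as the normalization form, and I collapse only runs of literal space characters. The real reader may choose differently on both.

```python
import hashlib
import re
import unicodedata


def tidy(text: str) -> str:
    """Normalize character storage, then squeeze runs of spaces to one."""
    text = unicodedata.normalize("NFC", text)  # assumption: NFC is the form used
    return re.sub(r" +", " ", text)


def caption_fingerprints(text: str) -> dict:
    def sha(s: str) -> str:
        return hashlib.sha256(s.encode("utf-8")).hexdigest()

    return {"exact_sha256": sha(text), "normalized_sha256": sha(tidy(text))}


# The same caption stored two ways: precomposed "é" with extra spaces,
# versus "e" plus a combining accent with single spaces.
a = caption_fingerprints("Caf\u00e9  is   open")
b = caption_fingerprints("Cafe\u0301 is open")
c = caption_fingerprints("Caf\u00e9 is closed")

print(a["exact_sha256"] == b["exact_sha256"])            # stored bytes differ
print(a["normalized_sha256"] == b["normalized_sha256"])  # same caption once tidied
print(a["normalized_sha256"] == c["normalized_sha256"])  # a real wording change
```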
Two gates, kept apart on purpose.
Think of two doors with two different jobs. The read-gate (this library) answers one question: what is actually there? It looks at real things and turns them into Observations — honest records of what it saw. The write-gate (proof-surface, pre_execution_gate) answers a different one: given what’s there, should this action be allowed to happen? It says no unless told otherwise, stays shut whenever anything is unclear, and offers advice rather than forcing anyone’s hand.
They live in separate projects on purpose. A read-gate is useful even to something that only watches and never acts; a write-gate is useful even to something that acts but has no way to see. They fit together not by sharing code, but by sharing the same simple shape of record that passes between them. The looking tool’s MATCH / DRIFT / UNVERIFIABLE is the very same set of answers the gate already speaks, so what this tool sees flows straight into the gate’s decision with nothing in between to translate.
This tool never hands out permission and never removes a single check; every real guarantee lives in the pieces it’s joined to.
Don’t take the record on my word. Watch it refuse to guess.
I care just as much about what it does when it’s stuck as about what it does when things go well — the moment it can’t check is exactly the moment it has to stay honest. The cases below show two things: what happens when it can’t verify, and what happens when two separate versions, working on their own, arrive at the same answer.
```
# perceive a frame: witnessed identity, dimensions, perceptual hash
>>> snap = perceive(["frame.png"])
>>> obs = snap.observations[0]
obs.data["identity_sha256"]             # full-width, re-derivable
obs.data["width"], obs.data["height"]
obs.data["perceptual_hash"]             # 64-bit dHash

# drift check: closed lattice, never a silent match on difference
>>> compare_drift(baseline_sha, current_sha, baseline_phash, current_phash)
DriftVerdict(verdict="MATCH")           # byte-identical
>>> compare_drift(baseline_sha, changed_sha, baseline_phash, changed_phash)
DriftVerdict(verdict="DRIFT", distance=12)

# no pinned anchor: honest refusal, never a fabricated VALID
>>> verify_receipt(receipt)
ReceiptVerdict(verdict="UNVERIFIABLE", reason="no pinned anchor")

$ pytest
868 passed in current suite
```
The tool looks, then tells you. A match is a match it actually saw. With nothing saved to compare against, it says UNVERIFIABLE — it never quietly waves the check through. And the set of answers is sealed: nothing outside MATCH / DRIFT / UNVERIFIABLE can ever come out.
Captured live from pytest — 868 passed (current suite, includes the graph plane and Hasse-guard increment); stdlib-only, zero dependencies, github.com/HarperZ9/coherence-membrane. A separate check in lattice.py walks through every possible combination by hand and confirms three things hold without exception: only those three answers can appear, an unclear case always lands on the safe “can’t tell,” and combining results can only ever make confidence weaker, never stronger — and it reruns on every test pass. Honest limit: the organs only look, never act, and a drift answer is only as trustworthy as the saved version it’s measured against. When in doubt it returns UNVERIFIABLE instead of guessing a match. That “how it looks” fingerprint is deliberately rough — a helpful hint, not real understanding of what the thing means.
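For a sense of what an exhaustive check over three answers looks like, here is a minimal sketch. It is not the contents of lattice.py. The names `Verdict`, `combine`, `classify`, and `check_lattice` are hypothetical, and the ordering UNVERIFIABLE < DRIFT < MATCH (with “combine” taking the weaker of two answers) is my assumption about how “combining can only weaken” might be modelled.

```python
from enum import IntEnum
from itertools import product


class Verdict(IntEnum):
    # Assumed ordering: a higher value means more confidence.
    UNVERIFIABLE = 0
    DRIFT = 1
    MATCH = 2


def combine(a: Verdict, b: Verdict) -> Verdict:
    """Combine two answers by keeping the weaker one."""
    return Verdict(min(a, b))


def classify(baseline_sha, current_sha) -> Verdict:
    """Missing or empty input is unclear, so it lands on UNVERIFIABLE."""
    if not baseline_sha or not current_sha:
        return Verdict.UNVERIFIABLE
    return Verdict.MATCH if baseline_sha == current_sha else Verdict.DRIFT


def check_lattice() -> bool:
    """Walk every combination and assert the three properties hold."""
    for a, b in product(Verdict, repeat=2):
        c = combine(a, b)
        assert c in Verdict                 # closed: nothing outside the three
        assert c <= a and c <= b            # combining never strengthens
        assert c == combine(b, a)           # order of combination is irrelevant
    for a, b, c in product(Verdict, repeat=3):
        assert combine(combine(a, b), c) == combine(a, combine(b, c))
    for base, cur in product([None, "", "aa", "bb"], repeat=2):
        verdict = classify(base, cur)
        assert verdict in Verdict
        if not base or not cur:
            assert verdict is Verdict.UNVERIFIABLE  # unclear is never a match
    return True


print(check_lattice())
```

With only three answers the whole space is small enough to enumerate, so the check is a complete walk rather than a sample.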
```
# Python reference: re-derives every case in the frozen conformance corpus
$ python conformance/run.py
all cases pass   # corpus hash pinned

# JavaScript implementation: shares no code with the Python reference
# re-derives SHA-256, PNG-decode + dHash, drift lattice, canonical-JSON,
# region drift, and receipt anchor, value-for-value
$ node impl/js/run.js
{"impl":"js","cases":16,"passed":16,"failed":0}

# every organ proves itself: an unverified membrane is worse than none
$ PYTHONPATH=src python -m coherence_membrane selftest
{"passed": true, "results": [ ...6 organs, every check ok... ]}
```
Two versions that share no code, landing on the very same answers across a fixed, locked set of test cases — you can watch it line up, not just take my word that it does. And a self-check that’s allowed to fail, run before any trust is given.
JavaScript implementation in impl/js/membrane.js (Node built-ins only — crypto, zlib); conformance corpus in conformance/vectors.json, locked by its own fingerprint so it can’t be quietly edited; the agreed shapes for the Observation and DriftVerdict records in schemas/. Honest limit: the two versions are shown to agree across this set of cases — that’s not the same as proving they’ll always agree on everything. When the JavaScript side meets a number too large to handle exactly, it stops and says so out loud rather than quietly drifting from the Python side — a deliberate fail-loud edge, written down in the README. The screen-capture parts for macOS and Linux are written to those systems’ rules but haven’t been tried there yet; I only have Windows to test on.
What it is, and what it isn’t.
Here is exactly where the line sits, so you never have to guess at it. The tool only looks — it’s built from plain Python, leans on nothing outside it, carries 868 tests, and each reader checks itself in ways that are allowed to fail. It swaps a guess for a real look; it does not swap in for a gate. Stopping or allowing things happens at the write-gate (proof-surface). The tool reports what it saw; the person in charge decides what to do about it.
Taking a picture of the screen only reads what’s already shown on the display in front of you, using the operating system’s own screen tools through ctypes. It does not reach inside, latch onto, or read the private memory of another program. It’s meant to be used only on things you own or have been given permission to watch. That isn’t fine print — it’s how the thing was built on purpose.
Left to itself, the answer is never “yes” — and that’s checked against the real code, not just promised in a comment.
The check in lattice.py is done by computer, walking through every single possibility one by one. For only three answers, the underlying rules are easy to state — and that’s on purpose. The real work isn’t the rules; it’s tying the actual code to them, with runnable proofs that the real functions behave exactly the way those rules demand — landing on the right answer, and combining answers in the right way.
A note on where this came from, in case it helps: I didn’t set out to build a way for a machine to see. I set out to stop trusting my own eyes about what was in front of me (the file I was sure I’d changed, the screen I was sure had drawn), and an AI turned out to share the very same blindness, only deeper, and far more sure of itself. So the habit of checking came first, and the tool grew out of it. If you want the longer argument for why this gap matters, it has its own page.