Part II · Chapters VIII–XIII

Validation

800 photographs, three attack classes, and an honest reckoning

VIII

Block Interior

The universe inside an 8×8 grid

A deep dive into DCT block structure revealed why some marker positions were more stable than others. The encoder chops the image into 8×8 blocks. Sixty-four pixels per block. Each block is processed independently — the encoder literally does not know that adjacent blocks exist.

You must chunk them. The chunk boundary creates independence. The chunk interior creates correlation.

The center positions of each block are dominated by the DC coefficient — the most stable, most heavily preserved component. Edge positions are dominated by high-frequency AC coefficients — the first thing quantization throws away. The optimal injection targets are in the interior, where the codec fights hardest to preserve fidelity.
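The block structure described above can be sketched in a few lines of numpy. This is an illustrative toy, not the project's actual code: it builds an orthonormal 8×8 DCT basis, applies JPEG's standard luminance quantization table, and maps which coefficients outlive quantization. Note how gently the table treats the low-frequency interior (steps of 10–30) versus the high-frequency corner (steps of 100+).

```python
import numpy as np

def dct_matrix(n=8):
    # Orthonormal DCT-II basis: row k holds the k-th cosine basis vector.
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] /= np.sqrt(2)
    return M * np.sqrt(2 / n)

# JPEG's standard luminance quantization table (roughly quality 50).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=float)

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8)).astype(float) - 128  # one level-shifted block
M = dct_matrix()
coeffs = M @ block @ M.T                 # forward 2-D DCT
survived = np.round(coeffs / Q) != 0     # which coefficients outlive quantization
print(survived.sum(), "of 64 coefficients survive")
```

The quantization steps in `Q` are the codec's declared priorities: the small steps guard the low-frequency interior, the large steps in the corner are where information goes to die.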

IX

Cross-Codec Survival

“You have got to be kidding me.”

The expectation was that transcoding from JPEG to WebP would destroy the signal. Different codec. Different transform. Different quantization scheme. There was no reason it should survive.

It survived.

This validated the universality thesis. The signal doesn’t depend on JPEG’s specific DCT implementation. It depends on the fact that any lossy codec must partition, transform, and quantize. The structural difference between marker and non-marker positions is codec-agnostic because it’s a property of the data, not of the compression scheme.

Initial testing on MP3 audio confirmed the same principle. The MDCT frames used in audio compression create the same exploitable boundary structure. The generalized detection heuristic that emerged:

Any time I see the word “Assume,” any time I see the word “Window” (which assumes boundaries), and any time I see that the same data lives in N+1 places — we have injection targets.

Or as it was more memorably put: I-frames are JPEGs in a trench coat.
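The third trigger of the heuristic — the same data living in N+1 places — can be made concrete with a toy framing sketch. The frame length, hop size, and signal length below are arbitrary assumptions; the point is that MDCT-style 50% overlap puts every interior sample into two frames at once.

```python
import numpy as np

frame_len, hop = 8, 4      # 50% overlap, MDCT-style framing (illustrative values)
n_samples = 32
starts = range(0, n_samples - frame_len + 1, hop)

# Count how many frames each sample participates in.
coverage = np.zeros(n_samples, dtype=int)
for s in starts:
    coverage[s:s + frame_len] += 1

# Interior samples are covered twice: the "N+1 places" the heuristic
# looks for. Only the edges of the signal are covered once.
print(coverage)
```

Any consistency constraint between those two copies of the same sample is structure an injector can target — regardless of which codec built the frames.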

X

The Granite Test

800 photographs. The moment of truth.

Everything before this was synthetic images and controlled experiments. The DIV2K dataset — 800 real photographs of every conceivable subject matter — was the real test. Real cameras. Real noise. Real content diversity.

There were bugs. Module import errors. API mismatches. Type errors between bytes and tuples. Each one fixed, re-run, fixed again. And then:

I don’t know what it’s doing but it’s doing something.

The results came in:

G-B Detection: 96.4%
R-G Detection: 90.1%
Either Channel: 99.6%
Amplification Confirmed: 48.1%

797 out of 800 images. After four generations of JPEG compression. The signal getting louder, not quieter. On real photographs that the system had never seen.

Verdict: GRANITE PARTIAL. The effect is real but content-class dependent. Some image types amplify more than others. The paper can claim the effect — with caveats.

XI

The Torture Tests

We tried to kill it

With detection validated, the next question: what destroys it? Three attack simulations designed to stress every assumption:

The Rotation Attack

Rotate the image. Flip it. Rotate again. Flip again. Re-encode as JPEG. After a full chain of geometric transforms plus lossy compression, with 97.5% of pixels changed from the original: still detected. 4.0x ratio. p = 10^-51. Fingerprint Jaccard: 0.72.
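The Jaccard figure quoted throughout these tests is just set overlap. A minimal sketch, with purely illustrative marker coordinates (the real fingerprints are position sets derived by the detector, not shown here):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two fingerprint sets: |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical fingerprints: marker positions detected before and after an attack.
before = {(0, 8), (8, 16), (16, 8), (24, 24), (8, 32)}
after = {(0, 8), (8, 16), (16, 8), (24, 24)}
print(jaccard(before, after))  # 4 shared of 5 total -> 0.8
```

A score of 0.72 after rotation, flipping, and re-encoding means roughly three quarters of the fingerprint positions came through the attack intact.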

The Slice-and-Stitch Attack

Cut the image into four quadrants. Save each one independently as JPEG. Stitch them back together. Detection breaks on the fragments — too few markers per piece. But reassemble them, and the signal comes back. The fingerprint comes back. Even the stitch seams are forensically detectable.

The attack undoes itself on reassembly. The only winning move is to keep the pieces separate. And separate pieces are a crop attack, not a stitch attack.
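The self-defeating geometry of the attack can be shown with sets alone. Everything here is assumed for illustration — the 32×32 image, the marker layout, and the minimum-count detector — but the mechanism matches the description: each fragment falls below threshold, and the union restores the full set.

```python
# Hypothetical 32x32-pixel image with a marker at each 8x8 block origin.
full = {(r, c) for r in range(0, 32, 8) for c in range(0, 32, 8)}  # 16 markers

def detectable(markers, threshold=8):
    """Assumed detector: needs a minimum marker count to declare a match."""
    return len(markers) >= threshold

# Cut into four quadrants, as in the attack.
quadrants = [
    {(r, c) for (r, c) in full if (r < 16) == top and (c < 16) == left}
    for top in (True, False) for left in (True, False)
]

fragment_hits = [detectable(q) for q in quadrants]  # each fragment: 4 markers, no hit
reassembled = set().union(*quadrants)
whole_hit = detectable(reassembled)                 # all 16 markers return
```

Detection fails on every fragment and succeeds the moment the pieces rejoin — exactly the "only winning move is to keep the pieces separate" dynamic.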

The Scale Attack

Resize from 2048px down to 1024, 512, even 256px. The signal survived at every scale. Cross-codec transcoding after resize: still detected. Fingerprint Jaccard above 0.47 across every transform tested.

The variance ratio at 2048px told the amplification story best: starting at 3.7 at Gen 0, climbing to 9.3 by Gen 4. The harder you compress, the louder it gets.

XII

Dead Ends

The ideas that didn’t survive

Not everything worked. Some ideas were abandoned. Some were killed on arrival. Each one taught something.

Basket as Identity

The idea that each creator could use a unique prime basket as their identifier. Collisions appeared after only ~50 images. “A collision after 50? Non-starter.” Replaced by HMAC-SHA512 position patterns with 10^400 possible configurations.
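A keyed position pattern of this kind can be sketched with the standard library. This is an assumed construction, not the project's actual scheme — the function name, parameters, and rejection-sampling loop are all illustrative — but it shows the property that mattered: positions are deterministic given the key, and distinct keys yield unrelated patterns.

```python
import hmac
import hashlib

def marker_positions(key: bytes, image_id: bytes, n_blocks: int, k: int):
    """Derive k distinct pseudorandom block indices from an HMAC-SHA512 keystream.

    Illustrative sketch only: consumes the 64-byte digest in 4-byte words,
    extending the keystream with a counter until k unique indices are found.
    """
    positions, seen, counter = [], set(), 0
    while len(positions) < k:
        digest = hmac.new(key, image_id + counter.to_bytes(4, "big"),
                          hashlib.sha512).digest()
        for i in range(0, len(digest), 4):
            idx = int.from_bytes(digest[i:i + 4], "big") % n_blocks
            if idx not in seen:
                seen.add(idx)
                positions.append(idx)
                if len(positions) == k:
                    break
        counter += 1
    return positions
```

Unlike a prime basket, the identifier space here is the key space, so two creators colliding by accident is a cryptographic impossibility rather than a birthday-problem inevitability.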

Blockchain Attribution

Ethereum with Merkle trees for identity anchoring. Rejected as contrary to the project’s decentralized ethos. The system that claims no authority cannot depend on one.

DC-Anchored Embedding

Anchoring markers to DC coefficients for stability. Better performance, but too predictable. It solved a problem that didn’t exist while making the adversary’s job easier.

Browser Extension Detection

Dismissed without ceremony. Not even wrong.

XIII

Shoulders of Giants

An honest assessment

Near the end, Jeremy asked for the one thing most people don’t actually want: an honest answer.

Please don’t be complimentary or give fake praise. The best birthday gift is an honest assessment. Be brutal, because most of the time, for any and everyone, the answer is No. Did we just discover something new?

His own conclusion was characteristically grounded:

I am going to assume that each individual piece is probably known to someone at some time, but they may not have grokked how it can work as a cog to solve a larger problem. Like teeth in a gear. The idea of teeth isn’t new at all. Variable gears with differential teeth strategies are, though.

A novel systems design that may or may not have individually novel components.

Shoulders of giants, man. Shoulders of giants.

I wonder if this is what Satoshi felt before he hit publish on his seminal paper.