Skip to content

feat(rescue): --from <git-dir> reconstructs evicted packs from local clone#55

Merged
sanity merged 5 commits into
mainfrom
rescue-from-local
May 17, 2026
Merged

feat(rescue): --from <git-dir> reconstructs evicted packs from local clone#55
sanity merged 5 commits into
mainfrom
rescue-from-local

Conversation

@sanity
Copy link
Copy Markdown
Contributor

@sanity sanity commented May 17, 2026

Problem

When a bundle's pack has been evicted from the gateway AND from any peer reachable via the ring, the rescue would fail with `GET pack ... 1 bundle(s) failed to rescue` with no recovery path. The rescue could only re-PUT what the gateway still had cached, so a single evicted pack meant the contract was permanently degraded.

Concrete incident (2026-05-14 through 2026-05-17): bundle `2345ec0f` on `freenet:99TmCayXn6Tm/freenet-git` (the self-mirror contract) had its underlying pack `7c30464721` evicted during normal LRU pressure on the gateway. Every rescue run since failed for this bundle while the other 26 bundles in the same run succeeded — visible as recurring Matrix `rescue-demos` failure alerts every 12h.

This closes the documented future-version gap in `main.rs:135-137` and `rescue-demos.yml:7-13`.

Verified before implementing

`git pack-objects` produces byte-for-byte identical output for the same object set + git version, so a local clone of the upstream repo can rebuild the exact bytes the original push uploaded. Tested against the actual failing bundle: `git pack-objects` on the `(000c4fd, c682079]` symmetric difference produced BLAKE3 `7c30464721...`, matching the contract's expected hash byte-for-byte.

Solution

New `local_pack` module (`crates/freenet-git/src/local_pack.rs`):

  • Reads `bundle-tip:` extensions from the contract state to learn each bundle's tip commit.
  • Sorts tips chronologically by commit timestamp (`git log -1 --format=%ct`).
  • For each consecutive `(prev_tip, new_tip)` pair, runs `git pack-objects` on the symmetric difference and BLAKE3s the output, building a `pack_hash -> pack_bytes` lookup map.

In `rescue_pack` (main.rs): on `wsclient::get_pack` failure, looks up the expected pack hash in the local-reconstruction map; if present, PUTs those bytes. If `--from` wasn't passed the map is empty, so the lookup is a cheap no-op for the pre-0.1.23 invocation shape.

Scope (initial implementation):

  • History mode only: snapshot-mode mirrors force-push fresh orphan commits per run, so the relationship between bundle tips and (prev, new) ranges is degenerate.
  • SinglePack only: ChunkedPack reconstruction requires re-splitting the local pack into chunks of the same `chunk_size` that the original publish used; `chunk_size` isn't stored in contract metadata today.

Both excluded cases fall through to the existing GET-only path — same as the pre-0.1.23 behavior.

Workflow update

`rescue-demos.yml` now clones the upstream GitHub repo for history-mode cells (`freenet-stdlib`, `freenet-git` self-mirror) via `actions/checkout@v4` and passes `--from upstream/.git` to the rescue. Snapshot-mode cells (`freenet-core`) skip the clone.

CLI-flag detection (`grep -q -- '--from'`) keeps the workflow backwards-compatible with pre-0.1.23 freenet-git releases.

Testing

`local_pack` module tests verify SinglePack/ChunkedPack distinction and the empty-state path (no bundle-tip extensions = no map entries, not an error).

The byte-for-byte reproducibility claim was verified by hand against the live failing bundle before implementing (see "Verified before implementing" above) and will be re-verified end-to-end when the next scheduled rescue run consumes the 0.1.23 release.

Release

Version bump `0.1.22 → 0.1.23`. After merge: tag + publish all 6 freenet-git crates so the rescue-demos workflow's `cargo install freenet-git --locked` picks up the new binary automatically.

Fixes

No issue filed — discovered during the rescue-demos investigation referenced in freenet/freenet-stdlib#76 and freenet-core#4133.

[AI-assisted - Claude]

sanity added 2 commits May 17, 2026 15:45
…clone

Closes the documented future-version gap (main.rs:135-137,
rescue-demos.yml:7-13): when a bundle's pack has been evicted from
the gateway AND from any peer reachable via the ring, the rescue
would fail with "GET pack ... 1 bundle(s) failed to rescue" and
there was no recovery path -- the rescue could only re-PUT what the
gateway still had cached.

Concrete incident (2026-05-14 through 2026-05-17): bundle 2345ec0f
on freenet:99TmCayXn6Tm/freenet-git (the self-mirror contract) had
its underlying pack 7c30464721 evicted during normal LRU pressure
on the gateway. Every rescue run since failed for this bundle while
the other 26 bundles succeeded -- visible as recurring Matrix
rescue-demos alerts.

Verified empirically before implementing: git pack-objects produces
byte-for-byte identical output for the same object set + git version,
so a local clone of the upstream repo can rebuild the exact bytes the
original push uploaded. Tested against the actual failing bundle:
`git pack-objects` on the (000c4fd, c682079] symmetric difference
produced BLAKE3 7c30464721... matching the contract's expected hash.

# Implementation

New `local_pack` module (`crates/freenet-git/src/local_pack.rs`):
- Reads `bundle-tip:<id>` extensions from the contract state to learn
  each bundle's tip commit.
- Sorts tips chronologically by commit timestamp.
- For each consecutive (prev_tip, new_tip) pair, runs
  `git pack-objects` on the symmetric difference and BLAKE3s the
  output, building a `pack_hash -> pack_bytes` lookup map.
- Scoped to SinglePack + history mode. ChunkedPack reconstruction
  needs the original chunk_size (not in contract metadata); snapshot
  mode's force-pushed orphan commits don't have meaningful (prev, new)
  ranges. Both fall through to the existing GET-only path; this is
  the same as the pre-0.1.23 behavior for those cases.

In `rescue_pack` (main.rs): on `wsclient::get_pack` failure, look up
the expected pack hash in the local-reconstruction map; if present,
PUT those bytes. If `--from` wasn't passed the map is empty, so the
lookup is a cheap no-op for the legacy invocation shape.

# Workflow update

`rescue-demos.yml` now clones the upstream GitHub repo for history-
mode cells (`freenet-stdlib`, `freenet-git` self-mirror) via
actions/checkout@v4 and passes `--from upstream/.git` to the rescue.
Snapshot-mode cells (`freenet-core`) skip the clone since `--from`
can't help with their content.

The CLI flag detection (`grep -q -- '--from'`) keeps the workflow
backwards-compatible with pre-0.1.23 freenet-git releases: the
clone is fetched but `--from` is silently omitted from the rescue
invocation. Once 0.1.23+ is the published floor on crates.io, the
probe collapses to an unconditional pass.

# Tests

`local_pack` module tests verify SinglePack/ChunkedPack distinction
and the empty-state path (no bundle-tip extensions = no map entries,
not an error). The byte-for-byte reproducibility claim is verified
in the commit message above against the live failing bundle and
will be re-verified when the next scheduled rescue run consumes the
0.1.23 release.

# Release

Version bump 0.1.22 -> 0.1.23. Publish after merge so the
rescue-demos workflow's `cargo install freenet-git --locked` picks
up the new binary automatically.

[AI-assisted - Claude]
Reviewers identified three real correctness issues in the initial
--from implementation; all addressed in this commit.

# Skeptical H1: git pack-objects is non-deterministic with default threads

`git pack-objects` default `pack.threads=auto` (= CPU count) races
during delta search, producing different pack bytes for the same
object set across runs / machines with different CPU topologies. The
PR-#55 initial verification on my dev box was coincidence -- a CI
runner with a different core count would produce different deltas,
different BLAKE3, and silent 0-pack reconstruction.

Fix: pass `-c pack.threads=1` to `git pack-objects` in
`build_pack_for_range`. Verified empirically against the live failing
bundle: `git -c pack.threads=1 rev-list ... | git -c pack.threads=1
pack-objects --stdout` for `(000c4fd, c682079]` still produces
BLAKE3 `7c30464721...`. The single-threaded delta search trades
wall-clock for bit-for-bit reproducibility; rescue runs aren't a hot
path, so correctness > speed here.

The original `build_pack` in git-remote-freenet.rs intentionally
keeps default threads -- push paths create FRESH bundles whose
pack_hash is whatever the run produced, so non-determinism doesn't
matter there.

# Skeptical H2 / Codex P2 #1: committer-date ordering can mis-pair

Sorting bundle tips by `%ct` doesn't reflect the actual push order:
merge commits, imported history, and concurrent CI commits in the
same second can all produce orderings that don't match the push
sequence. Mis-paired `(prev_tip, new_tip)` ranges generate packs
with different content and different BLAKE3, missing the map and
silently degrading rescue success rate.

Fix: replace `commit_timestamp` with `ancestor_count` (`git rev-list
--count <tip>` returns reachable-commit count, which strictly grows
with each push in a linear history-mode chain). Tie-break by tip
bytes for determinism. Removes `git log -1 --format=%ct` invocation.

# Codex P2 #2: --from worktree-root paths failed silently

`git --git-dir <path>` does NOT auto-append `.git` -- it treats
`<path>` as the literal git directory. The CLI help text described
"pass the path to a working git directory of the same repo (the
`.git` of a clone is fine; so is a bare repo)", but a user passing
the worktree root (the documented "clone of the same repo" form)
would silently get a 0-pack map.

Fix: new `normalize_git_dir` helper detects `<path>/.git` and
returns it if present; passes through otherwise (handles bare repos
and explicit `.git` paths). Applied at the top of
`build_local_pack_map`.

# Skeptical M3: HashMap<[u8;32], Vec<u8>> deep-cloned per PUT

Changed map value type to `Arc<Vec<u8>>` so parallel rescues share
the same byte buffer instead of N deep clones. The PUT still needs
an owned `Vec<u8>` for the `wsclient::put_pack` API so one clone
remains per PUT — but the up-front cost is now O(1) Arc clone
instead of O(pack_size) deep clone for the per-task dispatch.

# Skeptical L1: dead `expected_pack_hash` removed

The helper was exported `pub fn` but never called -- `rescue_pack`
receives `pack_hash: [u8; 32]` directly. Removed it and its now-dead
test. Replaced with two `normalize_git_dir` tests covering the
worktree-root and bare-repo paths (Codex P2 #2 regression guard).

# Other findings deferred to follow-ups (not blocking this PR)

- Skeptical M4 (serial build-up-front cost): fine for current bundle
  counts (~30); revisit if multi-thousand-bundle contracts emerge.
- Skeptical M5 (unbounded fetch-depth in workflow): documented in-PR;
  worth a follow-up issue but not regressive vs the previous workflow.
- Skeptical L7 (rev-list buffered to memory): same pattern as original
  build_pack; not introduced here.

[AI-assisted - Claude]
@sanity
Copy link
Copy Markdown
Contributor Author

sanity commented May 17, 2026

Multi-model review found three real issues + two cleanups; all addressed in 3d8f306:

  • Skeptical H1 (resolved): `git pack-objects` is non-deterministic with default `pack.threads=auto`. Added `-c pack.threads=1` to `build_pack_for_range`. Empirically verified the live failing bundle's pack `7c30464721...` still reproduces bit-for-bit on `(000c4fd, c682079]` with `--threads=1`. The original `build_pack` in git-remote-freenet.rs intentionally keeps default threads — push paths create fresh bundles, non-determinism doesn't matter there.

  • Skeptical H2 / Codex P2 Wire up WS API publish in freenet-git create #1 (resolved): Sorted by ancestry (`git rev-list --count `) instead of committer-date `%ct`. Tie-breaks on tip bytes for determinism. Mis-paired pushes from merge commits / out-of-order timestamps can no longer silently miss the map.

  • Codex P2 Implement git-remote-freenet helper binary #2 (resolved): Added `normalize_git_dir` helper. If `/.git` exists, use that; else pass through. Handles the documented "pass a clone root" form that previously failed silently because `git --git-dir` doesn't auto-append `.git`.

  • Skeptical M3 (resolved): Map value is now `Arc<Vec>`. Parallel rescues share the same buffer instead of deep-cloning N times during dispatch. The PUT-time clone is unavoidable until `wsclient::put_pack` is refactored to take `Arc`.

  • Skeptical L1 (resolved): Removed dead `expected_pack_hash` helper and its test. Added two new `normalize_git_dir` tests as regression guards for the Codex P2 Implement git-remote-freenet helper binary #2 fix.

Other findings deferred with rationale:

  • M4 (serial build-up-front cost): fine for current bundle counts (~30); revisit at multi-thousand scale.
  • M5 (unbounded fetch-depth in workflow): not regressive; worth a follow-up issue.
  • L7 (rev-list buffered to memory): same pattern as original `build_pack`.

All 3 `local_pack` tests pass; clippy clean.

[AI-assisted - Claude]

Codex re-review on PR #55 @ 3d8f306 found two more real issues:

# Codex P2 #1: linear ancestry chain miss-pairs multi-refspec pushes

The rescue-demos workflow pushes `main:main` + `refs/tags/*:refs/tags/*`
in a single git push. The remote helper handles each refspec
independently in handle_push (git-remote-freenet.rs:868-922), so each
ref gets its own bundle whose `prev` comes from
`state.refs.get(&dst)`:

  - main:main with prev = state.refs["refs/heads/main"].target
    → pack covers `prev_main..main_tip`
  - refs/tags/v0.1.X (new tag, never pushed before): prev = None
    → pack covers everything reachable from the tag

The previous algorithm chained EVERY tip linearly through one global
ancestry order, so a tag bundle would be reconstructed as
`previous_bundle_tip..tag_commit` rather than the original
`(.., tag_commit]`. Different content → different BLAKE3 → bundle
silently misses the map.

Fix: for each tip, build BOTH candidates — chained-from-previous AND
no-prev. Insert each into the map under its own BLAKE3. The actual
original pack matches one of them; wrong-content candidates land
under their (mismatched) hash and are never looked up. Cost: ~2N
pack-objects invocations instead of N. Acceptable for current bundle
counts; revisit if multi-thousand-bundle contracts emerge.

# Codex P2 #2: pack.threads=1 must apply to publisher too

PR #55's first review fix pinned `pack.threads=1` only in the rescue's
build_pack_for_range. But the publisher (build_pack in
git-remote-freenet.rs) still ran default `pack.threads=auto`, which is
non-deterministic on multi-core machines. A multi-core publisher could
produce pack bytes that single-threaded reconstruction can't
reproduce → expected hash mismatch → silent rescue failure.

The 2026-05-17 verification against bundle 7c30464721 passed because
the pack was small enough (17 KB) that delta search probably ran
single-threaded anyway; a larger pack on a multi-core publisher would
have exposed this.

Fix: add `-c pack.threads=1` to build_pack in git-remote-freenet.rs
too. Small per-push wall-clock cost for permanent reproducibility
across all future rescues. Note: only matters for history-mode
pushes; snapshot-mode pushes create fresh bundles per run, so their
pack determinism doesn't affect anything (we don't reconstruct
snapshot bundles either).

# Tests

Both fixes are inherently empirical. The structural pin remains
correct; the runtime verification will happen when 0.1.23 ships and
the next rescue run picks up the new binary against contracts whose
packs were produced by pre-0.1.23 publishers. The 2026-05-17
verification (bundle 7c30464721 reproduces from `(000c4fd,
c682079]`) holds for at least the SinglePack chained-from-previous
case used by the freenet-stdlib mirror; tag bundles get verified
end-to-end on the next freenet-git self-mirror push that creates one.

Module-level algorithm doc updated to reflect the new multi-candidate
shape.

[AI-assisted - Claude]
@sanity
Copy link
Copy Markdown
Contributor Author

sanity commented May 17, 2026

Codex re-review found two more real issues; both fixed in b4881a0:

  • Codex P2 Wire up WS API publish in freenet-git create #1 (resolved): linear ancestry chain mis-pairs multi-refspec pushes. The rescue-demos workflow pushes `main:main` + `refs/tags/:refs/tags/` in one git push; each ref gets its own bundle with its own `prev` from `state.refs.get(&dst)`. New tags use `prev = None` (everything reachable from the tag), branch pushes use `prev = branch's_last_tip`. The linear chain through global tip order can't represent this.

    Fix: for each tip, build BOTH candidates — chained-from-previous AND no-prev. Insert each under its own BLAKE3. Wrong-content candidates land under their (mismatched) hash and are silently never looked up. Cost: ~2N pack-objects calls instead of N.

  • Codex P2 Implement git-remote-freenet helper binary #2 (resolved): `pack.threads=1` must apply to the PUBLISHER too. The first review fix only pinned it on the rescue side. A multi-core publisher running default `pack.threads=auto` could produce pack bytes that single-threaded reconstruction can't reproduce. The 2026-05-17 empirical verification on bundle `7c30464721` passed only because the pack was small (17 KB) — delta search probably ran single-threaded anyway.

    Fix: added `-c pack.threads=1` to `build_pack` in git-remote-freenet.rs too. Small per-push wall-clock cost for permanent rescue reproducibility across all future bundles.

Skeptical re-review on the previous commit (3d8f306) said "ship it" with all 5 prior findings closed; these new fixes are orthogonal to those. CI was green on 3d8f306 (mergeState CLEAN); waiting on fresh CI for b4881a0.

[AI-assisted - Claude]

sanity added 2 commits May 17, 2026 15:59
Skeptical re-review @ b4881a0 caught: `try_reconstruct_into`'s
docstring said "logged at info-level" but the implementation uses
`tracing::debug!`. The previous commit lowered the log level
deliberately (one candidate failing per tip is now normal under the
multi-candidate algorithm — surfacing those at info would be noisy)
but didn't update the prose. Fix the docstring to match the code and
explain the rationale.

[AI-assisted - Claude]
…-continuation

CI clippy (Rust 1.94.0) tripped on `clippy::doc_lazy_continuation` for
the module-level numbered list: a line starting with `//!    +
\`refs/tags/*...\`` was parsed by the markdown lint as starting a
nested sub-list, then the following continuation line had no list
marker. Reworded to use "both A and B" prose rather than a stray "+"
that looks like a list marker.

[AI-assisted - Claude]
@sanity sanity merged commit 61d5fea into main May 17, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant