fix(payment): dedupe merkle candidate PeerIds + clearer sparse-network error #80
Conversation
Two follow-ups to the pay-yourself closeness check (PR WithAutonomi#77, landed in v0.11.0-rc.1) flagged by Greptile/Copilot reviews but not blocking the RC:

1. Reject pools with duplicate candidate PeerIds outright. Counting `matched` against the network-returned set with no dedup meant an attacker could submit 16 copies of the same real peer's pub_key and trivially clear the 13/16 closeness threshold (one peer counted 16 times). An honest pool has `CANDIDATES_PER_POOL` distinct pub_keys by construction. Rejecting duplicates outright is cleaner than counting a `HashSet` of candidate PeerIds — same security guarantee, plus the rejection message tells the operator why. Factored into a small helper `derive_distinct_candidate_peer_ids` and called as a pre-check before the Kademlia lookup so the cost is paid only on malformed/hostile pools, not honest ones (see the sketch below).

2. Distinguish "sparse network" from "forged pool" in the rejection error. When `find_closest_nodes_network` returned fewer peers than the 13/16 threshold, the previous error read "candidate pub_keys do not match the network's closest peers" — misleading: the cause was a sparse routing table, not a forged pool. Added an explicit short-circuit branch with a "retry once the routing table populates further" message so operators can tell transient network conditions apart from an actual attack signal.

Tests: 1 new unit test (`closeness_rejects_pool_with_duplicate_candidate_pub_keys`) + existing 43 unit + 8 e2e merkle tests still pass. Clippy + fmt + doc clean.
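A minimal sketch of the pre-check idea, assuming the candidate pub_keys are protobuf-encoded libp2p public keys; the helper name matches the PR description, but the signature, error type, and decoding path are illustrative, not the crate's actual API:

```rust
use std::collections::HashSet;

use libp2p::identity::PublicKey;
use libp2p::PeerId;

/// Derive one PeerId per candidate pub_key and reject the pool if any two
/// candidates collapse to the same PeerId. Called before the Kademlia
/// lookup, so the extra work is only paid on malformed/hostile pools.
fn derive_distinct_candidate_peer_ids(
    candidate_pub_keys: &[Vec<u8>],
) -> Result<Vec<PeerId>, String> {
    let peer_ids = candidate_pub_keys
        .iter()
        .map(|bytes| {
            PublicKey::try_decode_protobuf(bytes)
                .map(|pk| pk.to_peer_id())
                .map_err(|e| format!("invalid candidate pub_key: {e}"))
        })
        .collect::<Result<Vec<PeerId>, String>>()?;

    // An honest pool has CANDIDATES_PER_POOL distinct pub_keys by
    // construction, so any duplicate marks the pool as forged or malformed.
    let distinct: HashSet<&PeerId> = peer_ids.iter().collect();
    if distinct.len() != peer_ids.len() {
        return Err("Merkle candidate pool rejected: duplicate candidate PeerId".into());
    }
    Ok(peer_ids)
}
```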
Pull request overview
Hardens merkle payment verification by preventing duplicate-candidate attacks in the DHT closeness check and by distinguishing “sparse network” lookup results from genuinely forged pools, improving operator-facing diagnostics.
Changes:
- Add a pre-validation step that derives candidate PeerIds and rejects pools containing duplicate PeerIds.
- Add an explicit “network too sparse to verify closeness” rejection branch when the DHT returns fewer peers than the closeness threshold.
- Add a unit test covering the duplicate-candidate pub_key/PeerId attack.
```rust
return Err(Error::Payment(
    "Merkle candidate pool rejected: duplicate candidate PeerId. An \
     honest pool has 16 distinct candidate pub_keys; duplicates would \
     let a single real peer satisfy the closeness threshold by being \
     counted multiple times."
        .into(),
));
```
The duplicate-candidate rejection message hard-codes "16 distinct candidate pub_keys", but elsewhere this function/documentation refers to evmlib::merkle_payments::CANDIDATES_PER_POOL. To avoid the error text becoming stale if the constant ever changes, consider formatting the message using that constant (or pool.candidate_nodes.len() if that is the enforced size) instead of a literal 16.
Suggested change:

```diff
-return Err(Error::Payment(
-    "Merkle candidate pool rejected: duplicate candidate PeerId. An \
-     honest pool has 16 distinct candidate pub_keys; duplicates would \
-     let a single real peer satisfy the closeness threshold by being \
-     counted multiple times."
-        .into(),
-));
+return Err(Error::Payment(format!(
+    "Merkle candidate pool rejected: duplicate candidate PeerId. An \
+     honest pool has {} distinct candidate pub_keys; duplicates would \
+     let a single real peer satisfy the closeness threshold by being \
+     counted multiple times.",
+    evmlib::merkle_payments::CANDIDATES_PER_POOL
+)));
```
```rust
if network_peers.len() < Self::CANDIDATE_CLOSENESS_REQUIRED {
    debug!(
        "Merkle closeness deferred: network lookup returned {} peers \
         for pool midpoint {} (need at least {} to verify)",
        network_peers.len(),
        hex::encode(pool_address.0),
        Self::CANDIDATE_CLOSENESS_REQUIRED,
    );
    return Err(Error::Payment(format!(
```
The debug log says "Merkle closeness deferred" but this branch returns Err (rejecting the proof). For operator clarity, consider aligning the wording (e.g., log "rejected (network too sparse)" or similar) so logs don’t contradict the returned error path.
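For instance, the log line could be reworded roughly as follows; this is illustrative and simply mirrors the snippet above with the wording aligned to the rejection path:

```rust
debug!(
    "Merkle closeness rejected (network too sparse): lookup returned {} peers \
     for pool midpoint {} (need at least {} to verify)",
    network_peers.len(),
    hex::encode(pool_address.0),
    Self::CANDIDATE_CLOSENESS_REQUIRED,
);
```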
1. Replace `candidates[0]` direct indexing with `.first().expect(...)` in the new `closeness_rejects_pool_with_duplicate_candidate_pub_keys` unit test. Per CLAUDE.md memory: avoid direct indexing even in tests (see the sketch after this list).
2. Tighten `test_attack_merkle_pay_yourself_fabricated_pool` to actually exercise the matched-count rejection branch:
   - Bump the testnet from 5 nodes to 14 (>= CANDIDATE_CLOSENESS_REQUIRED = 13). With 5 nodes the new sparse-network short-circuit fires before the matched-count check, weakening the test as a regression guard for the matched-count path it was originally designed to prove.
   - Tighten the assertion to specifically require the matched-count error message ('candidate pub_keys do not match the network's closest peers') rather than any 'closeness'-substring match. The previous loose match would accept a passing test for the wrong reason on undersized testnets.
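A small illustration of the indexing change in item 1; the slice type and expect message are placeholders, not the test's actual types:

```rust
/// Illustrative only: `candidates` stands in for the pool's candidate list.
fn first_candidate(candidates: &[String]) -> &String {
    // Before: `&candidates[0]`, which panics with a bare index-out-of-bounds
    // message on an empty slice and uses direct indexing.
    // After: same panic condition, but with a descriptive message and no
    // direct indexing, matching the CLAUDE.md convention.
    candidates
        .first()
        .expect("candidate pool should contain at least one candidate")
}
```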
Summary
Two follow-ups to the pay-yourself closeness check (PR #77, shipped in v0.11.0) that were flagged by Greptile/Copilot reviews but didn't block the RC. Original PR was #78, auto-closed when the `rc-2026.4.2` base branch was deleted post-merge — this re-opens it against `main`.

- Reject pools with duplicate candidate PeerIds outright. Counting `matched` against the network-returned set with no dedup meant an attacker could submit 16 copies of the same real peer's pub_key and trivially clear the 13/16 closeness threshold (one peer counted 16 times). An honest pool has `CANDIDATES_PER_POOL` distinct pub_keys by construction. Rejecting duplicates outright is cleaner than counting a `HashSet` of candidate PeerIds — same security guarantee, plus the rejection message tells the operator why.
- Clearer error when the DHT lookup returns fewer than 13 peers. The previous "candidate pub_keys do not match the network's authoritative close-group" rejection was misleading when the cause was just an undersized network response. Added an explicit "network too sparse to verify, retry" branch with a distinct error message before the matched-count check (see the sketch below).
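A minimal standalone sketch of the sparse-network short-circuit described in the second bullet; the constant value reflects the 13/16 threshold mentioned above, while the function name, error type, and message text are placeholders rather than the repository's actual wording:

```rust
/// Placeholder for the 13-of-16 closeness threshold discussed in the PR.
const CANDIDATE_CLOSENESS_REQUIRED: usize = 13;

/// Reject early when the DHT cannot supply enough peers to verify closeness,
/// pointing the operator at network sparseness rather than a forged pool.
fn ensure_enough_peers_to_verify(network_peers_len: usize) -> Result<(), String> {
    if network_peers_len < CANDIDATE_CLOSENESS_REQUIRED {
        return Err(format!(
            "Merkle closeness could not be verified: network lookup returned \
             only {network_peers_len} peers (need at least \
             {CANDIDATE_CLOSENESS_REQUIRED}); the routing table may still be \
             populating, so retry once it holds more peers."
        ));
    }
    Ok(())
}
```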
Test plan
- `closeness_rejects_pool_with_duplicate_candidate_pub_keys` covers the duplicate-PeerId attack.
- `cargo clippy --all-features --all-targets -- -D warnings` clean.
- `cargo fmt` clean.