fix(cluster): resolve four steady-state distributed cluster regressions #133

Merged: hyp3rd merged 1 commit into main from feat/dist-cluster on May 13, 2026

hyp3rd (Owner) commented on May 13, 2026

Fix four independent bugs that caused rebalance counter churn, phantom key creation, and incarnation inflation in multi-node clusters:

  • Clone the key string in applySet before storing it as a map key. HTTP frameworks (e.g. Fiber) back path-parameter strings with pooled request buffers that are reused on the next request; storing the string without copying it let map keys silently mutate, producing phantom shard entries and a persistent rebalance loop (~60 bumps/s on a 5-node cluster).

  • Add applyForwardedSet with a receiver-side ownership guard for all transport-receiver paths (InProcessTransport.ForwardSet, HTTP /internal/set). Writes forwarded from peers with a divergent ring view are silently dropped, breaking the migrate→fan-out-back→stuck cycle. Exposes new dist.write.apply_refused metric.

  • Release local copy immediately in migrateIfNeeded when removalGracePeriod == 0. Previously the local item was never removed, so the rebalance scanner re-flagged the same lost-ownership keys on every tick. Each stuck key now produces exactly one migration.

  • Make Membership.Mark a no-op on same-state calls: incarnation, version vector, and observers all stay quiet during steady-state heartbeat success. Add Membership.Refute as the dedicated SWIM self-refute primitive that unconditionally bumps incarnation. Update refuteIfSuspected to use Refute().
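
For reviewers unfamiliar with the key-aliasing hazard behind the first fix, here is a minimal sketch of it. The `store` type, `applySetRaw`, `applySetCloned`, and `zeroCopyString` are hypothetical names for illustration and are not this repository's code; `zeroCopyString` only imitates a framework that exposes a pooled buffer as a string without copying. `strings.Clone` is the standard-library call for forcing a private copy.

```go
package main

import (
	"fmt"
	"strings"
	"unsafe"
)

// store is a hypothetical stand-in for a shard's key/value map.
type store struct {
	items map[string][]byte
}

// applySetRaw stores the key exactly as received, so the map key keeps
// pointing at whatever memory backs the caller's string.
func (s *store) applySetRaw(key string, val []byte) {
	s.items[key] = val
}

// applySetCloned copies the key bytes into fresh memory before storing.
func (s *store) applySetCloned(key string, val []byte) {
	s.items[strings.Clone(key)] = val
}

// zeroCopyString imitates a framework handing out a string view over a
// pooled request buffer without copying (illustrative only).
func zeroCopyString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// storedKey writes one key taken from a reusable buffer, overwrites the
// buffer as the next request would, and returns the key the map now holds.
func storedKey(cloned bool) string {
	buf := []byte("user:1")
	s := &store{items: make(map[string][]byte)}
	key := zeroCopyString(buf)
	if cloned {
		s.applySetCloned(key, []byte("v"))
	} else {
		s.applySetRaw(key, []byte("v"))
	}
	copy(buf, "user:2") // the "next request" reuses the buffer
	for k := range s.items {
		return k
	}
	return ""
}

func main() {
	fmt.Println("raw key after buffer reuse:   ", storedKey(false))
	fmt.Println("cloned key after buffer reuse:", storedKey(true))
}
```

The cloned variant keeps reporting the key that was originally written. The raw variant's key still aliases the buffer, so what the map holds can change underneath it; that matches the phantom-entry symptom described above.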

Adds integration tests pinning idle-cluster silence, traffic-under-load silence, one-shot drain semantics, and the ownership-guard contract. Adds five unit tests covering Mark no-op and Refute always-bumps.
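
The Mark/Refute contract those unit tests pin can be sketched roughly as follows. This is an illustrative model, not the actual Membership implementation: the `State` constants, the `member` struct, the `events` counter standing in for observers, and the assumption that a real state change in Mark bumps incarnation are all hypothetical. Locking is omitted for brevity.

```go
package main

import "fmt"

// State is a hypothetical node state enum.
type State int

const (
	Alive State = iota
	Suspect
	Dead
)

type member struct {
	state       State
	incarnation uint64
}

// Membership is a simplified, non-thread-safe model of a member table.
type Membership struct {
	members map[string]*member
	events  int // stand-in for observer notifications
}

// NewMembership seeds the table with members in the Alive state.
func NewMembership(ids ...string) *Membership {
	m := &Membership{members: make(map[string]*member)}
	for _, id := range ids {
		m.members[id] = &member{state: Alive}
	}
	return m
}

// Mark records a state transition. Same-state calls are a no-op:
// no incarnation bump and no observer notification.
func (m *Membership) Mark(id string, s State) bool {
	n, ok := m.members[id]
	if !ok || n.state == s {
		return false
	}
	n.state = s
	n.incarnation++ // assumption: real transitions bump incarnation
	m.events++
	return true
}

// Refute marks the member Alive and always bumps its incarnation,
// even if it is already Alive (SWIM-style self-refutation).
func (m *Membership) Refute(id string) uint64 {
	n, ok := m.members[id]
	if !ok {
		return 0
	}
	n.state = Alive
	n.incarnation++
	m.events++
	return n.incarnation
}

// Incarnation returns the current incarnation, or 0 for unknown members.
func (m *Membership) Incarnation(id string) uint64 {
	if n, ok := m.members[id]; ok {
		return n.incarnation
	}
	return 0
}

func main() {
	m := NewMembership("node-1")
	for i := 0; i < 1000; i++ {
		m.Mark("node-1", Alive) // steady-state heartbeat success
	}
	fmt.Println("incarnation after heartbeats:", m.Incarnation("node-1"))
	fmt.Println("incarnation after refute:    ", m.Refute("node-1"))
}
```

Splitting the two operations keeps the heartbeat path idempotent while leaving an explicit primitive for the one case where an unconditional bump is wanted.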

hyp3rd merged commit 71e1eb2 into main on May 13, 2026
14 checks passed