feat(backend): add adaptive Merkle anti-entropy backoff scheduling #123
Merged
Introduce `WithDistMerkleAdaptiveBackoff(maxFactor)` to let the auto-sync loop progressively back off when all peers are in sync.

Behaviour:
- Each clean tick (zero divergence across all peers) doubles the sleep interval, capped at `maxFactor × base interval`.
- Any dirty tick or sync error snaps the factor back to 1× immediately, so recovery is never lazy.
- Disabled by default (maxFactor ≤ 1), preserving existing behaviour for all current deployments.

Implementation details:
- Replace fixed `time.Ticker` in `autoSyncLoop` with a reset `time.Timer` driven by `nextAutoSyncDelay`.
- Refactor `SyncWith` into `syncWithStatus` (returns clean/dirty signal) and a thin public `SyncWith` wrapper to keep the API unchanged.
- `runAutoSyncTick` now returns a clean bool consumed by `updateAutoSyncBackoff`.

Observability:
- New OTel gauge `dist.auto_sync.backoff_factor` (current multiplier).
- New OTel counter `dist.auto_sync.clean_ticks` (cumulative clean ticks).
- Factor changes are logged once at Info level; no per-tick spam.
- `DistMetrics` exposes `AutoSyncBackoffFactor` and `AutoSyncCleanTicks`.

Tests (`pkg/backend/dist_adaptive_backoff_test.go`):
- `TestAdaptiveBackoff_DisabledIsNoop` — back-compat guarantee.
- `TestAdaptiveBackoff_RampsAndCaps` — doubling, cap enforcement, dirty reset.
- `TestAdaptiveBackoff_NextDelayMultiplies` — delay calculation contract.
- `TestAdaptiveBackoff_MaxFactorOneStaysDisabled` — edge case: maxFactor=1.
- `TestAdaptiveBackoff_OptionNormalisesNegatives` — option validation.
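The ramp, cap, and reset rules described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `backoffState`, `newBackoffState`, `update`, and `nextDelay` are hypothetical names, and treating a negative `maxFactor` as "disabled" is an assumption about how the option normalises its input.

```go
package main

import (
	"fmt"
	"time"
)

// backoffState is a hypothetical holder for the adaptive backoff settings.
type backoffState struct {
	maxFactor int // values <= 1 mean backoff is disabled
	factor    int // current multiplier, always >= 1
}

// newBackoffState normalises maxFactor so that anything below 1
// (including negatives) becomes 1, i.e. disabled. This is an assumption.
func newBackoffState(maxFactor int) *backoffState {
	if maxFactor < 1 {
		maxFactor = 1
	}
	return &backoffState{maxFactor: maxFactor, factor: 1}
}

// update applies one tick result and reports whether the factor changed,
// which is the point where a single Info-level log line could be emitted.
func (b *backoffState) update(clean bool) bool {
	prev := b.factor
	switch {
	case b.maxFactor <= 1 || !clean:
		// Disabled, dirty tick, or sync error: snap back to 1x.
		b.factor = 1
	default:
		// Clean tick: double, then clamp to the configured cap.
		b.factor *= 2
		if b.factor > b.maxFactor {
			b.factor = b.maxFactor
		}
	}
	return b.factor != prev
}

// nextDelay scales the base interval by the current factor.
func (b *backoffState) nextDelay(base time.Duration) time.Duration {
	return base * time.Duration(b.factor)
}

func main() {
	b := newBackoffState(8)
	base := 30 * time.Second
	// Four clean ticks followed by one dirty tick.
	for _, clean := range []bool{true, true, true, true, false} {
		changed := b.update(clean)
		fmt.Println(b.factor, b.nextDelay(base), changed)
	}
}
```

In a loop built on `time.Timer`, the value returned by `nextDelay` would be passed to `Reset` after each tick, which is what lets the interval vary where a fixed `time.Ticker` could not.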
- github.com/gofiber/utils/v2: v2.0.4 → v2.0.5
- github.com/fxamacker/cbor/v2: v2.9.1 → v2.9.2

Routine patch-level updates to indirect dependencies; no API changes expected.
Tag hints at queue time with their origin (replication fan-out vs
rebalance migration) and track five new per-source OTel counters:
- dist.migration.queued – migration hints enqueued
- dist.migration.replayed – migration hints successfully delivered
- dist.migration.expired – migration hints aged past TTL
- dist.migration.dropped – migration hints discarded (transport error or global cap)
- dist.migration.last_age_ns – queue residency of the most-recently replayed
  migration hint; direct signal of new-primary reachability during rolling
  deploys
Existing dist.hinted.* counters continue to aggregate across both
sources; replication-only counts are derivable as (aggregate - migration).
No second queue or drain loop is introduced. The implementation extends
the existing hinted-handoff infrastructure with a lightweight hintSource
tag on hintedEntry and matching per-source counter branches on every
terminal path in queueHint and processHint (global-cap drop, queue
success, expiry, replay success, and transport-error drop).
Adds pkg/backend/dist_migration_hint_test.go with six focused tests
covering source-tag preservation through queue → replay, per-source
counter increments on every terminal path, and the not-found keep path.
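A minimal sketch of the tagging idea, assuming plain integer fields in place of the real OTel counters. `hintSource` and `hintedEntry` are named in the description above; the constant names, struct fields, and the `hintCounters` type with its methods are hypothetical and only illustrate how one queue can feed both the aggregate and the per-source counts.

```go
package main

import "fmt"

// hintSource tags a hint with its origin. Constant names are hypothetical.
type hintSource int

const (
	sourceReplication hintSource = iota // replication fan-out
	sourceMigration                     // rebalance migration
)

// hintedEntry carries the source tag; field names are hypothetical.
type hintedEntry struct {
	key    string
	source hintSource
}

// hintCounters stands in for the dist.hinted.* and dist.migration.* counters.
type hintCounters struct {
	hintedQueued, hintedReplayed       int64 // aggregate across both sources
	migrationQueued, migrationReplayed int64 // migration hints only
}

// recordQueued bumps the aggregate counter and, for migration hints,
// the per-source counter as well.
func (c *hintCounters) recordQueued(e hintedEntry) {
	c.hintedQueued++
	if e.source == sourceMigration {
		c.migrationQueued++
	}
}

// recordReplayed mirrors recordQueued for the replay-success path.
func (c *hintCounters) recordReplayed(e hintedEntry) {
	c.hintedReplayed++
	if e.source == sourceMigration {
		c.migrationReplayed++
	}
}

// replicationQueued derives the replication-only count as
// aggregate minus migration.
func (c *hintCounters) replicationQueued() int64 {
	return c.hintedQueued - c.migrationQueued
}

func main() {
	var c hintCounters
	hints := []hintedEntry{
		{key: "a", source: sourceReplication},
		{key: "b", source: sourceMigration},
		{key: "c", source: sourceMigration},
	}
	for _, h := range hints {
		c.recordQueued(h)
	}
	c.recordReplayed(hints[1])
	fmt.Println(c.hintedQueued, c.migrationQueued, c.replicationQueued())
}
```

The same branch-on-tag pattern would repeat on the expiry and drop paths, which is why no second queue or drain loop is needed.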