45 changes: 45 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,34 @@ All notable changes to HyperCache are recorded here. The format follows

### Added

- **Async read-repair batching (Phase 4) + unconditional `ForwardSet`-only repair.** Two composing changes
in the same PR that together cut the wire-call cost of read-repair under quorum reads. (1) The defensive
`ForwardGet` probe in `repairRemoteReplica` is gone — every repair is now exactly one `ForwardSet`,
because the receiver's `applySet` already version-compares and noops downgrades, so the probe was pure
duplication. ~50% wire-call reduction per repair regardless of batching. (2) New opt-in
[`backend.WithDistReadRepairBatch(interval, maxBatchSize)`](pkg/backend/dist_memory.go) option queues
repairs by destination peer + key (last-write-wins by `(version, origin)`) and dispatches per-peer batches
on the interval or when a peer's pending count hits `maxBatchSize`. Concurrent reads of the same hot key
produce ONE repair through the queue, not N — the coalescer collapses duplicate `(peer, key)` entries
and bumps the new `dist.read_repair.coalesced` counter per collapsed enqueue. Disabled by default
(`interval == 0` = current synchronous behavior preserved, so `TestDistMemoryReadRepair` and
`TestDistMemoryRemoveReplication` pass byte-identical). Clean shutdown drains the queue inside `Stop()`;
crash exit loses queued repairs by design, with merkle anti-entropy as the convergence safety net.
New [`pkg/backend/dist_read_repair.go`](pkg/backend/dist_read_repair.go) hosts the `repairQueue` type
with errgroup-driven per-peer parallel `ForwardSet` dispatch. Eight unit tests in
[`pkg/backend/dist_read_repair_test.go`](pkg/backend/dist_read_repair_test.go) cover the coalesce rule
(same `(peer, key)` keeps the higher version, distinct peers stay independent), the size-threshold
inline flush, the nil-transport noop path, the `Stop()` drain semantics, the `(version, origin)`
tie-break rule, and concurrent-enqueue race-safety. Three integration tests in
[`tests/hypercache_distmemory_readrepair_batch_test.go`](tests/hypercache_distmemory_readrepair_batch_test.go)
drive the end-to-end shape — a 3-node RF=3 ConsistencyQuorum cluster, one node's local copy dropped,
N concurrent Gets from a third node — and assert the batched flush heals the dropped node, parallel
reads coalesce to ≤2 dispatches (one per remote owner) regardless of N, and `Stop()` drains queued
repairs before returning. Two new OTel metrics:
`dist.read_repair.batched` (per actual `ForwardSet` dispatched by the queue's flusher) and
`dist.read_repair.coalesced` (per duplicate-enqueue collapsed). New "Tuning — read-repair batching"
section in [`docs/operations.md`](docs/operations.md) covers the option shape, the divergence-window
trade-off, the two metrics, and when to enable it (high read-amplification with stable hot keys).
- **Token-refresh visibility for the OIDC source.** Closes RFC 0003 open question 6: the
`WithOIDCClientCredentials` source now wraps its `oauth2.TokenSource` with a logger that emits one
`"oidc token rotated"` Info line per real rotation (expiry change), staying silent on cached returns.
@@ -258,6 +286,23 @@ All notable changes to HyperCache are recorded here. The format follows

### Fixed

- **Set-forward promotion no longer requires the in-process `ErrBackendNotFound` sentinel.**
[`handleForwardPrimary`](pkg/backend/dist_memory.go) used to gate "primary unreachable → promote to
replica" on `errors.Is(errFwd, sentinel.ErrBackendNotFound)`, the error the in-process transport returns
for an unregistered peer. HTTP/gRPC transports against a stopped container surface
`net.OpError` / `io.EOF` / `context.DeadlineExceeded` instead — none of which matched the condition.
Result: when a cluster node was killed (e.g. `docker stop` in
[`scripts/tests/20-test-cluster-resilience.sh`](scripts/tests/20-test-cluster-resilience.sh)), writes for
keys whose primary was the dead node failed immediately at the forwarding hop, no hint was queued, and
the data never landed anywhere — the same 7 of 50 "during-*" writes failed reproducibly in CI's cluster
workflow. Promotion now triggers on **any** non-nil forward error when the local node is in `owners[1:]`,
matching the in-process and production transport behavior under the same resilience contract. Spurious
promotion on a transient blip is benign — `applySet` version-compares on the receiver, and merkle
anti-entropy / `chooseNewer` reconcile any divergent `(version, origin)` pair via the existing
last-write-wins rule. New test [`TestDistSet_PromotesOnGenericForwardError`](tests/hypercache_distmemory_forward_primary_promotion_test.go)
uses the chaos hooks at `DropRate=1.0` to deterministically force a generic forward error and asserts the
Set succeeds via promotion; the existing `TestDistFailureRecovery` continues to pass byte-identical (the
change widens the promotion gate, doesn't narrow it).
- **`TestDistRebalanceReplicaDiffThrottle` no longer flakes under `make test-race`.** The test's 900ms hard
sleep wasn't enough wall-clock budget for the rebalancer's 80ms-tick loop to actually fire 11 ticks under
`-race` + `-shuffle=on`'s scheduler pressure. Replaced the sleep with a 5-second polling loop that exits as
39 changes: 22 additions & 17 deletions Makefile
@@ -99,6 +99,16 @@ run-example:
update-deps:
go get -u -t ./... && go mod tidy -v && go mod verify

# check_command_exists expands to a recipe line that succeeds if the
# given command resolves on PATH, otherwise prints "<cmd> command not
# found" and exits non-zero. Used by prepare-base-tools' chain of
# `$(call check_command_exists,X) || install-fallback` lines: the
# first failure message above is the developer-facing hint; the
# install fallback fires when the tool itself is genuinely missing.
define check_command_exists
@which $(1) > /dev/null 2>&1 || (echo "$(1) command not found" && exit 1)
endef

prepare-toolchain: prepare-base-tools

prepare-base-tools:
@@ -216,23 +226,18 @@ docs-serve: docs-build
PYENV_VERSION=mkdocs mkdocs serve

pre-commit:
@eval "$$(pyenv init -)" && \
pyenv activate pre-commit && \
pre-commit run -a trailing-whitespace && \
pre-commit run -a end-of-file-fixer && \
pre-commit run -a markdownlint && \
pre-commit run -a yamllint && \
pre-commit run -a cspell && \
pre-commit run -a cspell

# check_command_exists is a helper function that checks if a command exists.
define check_command_exists
@which $(1) > /dev/null 2>&1 || (echo "$(1) command not found" && exit 1)
endef

ifeq ($(call check_command_exists,$(1)),false)
$(error "$(1) command not found")
endif
	@if command -v pyenv >/dev/null 2>&1; then \
		eval "$$(pyenv init -)" && \
		pyenv activate pre-commit && \
		pre-commit run -a trailing-whitespace && \
		pre-commit run -a end-of-file-fixer && \
		pre-commit run -a markdownlint && \
		pre-commit run -a yamllint && \
		pre-commit run -a cspell && \
		pre-commit run -a cspell; \
	else \
		echo "pyenv command not found"; \
	fi

# help prints a list of available targets and their descriptions.
help:
6 changes: 6 additions & 0 deletions cspell.config.yaml
@@ -39,6 +39,7 @@ words:
- Akudx
- aliceonly
- ALPN
- amortisation
- APITLS
- APITLSCA
- assertable
@@ -68,6 +69,7 @@ words:
- clientcredentials
- cmap
- Cmder
- coalescer
- codacy
- codebook
- codegen
@@ -86,12 +88,14 @@ words:
- derr
- disambiguator
- distconfig
- distmemory
- distroless
- EDITMSG
- elif
- Equalf
- errcheck
- errchkjson
- errgroup
- Errorf
- errp
- eventbus
@@ -164,6 +168,7 @@ words:
- keepalive
- keepalives
- keyf
- keyfmt
- keypair
- lamport
- lblll
@@ -216,6 +221,7 @@ words:
- pygments
- pymdownx
- reaad
- readrepair
- recvcheck
- rediscluster
- Redocly
38 changes: 38 additions & 0 deletions docs/operations.md
@@ -123,6 +123,44 @@ otherwise mark a healthy peer suspect. `dist.heartbeat.indirect_probe.refuted` r
probes are saving you from spurious flapping; rising `dist.heartbeat.indirect_probe.failure` indicates the
peer is genuinely unreachable from multiple vantage points.

## Tuning — read-repair batching

Quorum reads (`ConsistencyQuorum` / `ConsistencyAll`) issue best-effort read-repair to every owner of the
chosen item — one `ForwardSet` per owner per read. Under read-heavy workloads on hot keys with stale replicas,
this fan-out becomes the dominant network cost on the dist transport.

`backend.WithDistReadRepairBatch(interval, maxBatchSize)` opts into async coalescing: repairs are queued by
destination peer + key (last-write-wins by `(version, origin)`) and dispatched by a background flusher on the
configured interval or when a per-peer batch hits `maxBatchSize`. Concurrent reads of the same hot key
produce one repair through the queue, not N.

```go
dm, _ := backend.NewDistMemory(ctx,
	backend.WithDistReadConsistency(backend.ConsistencyQuorum),
	backend.WithDistReadRepairBatch(100*time.Millisecond, 64),
)
```
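
The coalescing rule described above can be illustrated with a minimal, self-contained sketch. It is not the `repairQueue` implementation from `pkg/backend/dist_read_repair.go`; the `stamp`, `repairKey`, `pending`, and `coalescer` names are hypothetical, and two details are assumptions: ties on `version` are broken by the lexicographically larger `origin`, and every duplicate enqueue bumps the coalesced counter.

```go
package main

import (
	"fmt"
	"sync"
)

// stamp is a hypothetical (version, origin) pair used for last-write-wins ordering.
type stamp struct {
	Version uint64
	Origin  string
}

// newer reports whether a supersedes b. The higher version wins; on equal
// versions the larger origin string wins (assumed tie-break direction).
func (a stamp) newer(b stamp) bool {
	if a.Version != b.Version {
		return a.Version > b.Version
	}
	return a.Origin > b.Origin
}

// repairKey identifies one queue slot: a destination peer plus a cache key.
type repairKey struct{ Peer, Key string }

// pending is the repair currently queued for a slot.
type pending struct {
	Stamp stamp
	Value []byte
}

// coalescer keeps at most one pending repair per (peer, key).
type coalescer struct {
	mu        sync.Mutex
	queue     map[repairKey]pending
	coalesced int // stand-in for the dist.read_repair.coalesced counter
}

func newCoalescer() *coalescer {
	return &coalescer{queue: make(map[repairKey]pending)}
}

// enqueue records a repair and reports whether it created a new slot.
// A duplicate (peer, key) is collapsed: the slot keeps whichever stamp is newer.
func (c *coalescer) enqueue(peer, key string, s stamp, value []byte) bool {
	c.mu.Lock()
	defer c.mu.Unlock()

	k := repairKey{peer, key}
	cur, ok := c.queue[k]
	if !ok {
		c.queue[k] = pending{s, value}
		return true
	}
	c.coalesced++
	if s.newer(cur.Stamp) {
		c.queue[k] = pending{s, value}
	}
	return false
}

func main() {
	c := newCoalescer()
	// Five reads of the same hot key all want to repair node-b.
	for i := 0; i < 5; i++ {
		c.enqueue("node-b", "hot", stamp{7, "node-a"}, []byte("v7"))
	}
	// A different peer keeps its own independent slot.
	c.enqueue("node-c", "hot", stamp{7, "node-a"}, []byte("v7"))
	fmt.Println("pending slots:", len(c.queue), "coalesced:", c.coalesced)
}
```

With this shape, the number of queued repairs is bounded by the number of distinct `(peer, key)` pairs rather than by the number of reads.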

**Trade-off.** Batched mode introduces a divergence window of up to `interval` where a stale replica stays
stale. Merkle anti-entropy (`WithDistMerkleAutoSync`) is the safety net for any repair the queue drops on
crash exit; clean shutdown drains the queue inside `Stop()`. Disabled by default — the existing synchronous
read-repair path is preserved when the option is not set.

**Detection.** Two new metrics quantify the amortisation:

- `dist.read_repair.batched` — repairs dispatched via the queue's flusher (one bump per actual `ForwardSet`).
- `dist.read_repair.coalesced` — repairs short-circuited because a same-version-or-higher entry was already
queued for the same `(peer, key)`. Every concurrent same-key read past the first bumps this.

The aggregate `dist.read_repair` counter still bumps per repair attempt (sync dispatch or enqueue). Operators
read `dist.read_repair.batched / dist.read_repair` for the fraction routed through the queue, and
`dist.read_repair.coalesced` as the saved-wire-call counter.
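
As a worked example of that ratio, the helper below computes the queue fraction from a snapshot of the three counters. The `repairCounters` type and the sample values are hypothetical; how the counters are scraped from the OTel pipeline is left out.

```go
package main

import "fmt"

// repairCounters is a hypothetical snapshot of the read-repair counters.
type repairCounters struct {
	Total     uint64 // dist.read_repair
	Batched   uint64 // dist.read_repair.batched
	Coalesced uint64 // dist.read_repair.coalesced
}

// batchedFraction returns Batched/Total, or 0 when no repairs were recorded.
func (c repairCounters) batchedFraction() float64 {
	if c.Total == 0 {
		return 0
	}
	return float64(c.Batched) / float64(c.Total)
}

func main() {
	// Illustrative numbers only: 1000 repair attempts, 900 collapsed in the
	// queue, 100 actually dispatched by the flusher.
	c := repairCounters{Total: 1000, Batched: 100, Coalesced: 900}
	fmt.Printf("batched fraction: %.2f, wire calls saved: %d\n",
		c.batchedFraction(), c.Coalesced)
}
```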

**When to enable.** High read-amplification with stable hot keys; the typical pattern is a partitioned
workload where a small set of keys takes the majority of reads and one of the owners drops out briefly. Not
useful for write-heavy paths (which don't go through read-repair) or for workloads with a flat key
distribution (the coalescer has little to collapse).

## Operational tasks

### Drain a node
2 changes: 2 additions & 0 deletions go.mod
@@ -20,6 +20,7 @@ require (
go.opentelemetry.io/otel/trace v1.43.0
golang.org/x/crypto v0.51.0
golang.org/x/oauth2 v0.36.0
golang.org/x/sync v0.20.0
gopkg.in/yaml.v3 v3.0.1
)

@@ -36,6 +37,7 @@ require (
github.com/mattn/go-isatty v0.0.22 // indirect
github.com/philhofer/fwd v1.2.0 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/shamaton/msgpack/v3 v3.1.1 // indirect
github.com/tinylib/msgp v1.6.4 // indirect
github.com/valyala/bytebufferpool v1.0.0 // indirect
github.com/valyala/fasthttp v1.71.0 // indirect
6 changes: 4 additions & 2 deletions go.sum
@@ -55,8 +55,8 @@ github.com/redis/go-redis/v9 v9.19.0 h1:XPVaaPSnG6RhYf7p+rmSa9zZfeVAnWsH5h3lxthO
github.com/redis/go-redis/v9 v9.19.0/go.mod h1:v/M13XI1PVCDcm01VtPFOADfZtHf8YW3baQf57KlIkA=
github.com/rogpeppe/go-internal v1.14.1 h1:UQB4HGPB6osV0SQTLymcB4TgvyWu6ZyliaW0tI/otEQ=
github.com/rogpeppe/go-internal v1.14.1/go.mod h1:MaRKkUm5W0goXpeCfT7UZI6fk/L7L7so1lCWt35ZSgc=
github.com/shamaton/msgpack/v3 v3.1.0 h1:jsk0vEAqVvvS9+fTZ5/EcQ9tz860c9pWxJ4Iwecz8gU=
github.com/shamaton/msgpack/v3 v3.1.0/go.mod h1:DcQG8jrdrQCIxr3HlMYkiXdMhK+KfN2CitkyzsQV4uc=
github.com/shamaton/msgpack/v3 v3.1.1 h1:1EkrTpc68/H9bziTVw9eDLHLeK2v8aAyyv60quMqIY4=
github.com/shamaton/msgpack/v3 v3.1.1/go.mod h1:DcQG8jrdrQCIxr3HlMYkiXdMhK+KfN2CitkyzsQV4uc=
github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U=
github.com/tinylib/msgp v1.6.4 h1:mOwYbyYDLPj35mkA2BjjYejgJk9BuHxDdvRnb6v2ZcQ=
@@ -95,6 +95,8 @@ golang.org/x/net v0.54.0 h1:2zJIZAxAHV/OHCDTCOHAYehQzLfSXuf/5SoL/Dv6w/w=
golang.org/x/net v0.54.0/go.mod h1:Sj4oj8jK6XmHpBZU/zWHw3BV3abl4Kvi+Ut7cQcY+cQ=
golang.org/x/oauth2 v0.36.0 h1:peZ/1z27fi9hUOFCAZaHyrpWG5lwe0RJEEEeH0ThlIs=
golang.org/x/oauth2 v0.36.0/go.mod h1:YDBUJMTkDnJS+A4BP4eZBjCqtokkg1hODuPjwiGPO7Q=
golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
golang.org/x/sys v0.44.0 h1:ildZl3J4uzeKP07r2F++Op7E9B29JRUy+a27EibtBTQ=
golang.org/x/sys v0.44.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/text v0.37.0 h1:Cqjiwd9eSg8e0QAkyCaQTNHFIIzWtidPahFWR83rTrc=