Skip to content

Reduce work for spock.progress and remove checkpoint hook#450

Open
mason-sharp wants to merge 16 commits intomainfrom
feature/progress-without-checkpoint
Open

Reduce work for spock.progress and remove checkpoint hook#450
mason-sharp wants to merge 16 commits intomainfrom
feature/progress-without-checkpoint

Conversation

@mason-sharp
Copy link
Copy Markdown
Member

@mason-sharp mason-sharp commented May 3, 2026

Replaces Spock's WAL-RMGR + checkpoint-hook persistence for spock.progress with a file snapshot + replication-origin reconcile + pg_commit_ts recovery scan. Removes the per-commit XLogFlush from the apply hot path and the dependence on a 4-line Postgres core patch.

With the changes in this PR, the ~3.3x slowdown in main in the apply worker performance is reversed.

What's removed

  • The custom WAL RMGR's apply-progress recovery path (~200 lines): no more per-commit SPOCK_RMGR_APPLY_PROGRESS records.
  • spock_checkpoint_hook() and its registration. The Postgres core patch that exposes Checkpoint_hook can now be reverted (separate PR).
  • Per-commit XLogFlush() in handle_commit.

What's added

  • spock_init_progress_state(XLogRecPtr) — public entry point in a new src/spock_progress_recovery.c module, called once from spock_apply_main between replorigin_session_setup and spock_start_replication. Internally:
    • Reconcile: read the replication origin's durable LSN; if resource.dat's entry is stale or absent, advance remote_commit_lsn from the origin, clear stale timestamps, preserve the insert_lsn ≥ commit_lsn invariant.
    • Scan pg_commit_ts (only when reconcile detected staleness): walk backward from the latest xid, filtered on the publisher's spock node id, repopulating remote_commit_ts. Bounded termination at 1000 commits seen for the origin or 1M xids total. Restores accurate values in spock.progress post-crash; the recovered prev_remote_ts is also intended to be useful for the planned parallel-apply rework.
  • request_initial_status_update(): forced 'r' message at apply-worker reconnect, prompting the publisher to send a 'k' immediately so remote_insert_lsn populates within milliseconds instead of waiting for wal_sender_timeout/2.
  • A retargeted RMGR (id 144) emitting one WAL record per SpockGroupHash entry at each resource.dat dump event, each carrying that entry's full SpockApplyProgress snapshot (key + LSNs + timestamps). A pg_waldump trace shows the exact progress snapshot persisted at each event LSN — useful for incident reconstruction over time.
    • Per-event scope: SHUTDOWN and ADD_NODE walk the full hash (mesh-wide effect). TABLE_SYNC emits only the changed entries (per-subscription effect — the helper takes the same list update_list works on).

Performance

  • Apply hot path: one fewer synchronous fsync per applied remote-origin commit.
  • Under synchronous_commit=off (Spock's default for the apply worker, designed to let walwriter batch flushes), test 026_no_double_wal_flush.pl measures the delta as ~2 fsyncs over 30 applied commits, vs. at least 30 before.
  • Apply-worker reconnect: remote_insert_lsn populates within ~ms, asserted by 023_forced_keepalive_timing.pl with wal_sender_timeout=10min.

Bug found and fixed during this work

  • progress_update_struct enforces remote_insert_lsn >= remote_commit_lsn. Reconcile now bumps remote_insert_lsn and received_lsn to at least origin_lsn whenever it advances remote_commit_lsn. Caught by the new test 022.

Approximation, post-crash only
On a clean restart the file's actual last_updated_ts is preserved exactly. Only on the post-crash path, where reconcile clears stale timestamps and the pg_commit_ts scan repopulates them, is an approximation needed: Postgres doesn't store the local apply time when replorigin_session_origin is set during apply (only the publisher's commit_ts ends up in pg_commit_ts). For those entries we set last_updated_ts to the recovered remote_commit_ts — a lower bound (apply happened at or after the publisher's commit) that under-reports lag by typical publisher→subscriber delay rather than hiding it as zero. Refreshes to the real value on the next applied commit.

Test reliability fix

  • 008_rmgr.pl now issues a CHECKPOINT before SIGKILL so the apply worker's batch-2 commits are durable past walwriter cadence. Without this the test was racy with a high fail rate; deterministic now.

Deferred (not in this PR)

  • Reverting the Postgres-core Checkpoint_hook patch — landing it in the spock branch is the prerequisite, the core revert ships separately.
  • Generalizing the RMGR record into an event family covering add_node, drop_node, subscription state transitions, sync start/end. Sketched in docs/internals-doc/specs/2026-05-01-spock-audit-rmgr-design.md.

Layout

  • New file src/spock_progress_recovery.c + include/spock_progress_recovery.h (single public function spock_init_progress_state); reconcile + scan + their helpers all file-static.
  • request_initial_status_update stays in src/spock_apply.c (wire-protocol kin to send_feedback).

SPOC-436

Replace the Checkpoint_hook + resource.dat + custom WAL RMGR machinery
with (1) replication origins as the source of truth for remote_commit_lsn,
(2) a shutdown-only resource.dat snapshot with LSN-validated load, and
(3) a forced keepalive on apply-worker reconnect. Removes the Postgres
core patch and the extra per-commit XLogFlush.
After START_REPLICATION, send an 'r' status update with replyRequested=1
so the publisher immediately responds with a 'k' keepalive. The receive
loop will then populate remote_insert_lsn/received_lsn in shmem within
milliseconds, rather than waiting for wal_sender_timeout/2.

Also remove the pg_attribute_unused() marker from the helper's
definition now that it has a caller.
When an apply worker starts, reconcile its shmem progress entry against
the authoritative replication origin LSN. If resource.dat loaded a stale
LSN (file LSN < origin LSN), clear timestamp fields to NULL so the view
doesn't show stale-looking data. Prepares the ground for removing the
custom WAL RMGR and checkpoint hook.

Preserve the invariant remote_insert_lsn >= remote_commit_lsn asserted
by progress_update_struct: when advancing remote_commit_lsn to
origin_lsn, also bump remote_insert_lsn and received_lsn to at least
origin_lsn. The invariant is semantic -- a commit at LSN X implies the
data was received through X first. The forced keepalive on reconnect
refreshes remote_insert_lsn to the publisher's current insert position
shortly after.

Also:

- Add !SpockGroupHash || !SpockCtx guard at top, matching the defensive
  pattern used in spock_group_progress_update and other SpockGroupHash
  consumers.
- Expand the comment in the new-entry branch to explicitly call out the
  coupling with init_progress_fields in spock_group.c so future struct
  field additions are kept in lockstep.
Replace per-entry spock_apply_progress_add_to_wal() calls inside
spock_group_progress_force_set_list (add_node) and
spock_group_progress_update_list (table sync) with a single
spock_group_resource_dump() after the loop.

resource.dat is a full-snapshot file format, so N per-entry dumps
would rewrite the entire file N times. One post-loop dump gives
equivalent durability (both add_node and table-sync are all-or-nothing
at the admin level) with O(1) file writes.
handle_commit used to call spock_apply_progress_add_to_wal(&sap) right
after CommitTransactionCommand, which issued a second synchronous
XLogFlush purely to persist monitoring data. Postgres core already
flushes the commit record once; origins remain the source of truth
for remote_commit_lsn. Removing the second flush halves the fsync
pressure on the apply hot path.

Crash recovery of remote_commit_lsn is handled by replication origins
(via replorigin_session_get_progress at worker start) plus the
reconcile_progress_with_origin step added earlier.
Drops the Checkpoint_hook registration in Spock. The Postgres core
patch that added the Checkpoint_hook typedef and callout becomes
unused; it can be reverted separately in the Postgres patchset.

Post-crash behavior is now: remote_commit_lsn is recovered from the
replication origin (stock Postgres mechanism); remote_commit_ts and
last_updated_ts are NULL until the next commit arrives or a clean
shutdown dumps resource.dat. remote_insert_lsn / received_lsn refresh
within ms of reconnect via the forced keepalive.
The custom RMGR (id 144) previously carried per-applied-commit progress
records and a redo path that replayed them into shmem during recovery.
Earlier commits in this branch removed every caller of the per-commit
emit helper (handle_commit, force_set_list, update_list) and the
checkpoint hook that scanned shmem at every checkpoint.

This commit transforms the now-unused RMGR machinery into a small set
of forensic markers emitted at each resource.dat dump call site (clean
shutdown, add_node post-loop, table-sync post-loop). One record type
SPOCK_RMGR_RESOURCE_DUMP, ~16 bytes, no XLogFlush -- walwriter handles
durability. Volume is a handful per cluster day in normal operation.

Operational value: pg_waldump and recovery logs show when resource.dat
became durable and at which LSN, useful for incident analysis or to
confirm an add_node or table-sync completed and persisted before a
subsequent failure.

Records are NOT load-bearing for state recovery: shmem is reseeded on
restart by spock_group_resource_load() (file) plus
reconcile_progress_with_origin() (replication origin). The redo handler
emits a LOG line only.

Net effect: src/spock_rmgr.c shrinks from ~200 to ~150 lines; redo path
is replaced; per-commit hot path is unaffected because there are no
per-commit emit calls.
Three changes for the new file-snapshot persistence model:

- 022_rmgr_progress_post_checkpoint_crash.pl rewritten: assert that
  reconcile detects file_lsn < origin_lsn after a crash, advances
  remote_commit_lsn from the replication origin, and clears the
  (now-stale) timestamp fields. Also asserts CHECKPOINT does NOT
  update resource.dat (regression guard against re-introducing the
  checkpoint hook).

- 008_rmgr.pl Phase 4 rewritten for the same reconcile semantics:
  cover the multi-origin case (Scenario A: file has stale entry,
  Scenario B: file has no entry for an origin).

- 023_forced_keepalive_timing.pl added: assert remote_insert_lsn
  populates within a second of apply-worker reconnect with a high
  wal_sender_timeout (proves the forced keepalive, not the timer,
  drove the refresh).

- 026_no_double_wal_flush.pl added: count pg_stat_wal.wal_sync delta
  over N applied commits; assert delta < N (no per-commit forced
  fsync on the apply hot path).
Threads a typed result back to the caller so a follow-up commit can
gate post-reconcile work on whether the loaded resource.dat was in
sync, stale, anomalous, or absent. Pure refactor -- no behavior
change.
Asserts the regression target for the pg_commit_ts scan: after a crash
that leaves resource.dat stale relative to the replication origin,
remote_commit_ts must be recovered (non-NULL) rather than left blank.
Currently fails -- implementation lands in the next commit.
After reconcile_progress_with_origin detects that resource.dat is stale
or absent for an origin, scan pg_commit_ts backward from the latest
xid and populate remote_commit_ts with the max ts found for this
origin. Restores accurate values in spock.progress post-crash. The
recovered prev_remote_ts is also intended to be useful for the
planned parallel-apply rework.

Filter the scan on MySubscription->origin->id (the publisher's spock
node id), since handle_origin overwrites replorigin_session_origin
per-message with that value -- and that is what pg_commit_ts records
for applied xacts on the subscriber, NOT the local subscription's
roident.

Termination: 1000 commits seen for this origin (concurrency width is
bounded well below this by max_connections / poolers) or 1M xids
total. Constants are #defines for now (SPOCK_TS_RECOVERY_*); convert
to GUCs later if profiling motivates tuning.

Emits LOG-level messages on completion (skip path or run path) so
startup behavior is grep-able. Adds test 029 (clean-shutdown skip
path), which verifies the LOG line appears and no scan log is emitted
on clean restart. Updates test 022 to assert that remote_commit_ts is
now non-NULL post-crash (recovered via the scan).
spock_on_shmem_exit() was registered via on_shmem_exit() gated on
!IsUnderPostmaster, so spock_rmgr_log_resource_dump(SPOCK_DUMP_SHUTDOWN)
ran in postmaster context where XLogInsert is unsafe (no backend slot,
no transaction state).

Move the SHUTDOWN path to the supervisor's before_shmem_exit:

- spock_any_apply_worker_running() (new): scans SpockCtx->workers under
  SpockCtx->lock LW_SHARED for APPLY/SYNC slots with proc != NULL.
  Workers clear ->proc in spock_worker_detach() (itself a
  before_shmem_exit) after their main loop ends, so proc == NULL implies
  the worker's SpockGroupHash writes are settled.

- spock_supervisor_on_exit() (new): registered at the top of
  spock_supervisor_main. On clean exit, polls 100 ms x 50 (~5 s) for
  apply/sync workers to drain, then calls spock_group_resource_dump()
  and spock_rmgr_log_resource_dump(SPOCK_DUMP_SHUTDOWN, NULL). Logs and
  proceeds on timeout.

- spock_on_shmem_exit() and its registration are removed from
  src/spock_shmem.c. File and WAL dumps now share one owner, so both
  reflect the same drained state.

Tests: 008_rmgr, 022_rmgr_progress_post_checkpoint_crash,
023_forced_keepalive_timing, 026_no_double_wal_flush,
027_remote_commit_ts_recovery -- all pass.
@mason-sharp mason-sharp requested a review from rasifr May 3, 2026 17:58
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Removes the checkpoint hook and per-commit WAL flushes; retargets RMGR to emit per-entry resource-dump records at dump sites; adds apply-worker startup reconciliation and a bounded backward scan of pg_commit_ts to recover per-origin remote_commit_ts; seeds LSNs with an initial keepalive and moves resource.dat dumps to supervisor-controlled shutdown and sync events.

Changes

Progress State Recovery & Persistence Refactor

Layer / File(s) Summary
Design & Plan
docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md, docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md
Adds implementation plan and updated design documenting removal of checkpoint hook and per-commit XLogFlush, reconciliation enum/results, bounded pg_commit_ts backward scan for remote_commit_ts recovery, forced initial keepalive, RMGR retargeting to resource-dump records, supervisor-owned resource.dat dumps, and test plan.
Public API / Headers
include/spock_progress_recovery.h, include/spock_rmgr.h, include/spock_worker.h, include/spock_group.h
Adds spock_init_progress_state() declaration and spock_any_apply_worker_running(); introduces SPOCK_RMGR_RESOURCE_DUMP, SpockResourceDumpEvent, SpockResourceDumpRec; removes exported checkpoint-hook declaration and #include "spock_rmgr.h" from spock_group.h.
Core Recovery Logic
src/spock_progress_recovery.c
New module implementing spock_init_progress_state(origin_lsn), reconcile_progress_with_origin() returning ReconcileResult (IN_SYNC/STALE/ANOMALY/ABSENT), and recover_progress_timestamps_from_commit_ts() which scans transaction IDs backward (per-origin and global limits) and updates remote_commit_ts/prev_remote_ts under lock.
Apply Startup & Keepalive
src/spock_apply.c
Calls spock_init_progress_state(origin_startpos) during apply-worker init and adds request_initial_status_update() to send a one-shot standby status update with replyRequested to immediately refresh remote_insert_lsn/received_lsn.
Group Progress Wiring
src/spock_group.c
Stops per-entry WAL emission inside loops; updates spock_group_progress_update_list() and spock_group_progress_force_set_list() to update shmem per-entry and persist once via spock_group_resource_dump() followed by a single spock_rmgr_log_resource_dump(...); force-set uses authoritative snapshot assignment and preserves LSN invariants.
RMGR Emission & Redo
src/spock_rmgr.c
Retargets redo/desc/identify to handle only RESOURCE_DUMP records; adds spock_rmgr_log_resource_dump(event, changed_entries) that emits one WAL record per progress entry (from list or full scan) and flushes once; removes rm_startup/rm_cleanup hooks and prior apply-progress redo handling; adds helpers to format/emit records.
Supervisor Shutdown & Shmem Exit
src/spock.c, src/spock_worker.c, src/spock_shmem.c
Adds spock_supervisor_on_exit() (registered with before_shmem_exit) that polls/drains apply/sync workers via spock_any_apply_worker_running() and then calls spock_group_resource_dump() + spock_rmgr_log_resource_dump(SPOCK_DUMP_SHUTDOWN, NULL); removes previous on_shmem_exit callback and checkpoint-hook wiring.
Tests / TAP Updates
tests/tap/t/008_rmgr.pl, tests/tap/t/022_rmgr_progress_post_checkpoint_crash.pl, tests/tap/t/023_forced_keepalive_timing.pl, tests/tap/t/026_no_double_wal_flush.pl, tests/tap/t/027_remote_commit_ts_recovery.pl, tests/tap/t/029_remote_commit_ts_recovery_clean_shutdown.pl
Updates RMGR/progress tests for origin-based reconcile and non-update-on-CHECKPOINT guarantees; adds tests for forced keepalive timing (023), no double WAL flush (026), post-crash remote_commit_ts recovery (027), and clean-shutdown skip-path (029); rewrites 008 and 022 to align with new reconcile/emit semantics.

Poem

🐇 I dug through WAL and timestamp trails,

Skipped a hook, kept dumps in tidy pails.
I hopped back through commit_ts to find the time,
Bounced a keepalive, woke LSNs in a chime.
Now at shutdown one dump sings—everything’s aligned.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'Reduce work for spock.progress and remove checkpoint hook' accurately and clearly summarizes the main objectives and changes. It is specific, concise, and directly reflects the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description comprehensively details the changeset, covering what's removed (WAL-RMGR recovery path, checkpoint hook, per-commit flush), what's added (progress recovery module, keepalive request, retargeted RMGR), performance improvements, bug fixes, and deferred work. The description directly aligns with all major file changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/progress-without-checkpoint

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@mason-sharp mason-sharp changed the title Reduce work for spock.progress and remove checkpoint hook for spock.progress Reduce work for spock.progress and remove checkpoint hook May 3, 2026
@codacy-production
Copy link
Copy Markdown

codacy-production Bot commented May 3, 2026

Up to standards ✅

🟢 Issues 4 medium

Results:
4 new issues

Category Results
Complexity 4 medium

View in Codacy

🟢 Metrics 29 complexity · 4 duplication

Metric Results
Complexity 29
Duplication 4

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md`:
- Line 11: Update the broken Spec breadcrumb: in the "**Spec:**" line replace
the incorrect path
`docs/internals-doc/specs/2026-04-30-recover-remote-commit-ts-design.md` with
the actual spec filename added in this PR (i.e., the shipped spec file name),
ensuring the "**Spec:**" entry points to the real file in
docs/internals-doc/specs so readers are not sent to a dead path.

In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Line 101: Update the documentation to point to the new implementation module
and header: replace references that say the constants/enum/helper live in
src/spock_apply.c with the moved locations in src/spock_progress_recovery.c and
include/spock_progress_recovery.h, and likewise update any other mentions (e.g.,
the three constants SPOCK_TS_RECOVERY_MIN_SEEN_PER_ORIGIN,
SPOCK_TS_RECOVERY_SCAN_LIMIT, SPOCK_TS_RECOVERY_BATCH_SIZE and related
enum/helper) so the doc accurately reflects the shipped implementation.

In `@include/spock_rmgr.h`:
- Around line 43-50: The SpockResourceDumpRec uses 16-bit fields entry_seq and
entry_total which will wrap for >65535 entries; update the struct
(SpockResourceDumpRec) to use uint32 for entry_seq and entry_total and update
serialization/deserialization and any readers/writers (notably
spock_rmgr_log_resource_dump and WAL emit/parse code) to use 32-bit widths, or
alternatively add a guard in spock_rmgr_log_resource_dump to detect counts >
UINT16_MAX and fail emission with a clear error; ensure all references to these
fields (packing/unpacking, size calculations, compatibility checks) are updated
to the chosen 32-bit type or to handle the failure path consistently.

In `@src/spock_group.c`:
- Around line 757-763: The current code calls
init_progress_fields(&entry->progress) then
progress_update_struct(&entry->progress, sap), but progress_update_struct only
copies remote_commit_lsn when sap->remote_commit_ts beats the existing
timestamp, so after reconcile_progress_with_origin this can zero out a valid
sap->remote_commit_lsn; instead, after init_progress_fields(&entry->progress)
assign the resume-related fields on entry->progress directly from sap (e.g.,
entry->progress.remote_commit_lsn = sap->remote_commit_lsn and
entry->progress.remote_commit_ts = sap->remote_commit_ts, plus any other resume
LSN/TS fields) rather than calling progress_update_struct so the authoritative
resume LSN from sap is preserved.

In `@src/spock_progress_recovery.c`:
- Around line 74-103: The code must short-circuit when there's no durable origin
progress or shared state to avoid dereferencing SpockCtx/SpockGroupHash or
scanning commit_ts on an InvalidXLogRecPtr; after calling
reconcile_progress_with_origin(origin_lsn) and handling RECONCILE_FILE_IN_SYNC,
also return immediately if result == RECONCILE_FILE_ABSENT or origin_lsn ==
InvalidXLogRecPtr or SpockCtx == NULL or SpockGroupHash == NULL so you don't
acquire SpockCtx->apply_group_master_lock or call
hash_search/recover_progress_timestamps_from_commit_ts when state is
uninitialized; update the block using those symbols (result,
reconcile_progress_with_origin, RECONCILE_FILE_ABSENT, origin_lsn, SpockCtx,
SpockGroupHash, SpockCtx->apply_group_master_lock, SpockGroupHash, hash_search,
recover_progress_timestamps_from_commit_ts) to perform this early return.

In `@src/spock_rmgr.c`:
- Around line 189-190: entry_seq is declared as uint16 and entry_total is being
truncated to uint16, which will overflow for dumps >65535 entries and corrupt
the WAL forensic stream; fix by widening the counters (e.g., use uint32_t or
uint64_t for entry_seq and entry_total) or, if protocol requires 16-bit on-wire
values, validate and reject/clamp large totals before truncation: update
declarations for entry_seq/entry_total, add a pre-check that if original_total >
UINT16_MAX then log/error and abort the dump (or emit a protocol error), and
ensure any assignment that currently casts to uint16 (the places handling
entry_total and seq) use the new wider type or explicit guarded cast with error
handling. Ensure references to entry_seq and entry_total across the file are
updated to the new type (or the validation logic is added wherever truncation
occurs).

In `@tests/tap/t/023_forced_keepalive_timing.pl`:
- Around line 74-80: The test restart uses a clean stop/start so
spock.progress.remote_insert_lsn may be pre-seeded from resource.dat causing the
poll (lines around wait_for_pg_ready and the check loop) to pass without the
forced keepalive exercising request_initial_status_update(); to fix, change the
restart path or assertions so the test verifies an actual post-reconnect LSN
change: either (a) remove or rename/ignore resource.dat before starting the
subscriber so remote_insert_lsn is not pre-seeded, or (b) read and store the
persisted spock.progress.remote_insert_lsn value before restart and after
reconnect assert that the new remote_insert_lsn (fetched from the database) is
strictly greater than the persisted value, ensuring
request_initial_status_update() ran. Ensure the code touches the restart
sequence around wait_for_pg_ready and the check loop to implement one of these
two fixes.

In `@tests/tap/t/027_remote_commit_ts_recovery.pl`:
- Around line 69-74: The test currently asserts a non-NULL remote_commit_ts too
early because a stale resource.dat can mask whether
recover_progress_timestamps_from_commit_ts() actually ran; change the check so
that after restarting the subscriber you wait until either the subscriber's
remote_commit_lsn has reached the pre-crash/origin LSN or the recovery log line
"ts recovery: scanned ... recovered remote_commit_ts" appears before asserting
the timestamp (use an existing log-wait helper or poll the subscriber via SQL to
read remote_commit_lsn and compare to the expected LSN, then assert
remote_commit_ts is set), referencing remote_commit_ts, remote_commit_lsn and
recover_progress_timestamps_from_commit_ts() in the test change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4fde131b-6342-4aff-a56b-45d0d3d12282

📥 Commits

Reviewing files that changed from the base of the PR and between b7fc702 and b111dd0.

📒 Files selected for processing (19)
  • docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md
  • docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md
  • include/spock_group.h
  • include/spock_progress_recovery.h
  • include/spock_rmgr.h
  • include/spock_worker.h
  • src/spock.c
  • src/spock_apply.c
  • src/spock_group.c
  • src/spock_progress_recovery.c
  • src/spock_rmgr.c
  • src/spock_shmem.c
  • src/spock_worker.c
  • tests/tap/t/008_rmgr.pl
  • tests/tap/t/022_rmgr_progress_post_checkpoint_crash.pl
  • tests/tap/t/023_forced_keepalive_timing.pl
  • tests/tap/t/026_no_double_wal_flush.pl
  • tests/tap/t/027_remote_commit_ts_recovery.pl
  • tests/tap/t/029_remote_commit_ts_recovery_clean_shutdown.pl
💤 Files with no reviewable changes (2)
  • include/spock_group.h
  • src/spock_shmem.c

Comment thread docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md Outdated
Comment thread docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md Outdated
Comment thread include/spock_rmgr.h
Comment thread src/spock_group.c
Comment thread src/spock_progress_recovery.c
Comment thread src/spock_rmgr.c
Comment thread tests/tap/t/023_forced_keepalive_timing.pl Outdated
Comment thread tests/tap/t/027_remote_commit_ts_recovery.pl
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Around line 207-208: The docs currently state that on startup with origin LSN
== InvalidXLogRecPtr we "keep resource.dat value if any", but the implementation
reconciles first, then skips the pg_commit_ts scan so any existing resource.dat
entry with file_lsn > InvalidXLogRecPtr is actually rewritten to the origin LSN
with cleared timestamps; update the sentence to describe this real behavior
(mention reconciliation step, pg_commit_ts scan being skipped,
InvalidXLogRecPtr, resource.dat and that entries are rewritten to origin LSN
with cleared timestamps) so the spec matches the code.

In `@src/spock_progress_recovery.c`:
- Around line 187-191: When initializing a new entry with a valid origin_lsn,
seed the other LSN fields so the invariant "remote_insert_lsn/received_lsn >=
remote_commit_lsn" holds: after setting entry->progress.remote_commit_lsn =
origin_lsn, also set entry->progress.remote_insert_lsn and
entry->progress.received_lsn to origin_lsn (or to
entry->progress.remote_commit_lsn) instead of leaving them as InvalidXLogRecPtr;
update the initialization in the same block that assigns origin_lsn to
remote_commit_lsn (referencing entry->progress.remote_commit_lsn,
entry->progress.remote_insert_lsn, entry->progress.received_lsn and origin_lsn).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f7407f1b-f75a-4b69-86e9-7cb0474023ca

📥 Commits

Reviewing files that changed from the base of the PR and between b111dd0 and b3e0786.

📒 Files selected for processing (6)
  • docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md
  • docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md
  • src/spock_group.c
  • src/spock_progress_recovery.c
  • tests/tap/t/023_forced_keepalive_timing.pl
  • tests/tap/t/027_remote_commit_ts_recovery.pl
✅ Files skipped from review due to trivial changes (1)
  • docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/tap/t/023_forced_keepalive_timing.pl
  • tests/tap/t/027_remote_commit_ts_recovery.pl
  • src/spock_group.c

Comment thread docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md Outdated
Comment thread src/spock_progress_recovery.c Outdated
@mason-sharp mason-sharp force-pushed the feature/progress-without-checkpoint branch from b3e0786 to 2e0333b Compare May 4, 2026 01:57
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Around line 79-80: Update the docs bullet for the "Entry absent" case to state
that non-timestamp LSN fields remote_insert_lsn and received_lsn are seeded from
origin_lsn (instead of left zero/NULL) so that insert/received >= commit holds
on first read; refer to the behavior implemented in
src/spock_progress_recovery.c and mention the resulting state is
RECONCILE_FILE_ABSENT while keeping remote_commit_lsn = origin_lsn and
timestamps zero/NULL as before.

In `@src/spock_progress_recovery.c`:
- Around line 117-124: The recovery scan currently only filters by the publisher
origin id (call site uses recover_progress_timestamps_from_commit_ts(entry,
(RepOriginId) MySubscription->origin->id)) which can return stale rows from
prior/recreated subscriptions; change recover_progress_timestamps_from_commit_ts
to accept an additional subscription-specific fence (for example the local
subscription identifier MySubscription->id or the subscription's
roident/roident+remote_commit_lsn boundary) and update both call sites (the one
shown and the one around lines 315–318) to pass that fence; inside
recover_progress_timestamps_from_commit_ts apply the extra filter so you only
consider pg_commit_ts rows that match the current subscription/progress window
(and optionally constrain by remote_commit_lsn) before updating remote_commit_ts
/ prev_remote_ts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d812d38a-ffe2-4a27-a6e3-030f4440c91b

📥 Commits

Reviewing files that changed from the base of the PR and between b3e0786 and 2e0333b.

📒 Files selected for processing (6)
  • docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md
  • docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md
  • src/spock_group.c
  • src/spock_progress_recovery.c
  • tests/tap/t/023_forced_keepalive_timing.pl
  • tests/tap/t/027_remote_commit_ts_recovery.pl
✅ Files skipped from review due to trivial changes (1)
  • docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/spock_group.c
  • tests/tap/t/027_remote_commit_ts_recovery.pl
  • tests/tap/t/023_forced_keepalive_timing.pl

Comment thread src/spock_progress_recovery.c
Bug fix (force-set path):
- src/spock_group.c:spock_group_progress_force_set_list. The path used
  init_progress_fields() + progress_update_struct(), but the latter's
  max-by-ts merge skips remote_commit_lsn when sap->remote_commit_ts
  is 0 -- a legal shape after the new reconcile_progress_with_origin
  clears stale ts. The old code could leave a force-set entry with
  InvalidXLogRecPtr commit_lsn, breaking cross-peer apply-group
  coordination on a fresh node post-add_node. Replaced with a struct
  copy + insert_lsn/received_lsn invariant clamps; comment explains
  why we can't reuse progress_update_struct here.

Defensive guards (spock_init_progress_state):
- NULL shmem short-circuit. reconcile_progress_with_origin already
  defends against NULL SpockCtx/SpockGroupHash, but the caller went on
  to dereference them anyway. Mirror the guard so the contract is
  self-consistent.
- XLogRecPtrIsInvalid(origin_lsn) short-circuit. A fresh subscription
  has no progress to recover, and pg_commit_ts may still hold rows for
  the same publisher origin id from a prior subscription -- scanning
  would backfill stale historical timestamps unrelated to the current
  subscription's actual state.

Reconcile new-entry branch:
- Seed remote_insert_lsn and received_lsn to origin_lsn (matching the
  STALE/ANOMALY branches) so the insert >= commit invariant holds
  immediately, rather than briefly exposing commit_lsn > 0 with
  insert_lsn = 0 in spock.progress until the first keepalive arrives.

Recovery scan filter scope (won't-fix, documented):
- The pg_commit_ts scan filters on MySubscription->origin->id
  (publisher's spock node id), not subscription-unique. Sibling subs to
  the same publisher, or a dropped/recreated sub with old commits in
  the 1M-xid window, can contribute to running_max_ts. Failure mode is
  narrow and forensic-only (apply-group ordering disabled today). Code
  comment + spec edge-case entry document the limitation; a real fence
  is left for the planned parallel-apply rework.
- While in the area, softened overconfident "switch to commit-LSN"
  references in the spec to a neutral "revisit the ordering coordinate"
  (LSN ordering doesn't generalize cleanly to a multi-publisher mesh),
  and reworded remaining "Required for parallel-apply ordering" lines
  in the plan to "intended for future parallel-apply rework."

Test fixes:
- 023_forced_keepalive_timing.pl: the test asserted remote_insert_lsn
  (only updated by 'w' messages, not 'k' keepalives) AND was masked by
  resource.dat reseed on clean restart -- it would pass even with a
  completely broken forced keepalive. Now removes resource.dat between
  stop and start, and asserts received_lsn (the field 'k' actually
  populates via UpdateWorkerStats).
- 027_remote_commit_ts_recovery.pl: post-crash assertion could pass
  via stale resource.dat reseed before the recovery scan ran. Now
  captures pre-crash remote_commit_lsn, then gates on
  (commit_lsn >= pre_crash_lsn AND ts IS NOT NULL) to prove both
  reconcile and recovery scan executed. The (commit_lsn, ts) pair is
  updated atomically under apply_group_master_lock, so single SQL
  reads see consistent pairs.

Tests: 008, 022, 023, 026, 027 -- all pass (108 tests).
@mason-sharp mason-sharp force-pushed the feature/progress-without-checkpoint branch from 2e0333b to 3b4a10a Compare May 4, 2026 03:03
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md (1)

79-82: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the seeded remote_insert_lsn / received_lsn values in the ABSENT path.

The implementation no longer leaves all non-timestamp fields zero/NULL here: new entries seed remote_insert_lsn and received_lsn from origin_lsn to preserve insert/received >= commit from the first read. This wording is now inaccurate, and the same fix is needed again at Line 203.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md` around
lines 79 - 82, Update the ABSENT case description to state that new entries seed
remote_insert_lsn and received_lsn from origin_lsn (not left zero/NULL) in
addition to initializing remote_commit_lsn = origin_lsn and zeroing timestamps,
so the invariant insert/received >= commit holds on first read; apply the same
wording change to the other ABSENT description elsewhere in the document that
currently says non-timestamp fields are zero/NULL (ensure references to
remote_commit_lsn, remote_insert_lsn, received_lsn, and RECONCILE_FILE_ABSENT
are updated).
src/spock_progress_recovery.c (1)

319-322: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Fence the pg_commit_ts scan to this subscription's progress window.

This still recovers timestamps from any commit with the same target_origin. If a subscription is recreated, or a sibling subscription shares the same publisher node id, running_max_ts can come from unrelated history and push remote_commit_ts / prev_remote_ts ahead of this entry's durable state after restart. Filtering on origin id alone is too broad here.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Around line 41-45: Update the opening description that currently states
"resource.dat as a shutdown-only snapshot" to accurately reflect the as-built
behavior by listing all dump paths: the shutdown snapshot plus explicit runtime
dumps via spock_group_resource_dump() (used on add_node) and the table-sync dump
path; reference the symbol resource.dat and the function
spock_group_resource_dump(), and mention add_node and table-sync so readers can
find the corresponding runtime sections that document those dump triggers.

---

Duplicate comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Around line 79-82: Update the ABSENT case description to state that new
entries seed remote_insert_lsn and received_lsn from origin_lsn (not left
zero/NULL) in addition to initializing remote_commit_lsn = origin_lsn and
zeroing timestamps, so the invariant insert/received >= commit holds on first
read; apply the same wording change to the other ABSENT description elsewhere in
the document that currently says non-timestamp fields are zero/NULL (ensure
references to remote_commit_lsn, remote_insert_lsn, received_lsn, and
RECONCILE_FILE_ABSENT are updated).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dd8160e0-381d-44c0-9421-a9f355c47c1b

📥 Commits

Reviewing files that changed from the base of the PR and between 2e0333b and 3b4a10a.

📒 Files selected for processing (6)
  • docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md
  • docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md
  • src/spock_group.c
  • src/spock_progress_recovery.c
  • tests/tap/t/023_forced_keepalive_timing.pl
  • tests/tap/t/027_remote_commit_ts_recovery.pl
✅ Files skipped from review due to trivial changes (4)
  • tests/tap/t/027_remote_commit_ts_recovery.pl
  • tests/tap/t/023_forced_keepalive_timing.pl
  • docs/internals-doc/plans/2026-04-30-recover-remote-commit-ts-plan.md
  • src/spock_group.c

Comment thread docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md Outdated
The Design overview opening said resource.dat was a "shutdown-only
snapshot," contradicting the Runtime and Code-changes sections that
document spock_group_resource_dump() on add_node and table-sync too.
Reword to enumerate all three call sites (clean shutdown, add_node,
table-sync) so the as-built description is internally consistent.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md (1)

79-79: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the entry-absent state description to match the shipped reconcile code.

Line 79 still says “other fields zero/NULL,” but reconcile_progress_with_origin() seeds remote_insert_lsn and received_lsn to origin_lsn on absent entries. Please update this bullet so the spec matches runtime behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md` at line
79, Update the "Entry absent" bullet to reflect runtime behavior: instead of
saying "other fields zero/NULL", state that reconcile_progress_with_origin()
seeds remote_insert_lsn and received_lsn to origin_lsn while remote_commit_lsn =
origin_lsn and the remaining fields remain zero/NULL, and keep the resulting
state RECONCILE_FILE_ABSENT; reference reconcile_progress_with_origin(),
remote_commit_lsn, remote_insert_lsn, received_lsn, and RECONCILE_FILE_ABSENT
when making the change.
🧹 Nitpick comments (2)
docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md (2)

219-219: ⚡ Quick win

Add explicit origin semantics (origin=0 vs unavailable) in this edge-case note.

This section discusses origin filtering and cross-subscription bleed-through; please explicitly state that origin=0 means local writes, while NULL/unknown means unavailable origin metadata, so readers don’t conflate those cases during audits/reconciliation.

Based on learnings: in Spock docs, distinguish origin=0 (local change) from NULL/unknown (origin unavailable), including implications for audit/reconciliation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md` at line
219, Update the edge-case note to explicitly state origin semantics: clarify
that origin=0 denotes local writes while NULL/unknown denotes unavailable origin
metadata, and explain how this affects the filter using roident ==
MySubscription->origin->id and the computation of running_max_ts and its
forensic impact on prev_remote_ts; mention that sibling subscriptions or
dropped/recreated subscriptions can bleed in commits when origin metadata is
unavailable and note implications for audit/reconciliation and the planned
parallel-apply rework.

68-69: ⚡ Quick win

Avoid calling RMGR records a “marker” here; they carry full snapshots.

Line 68 currently describes a dump-event marker, but SPOCK_RMGR_RESOURCE_DUMP records embed full SpockApplyProgress payloads per entry. Tightening this wording will prevent confusion with lightweight marker-only events.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md` around
lines 68 - 69, The wording incorrectly calls the RMGR record a "marker" — update
the text to state that RMGR (id 144) implements SPOCK_RMGR_RESOURCE_DUMP records
which embed full SpockApplyProgress payloads per entry (not lightweight
markers); revise the sentence describing the redo handler to say it emits a LOG
line and that each dump record contains the full snapshot payload
(SpockApplyProgress) rather than being a marker-only event, referencing
SPOCK_RMGR_RESOURCE_DUMP and SpockApplyProgress to make the distinction
explicit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Line 79: Update the "Entry absent" bullet to reflect runtime behavior: instead
of saying "other fields zero/NULL", state that reconcile_progress_with_origin()
seeds remote_insert_lsn and received_lsn to origin_lsn while remote_commit_lsn =
origin_lsn and the remaining fields remain zero/NULL, and keep the resulting
state RECONCILE_FILE_ABSENT; reference reconcile_progress_with_origin(),
remote_commit_lsn, remote_insert_lsn, received_lsn, and RECONCILE_FILE_ABSENT
when making the change.

---

Nitpick comments:
In `@docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md`:
- Line 219: Update the edge-case note to explicitly state origin semantics:
clarify that origin=0 denotes local writes while NULL/unknown denotes
unavailable origin metadata, and explain how this affects the filter using
roident == MySubscription->origin->id and the computation of running_max_ts and
its forensic impact on prev_remote_ts; mention that sibling subscriptions or
dropped/recreated subscriptions can bleed in commits when origin metadata is
unavailable and note implications for audit/reconciliation and the planned
parallel-apply rework.
- Around line 68-69: The wording incorrectly calls the RMGR record a "marker" —
update the text to state that RMGR (id 144) implements SPOCK_RMGR_RESOURCE_DUMP
records which embed full SpockApplyProgress payloads per entry (not lightweight
markers); revise the sentence describing the redo handler to say it emits a LOG
line and that each dump record contains the full snapshot payload
(SpockApplyProgress) rather than being a marker-only event, referencing
SPOCK_RMGR_RESOURCE_DUMP and SpockApplyProgress to make the distinction
explicit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6516baea-a752-40fd-a769-6be8f182af47

📥 Commits

Reviewing files that changed from the base of the PR and between 3b4a10a and 463e080.

📒 Files selected for processing (1)
  • docs/internals-doc/specs/spock-progress-no-checkpoint-hook-design.md

Three small fixes from a review of the progress-recovery PR:

- Export spock_init_progress_fields (renamed from static
  init_progress_fields for prefix consistency); reconcile's new-entry
  block now calls it instead of inlining a field-reset block that had
  to stay in lockstep.

- Anomaly WARNING ("file LSN ahead of origin; trusting origin") now
  carries the discarded file values (commit_lsn, insert_lsn,
  received_lsn, commit_ts, prev_remote_ts, last_updated_ts) so a
  postmortem can recover them.

- Comment on request_initial_status_update spelling out that 'r'/'k'
  is the libpqwalreceiver wire protocol (works on all spock proto
  versions); 'k' advances received_lsn on v4 and v5+; remote_insert_lsn
  refresh paths differ (v5+ from 'w' header, v4 from COMMIT payload).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant