From 9249c389061f5a8b7d8ebc88f56e76f4f027772e Mon Sep 17 00:00:00 2001
From: "mintlify[bot]" <109931778+mintlify[bot]@users.noreply.github.com>
Date: Tue, 21 Apr 2026 09:25:51 +0000
Subject: [PATCH] Document WAL budget warnings in Diagnostics API and add slot
 recovery guidance

Generated-By: mintlify-agent
---
 .../source-db/postgres-maintenance.mdx       | 26 +++++++++++
 maintenance-ops/self-hosting/diagnostics.mdx | 46 +++++++++++++++++--
 2 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/configuration/source-db/postgres-maintenance.mdx b/configuration/source-db/postgres-maintenance.mdx
index 62298a01..cbe3adc2 100644
--- a/configuration/source-db/postgres-maintenance.mdx
+++ b/configuration/source-db/postgres-maintenance.mdx
@@ -34,6 +34,32 @@ select slot_name, pg_drop_replication_slot(slot_name) from pg_replication_slots
 
 Postgres prevents active slots from being dropped. If it does happen (e.g. while a PowerSync instance is disconnected), PowerSync would automatically re-create the slot, and restart replication.
 
+### Recovering from an invalidated slot
+
+A replication slot becomes invalidated when its `wal_status` is `lost`. This happens when the WAL data needed by the slot has been removed — typically because the replication lag exceeded `max_slot_wal_keep_size`.
+
+When this occurs, you will see an error in the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics) such as:
+
+> Replication slot powersync\_1\_xxxx was invalidated (reason: wal\_removed). Increase max\_slot\_wal\_keep\_size on the source database and delete the existing slot to recover.
+
+To recover:
+
+1. Increase `max_slot_wal_keep_size` on the source Postgres database to prevent recurrence. See the [production readiness guide](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag) for sizing guidance.
+
+2. Drop the invalidated slot:
+
+   ```sql
+   SELECT pg_drop_replication_slot('powersync_1_xxxx');
+   ```
+
+   Replace `powersync_1_xxxx` with the actual slot name from the error message.
+
+3. Restart the PowerSync Service. It will create a new replication slot and begin replication from scratch.
+
+If the slot was invalidated during the initial snapshot (before it completed), the PowerSync Service will not automatically retry. You must drop the invalidated slot manually before the service can recover.
+
+If the invalidation reason is `idle_timeout` (Postgres 18+), the slot was invalidated due to inactivity. In this case, increase `idle_replication_slot_timeout` on the source database instead.
+
 ### Maximum Replication Slots
 
 Postgres is configured with a maximum number of replication slots per server. Since each PowerSync instance uses one replication slot for replication and an additional one while deploying a new Sync Streams/Rules version, the maximum number of PowerSync instances connected to one Postgres server is equal to the maximum number of replication slots, minus 1\.
diff --git a/maintenance-ops/self-hosting/diagnostics.mdx b/maintenance-ops/self-hosting/diagnostics.mdx
index fbce499f..d65a45a3 100644
--- a/maintenance-ops/self-hosting/diagnostics.mdx
+++ b/maintenance-ops/self-hosting/diagnostics.mdx
@@ -6,8 +6,8 @@ description: "Use the PowerSync Diagnostics API to inspect replication status an
 
 All self-hosted PowerSync Service instances ship with a Diagnostics API. This API provides the following diagnostic information:
 
-- Connections → Connected backend source database and any active errors associated with the connection.
-- Active Sync Streams / Sync Rules → Currently deployed Sync Streams (or legacy Sync Rules) and its status.
+- Connections — Connected backend source database and any active errors associated with the connection.
+- Active Sync Streams / Sync Rules — Currently deployed Sync Streams (or legacy Sync Rules) and their status.
 
 ## CLI
 
@@ -22,7 +22,7 @@ powersync status --output=json | jq '.connections[0]'
 
 ## Diagnostics API
 
-# Configuration
+### Configuration
 
 1. To enable the Diagnostics API, specify an API token in your PowerSync YAML file:
 
@@ -31,7 +31,7 @@ api:
   tokens:
     - YOUR_API_TOKEN
 ```
-Make sure to use a secure API token as part of this configuration
+Make sure to use a secure API token as part of this configuration.
 
 2. Restart the PowerSync Service.
 
@@ -41,3 +41,41 @@ api:
 curl -X POST http://localhost:8080/api/admin/v1/diagnostics \
   -H "Authorization: Bearer YOUR_API_TOKEN"
 ```
+
+### Response
+
+The response includes connection details, WAL replication status, and any active errors or warnings. For Postgres connections, the `active_sync_rules.connections[]` object includes these fields related to WAL health:
+
+| Field | Description |
+| --- | --- |
+| `slot_name` | The name of the Postgres replication slot used by this sync rules version. |
+| `initial_replication_done` | Whether the initial snapshot has completed. |
+| `replication_lag_bytes` | Replication lag in bytes. |
+| `wal_status` | The WAL status of the replication slot (`reserved`, `extended`, `unreserved`, or `lost`). |
+| `safe_wal_size` | Remaining WAL budget in bytes before the slot risks invalidation. |
+| `max_slot_wal_keep_size` | The configured `max_slot_wal_keep_size` value on the source Postgres database. |
+
+### WAL budget warnings
+
+The Diagnostics API monitors the WAL budget for Postgres replication slots. When the remaining WAL budget drops to 50% or below, a warning appears in the `active_sync_rules.errors[]` array:
+
+```json
+{
+  "level": "warning",
+  "message": "WAL budget is low: 25% remaining. The replication slot may be invalidated if WAL consumption continues at this rate. Consider increasing max_slot_wal_keep_size.",
+  "ts": "2025-08-26T15:51:49.746Z"
+}
+```
+
+If the replication slot is invalidated (i.e. `wal_status` is `lost`), the error is reported through the `last_fatal_error` field on the sync rules status. This means you should monitor both the `errors` array and the sync rules status for replication issues.
+
+
+For guidance on configuring `max_slot_wal_keep_size` and managing replication slots, see [Postgres maintenance](/configuration/source-db/postgres-maintenance).
+
+
+### Replication lag warnings
+
+The Diagnostics API also checks replication lag based on the last checkpoint or keepalive timestamp:
+
+- A **warning** is raised if no replicated commit has been received in more than 5 minutes.
+- A **fatal** error is raised if no replicated commit has been received in more than 15 minutes.
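
For reviewers, the thresholds this patch documents (invalidated slot is fatal, WAL budget warning at 50% or below, lag warning after 5 minutes and fatal after 15) can be sketched as a small classifier. This is a hypothetical illustration, not PowerSync Service code; the parameter names mirror the Diagnostics API fields described in the patch:

```python
from datetime import datetime, timedelta, timezone

def classify_health(wal_status, safe_wal_size, max_slot_wal_keep_size,
                    last_commit_ts, now=None):
    """Hypothetical sketch of the documented thresholds.

    Returns a list of (level, message) tuples, loosely mirroring entries
    in the Diagnostics API `errors` array.
    """
    now = now or datetime.now(timezone.utc)
    issues = []

    if wal_status == "lost":
        # An invalidated slot is fatal (surfaced via last_fatal_error).
        issues.append(("fatal", "replication slot invalidated"))
    elif max_slot_wal_keep_size and safe_wal_size is not None:
        remaining = 100 * safe_wal_size / max_slot_wal_keep_size
        if remaining <= 50:
            # Warn once the remaining WAL budget drops to 50% or below.
            issues.append(("warning",
                           f"WAL budget is low: {remaining:.0f}% remaining"))

    lag = now - last_commit_ts
    if lag > timedelta(minutes=15):
        # No replicated commit for more than 15 minutes: fatal.
        issues.append(("fatal",
                       "no replicated commit in more than 15 minutes"))
    elif lag > timedelta(minutes=5):
        # No replicated commit for more than 5 minutes: warning.
        issues.append(("warning",
                       "no replicated commit in more than 5 minutes"))

    return issues
```

For example, a slot with 256 MiB of safe WAL remaining out of a 1 GiB `max_slot_wal_keep_size`, and no replicated commit for six minutes, would yield both a WAL budget warning and a lag warning.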