From 9249c389061f5a8b7d8ebc88f56e76f4f027772e Mon Sep 17 00:00:00 2001
From: "mintlify[bot]" <109931778+mintlify[bot]@users.noreply.github.com>
Date: Tue, 21 Apr 2026 09:25:51 +0000
Subject: [PATCH] Document WAL budget warnings in Diagnostics API and add slot
 recovery guidance

Generated-By: mintlify-agent
---
 .../source-db/postgres-maintenance.mdx       | 26 +++++++++++
 maintenance-ops/self-hosting/diagnostics.mdx | 46 +++++++++++++++++--
 2 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/configuration/source-db/postgres-maintenance.mdx b/configuration/source-db/postgres-maintenance.mdx
index 62298a01..cbe3adc2 100644
--- a/configuration/source-db/postgres-maintenance.mdx
+++ b/configuration/source-db/postgres-maintenance.mdx
@@ -34,6 +34,32 @@ select slot_name, pg_drop_replication_slot(slot_name) from pg_replication_slots
 
 Postgres prevents active slots from being dropped. If it does happen (e.g. while a PowerSync instance is disconnected), PowerSync would automatically re-create the slot, and restart replication.
 
+### Recovering from an invalidated slot
+
+A replication slot becomes invalidated when its `wal_status` is `lost`. This happens when the WAL data needed by the slot has been removed — typically because the replication lag exceeded `max_slot_wal_keep_size`.
+
+When this occurs, you will see an error in the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics) such as:
+
+> Replication slot powersync\_1\_xxxx was invalidated (reason: wal\_removed). Increase max\_slot\_wal\_keep\_size on the source database and delete the existing slot to recover.
+
+To recover:
+
+1. Increase `max_slot_wal_keep_size` on the source Postgres database to prevent recurrence. See the [production readiness guide](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag) for sizing guidance.
+
+2. Drop the invalidated slot:
+
+   ```sql
+   SELECT pg_drop_replication_slot('powersync_1_xxxx');
+   ```
+
+   Replace `powersync_1_xxxx` with the actual slot name from the error message.
+
+3. Restart the PowerSync Service. It will create a new replication slot and begin replication from scratch.
+
+If the slot was invalidated during the initial snapshot (before it completed), the PowerSync Service will not automatically retry. You must drop the invalidated slot manually before the service can recover.
+
+If the invalidation reason is `idle_timeout` (Postgres 18+), the slot was invalidated due to inactivity. In this case, increase `idle_replication_slot_timeout` on the source database instead.
+
 ### Maximum Replication Slots
 
 Postgres is configured with a maximum number of replication slots per server. Since each PowerSync instance uses one replication slot for replication and an additional one while deploying a new Sync Streams/Rules version, the maximum number of PowerSync instances connected to one Postgres server is equal to the maximum number of replication slots, minus 1\.
diff --git a/maintenance-ops/self-hosting/diagnostics.mdx b/maintenance-ops/self-hosting/diagnostics.mdx
index fbce499f..d65a45a3 100644
--- a/maintenance-ops/self-hosting/diagnostics.mdx
+++ b/maintenance-ops/self-hosting/diagnostics.mdx
@@ -6,8 +6,8 @@ description: "Use the PowerSync Diagnostics API to inspect replication status an
 
 All self-hosted PowerSync Service instances ship with a Diagnostics API. This API provides the following diagnostic information:
 
-- Connections → Connected backend source database and any active errors associated with the connection.
-- Active Sync Streams / Sync Rules → Currently deployed Sync Streams (or legacy Sync Rules) and its status.
+- Connections — Connected backend source database and any active errors associated with the connection.
+- Active Sync Streams / Sync Rules — Currently deployed Sync Streams (or legacy Sync Rules) and their status.
 
 ## CLI
 
@@ -22,7 +22,7 @@ powersync status --output=json | jq '.connections[0]'
 
 ## Diagnostics API
 
-# Configuration
+### Configuration
 
 1. To enable the Diagnostics API, specify an API token in your PowerSync YAML file:
 
@@ -31,7 +31,7 @@ api:
   tokens:
     - YOUR_API_TOKEN
 ```
-Make sure to use a secure API token as part of this configuration
+Make sure to use a secure API token as part of this configuration.
 
 2. Restart the PowerSync Service.
 
@@ -41,3 +41,41 @@ api:
 curl -X POST http://localhost:8080/api/admin/v1/diagnostics \
   -H "Authorization: Bearer YOUR_API_TOKEN"
 ```
+
+### Response
+
+The response includes connection details, WAL replication status, and any active errors or warnings. For Postgres connections, the `active_sync_rules.connections[]` object includes these fields related to WAL health:
+
+| Field | Description |
+| --- | --- |
+| `slot_name` | The name of the Postgres replication slot used by this sync rules version. |
+| `initial_replication_done` | Whether the initial snapshot has completed. |
+| `replication_lag_bytes` | Replication lag in bytes. |
+| `wal_status` | The WAL status of the replication slot (`reserved`, `extended`, `unreserved`, or `lost`). |
+| `safe_wal_size` | Remaining WAL budget in bytes before the slot risks invalidation. |
+| `max_slot_wal_keep_size` | The configured `max_slot_wal_keep_size` value on the source Postgres database. |
+
+### WAL budget warnings
+
+The Diagnostics API monitors the WAL budget for Postgres replication slots. When the remaining WAL budget drops to 50% or below, a warning appears in the `active_sync_rules.errors[]` array:
+
+```json
+{
+  "level": "warning",
+  "message": "WAL budget is low: 25% remaining. The replication slot may be invalidated if WAL consumption continues at this rate. Consider increasing max_slot_wal_keep_size.",
+  "ts": "2025-08-26T15:51:49.746Z"
+}
+```
+
+If the replication slot is invalidated (i.e. `wal_status` is `lost`), the error is reported through the `last_fatal_error` field on the sync rules status. This means you should monitor both the `errors` array and the sync rules status for replication issues.
+
+
+For guidance on configuring `max_slot_wal_keep_size` and managing replication slots, see [Postgres maintenance](/configuration/source-db/postgres-maintenance).
+
+
+### Replication lag warnings
+
+The Diagnostics API also checks replication lag based on the last checkpoint or keepalive timestamp:
+
+- A **warning** is raised if no replicated commit has been received in more than 5 minutes.
+- A **fatal** error is raised if no replicated commit has been received in more than 15 minutes.
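
For reviewers, the thresholds this patch documents (invalidated slot is fatal, WAL budget warning at 50% or below, lag warning after 5 minutes and fatal after 15) can be sketched as a small classifier. This is a hypothetical illustration, not PowerSync Service code; the parameter names mirror the Diagnostics API fields described in the patch:

```python
from datetime import datetime, timedelta, timezone

def classify_health(wal_status, safe_wal_size, max_slot_wal_keep_size,
                    last_commit_ts, now=None):
    """Hypothetical sketch of the documented thresholds.

    Returns a list of (level, message) tuples, loosely mirroring entries
    in the Diagnostics API `errors` array.
    """
    now = now or datetime.now(timezone.utc)
    issues = []

    if wal_status == "lost":
        # An invalidated slot is fatal (surfaced via last_fatal_error).
        issues.append(("fatal", "replication slot invalidated"))
    elif max_slot_wal_keep_size and safe_wal_size is not None:
        remaining = 100 * safe_wal_size / max_slot_wal_keep_size
        if remaining <= 50:
            # Warn once the remaining WAL budget drops to 50% or below.
            issues.append(("warning",
                           f"WAL budget is low: {remaining:.0f}% remaining"))

    lag = now - last_commit_ts
    if lag > timedelta(minutes=15):
        # No replicated commit for more than 15 minutes: fatal.
        issues.append(("fatal",
                       "no replicated commit in more than 15 minutes"))
    elif lag > timedelta(minutes=5):
        # No replicated commit for more than 5 minutes: warning.
        issues.append(("warning",
                       "no replicated commit in more than 5 minutes"))

    return issues
```

For example, a slot with 256 MiB of safe WAL remaining out of a 1 GiB `max_slot_wal_keep_size`, and no replicated commit for six minutes, would yield both a WAL budget warning and a lag warning.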