Closed
26 changes: 26 additions & 0 deletions configuration/source-db/postgres-maintenance.mdx
@@ -34,6 +34,32 @@ select slot_name, pg_drop_replication_slot(slot_name) from pg_replication_slots

Postgres prevents active slots from being dropped. If a slot is dropped anyway (e.g. while a PowerSync instance is disconnected), PowerSync automatically re-creates the slot and restarts replication.

### Recovering from an invalidated slot

A replication slot is invalidated when its `wal_status` becomes `lost`. This happens when the WAL data needed by the slot has been removed, typically because the replication lag exceeded `max_slot_wal_keep_size`.

When this occurs, you will see an error in the [Diagnostics API](/maintenance-ops/self-hosting/diagnostics) such as:

> Replication slot powersync\_1\_xxxx was invalidated (reason: wal\_removed). Increase max\_slot\_wal\_keep\_size on the source database and delete the existing slot to recover.

To recover:

1. Increase `max_slot_wal_keep_size` on the source Postgres database to prevent re-occurrence. See the [production readiness guide](/maintenance-ops/production-readiness-guide#managing--monitoring-replication-lag) for sizing guidance.

2. Drop the invalidated slot:

```sql
SELECT pg_drop_replication_slot('powersync_1_xxxx');
```

Replace `powersync_1_xxxx` with the actual slot name from the error message.

3. Restart the PowerSync Service. It will create a new replication slot and begin replication from scratch.

<Note>If the slot was invalidated during the initial snapshot (before it completed), the PowerSync Service will not automatically retry. You must drop the invalidated slot manually before the service can recover.</Note>

If the invalidation reason is `idle_timeout` (Postgres 18+), the slot was invalidated due to inactivity. In this case, increase `idle_replication_slot_timeout` on the source database instead.
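The recovery steps above can be sketched as a small helper that reads the error message and suggests which setting to raise and which statement to run. This is only an illustration: the function name `plan_recovery` is hypothetical, and the regular expression assumes the message has exactly the shape of the example quoted earlier in this section.

```python
import re

# Setting to raise for each invalidation reason named in this section.
SETTING_BY_REASON = {
    "wal_removed": "max_slot_wal_keep_size",
    "idle_timeout": "idle_replication_slot_timeout",  # Postgres 18+
}

# Assumed message shape, based on the example error quoted in this section.
_INVALIDATED = re.compile(
    r"Replication slot (?P<slot>[a-z0-9_]+) was invalidated "
    r"\(reason: (?P<reason>\w+)\)"
)


def plan_recovery(message):
    """Return (setting_to_increase, drop_statement), or None if no match."""
    match = _INVALIDATED.search(message)
    if match is None:
        return None
    slot, reason = match["slot"], match["reason"]
    # Unknown reasons map to None so the caller can investigate manually.
    setting = SETTING_BY_REASON.get(reason)
    # The regex only admits [a-z0-9_], so interpolating the name is safe here.
    drop_sql = f"SELECT pg_drop_replication_slot('{slot}');"
    return setting, drop_sql
```

The helper only produces suggestions; changing the setting, dropping the slot, and restarting the PowerSync Service remain manual steps.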

### Maximum Replication Slots

Postgres is configured with a maximum number of replication slots per server. Since each PowerSync instance uses one replication slot for replication and an additional one while deploying a new Sync Streams/Rules version, the maximum number of PowerSync instances connected to one Postgres server is equal to the maximum number of replication slots, minus 1\.
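As a quick worked example of the rule above, the following sketch computes the instance limit from the Postgres `max_replication_slots` setting. The function name is hypothetical, and it assumes no other consumers hold replication slots on the same server.

```python
def max_powersync_instances(max_replication_slots):
    """Maximum PowerSync instances per Postgres server: the slot limit minus 1."""
    # One slot is kept spare for the additional slot used while a new
    # Sync Streams/Rules version is being deployed.
    return max(max_replication_slots - 1, 0)


# With the Postgres default of max_replication_slots = 10:
instances = max_powersync_instances(10)
```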
46 changes: 42 additions & 4 deletions maintenance-ops/self-hosting/diagnostics.mdx
@@ -6,8 +6,8 @@ description: "Use the PowerSync Diagnostics API to inspect replication status an
All self-hosted PowerSync Service instances ship with a Diagnostics API.
This API provides the following diagnostic information:

- Connections: the connected backend source database and any active errors associated with the connection.
- Active Sync Streams / Sync Rules: the currently deployed Sync Streams (or legacy Sync Rules) and their status.

## CLI

@@ -22,7 +22,7 @@ powersync status --output=json | jq '.connections[0]'

## Diagnostics API

### Configuration

1. To enable the Diagnostics API, specify an API token in your PowerSync YAML file:

@@ -31,7 +31,7 @@ api:
  tokens:
    - YOUR_API_TOKEN
```
<Warning>Make sure to use a secure API token as part of this configuration.</Warning>

2. Restart the PowerSync Service.

@@ -41,3 +41,41 @@ api:
curl -X POST http://localhost:8080/api/admin/v1/diagnostics \
  -H "Authorization: Bearer YOUR_API_TOKEN"
```
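
For scripted checks, the same request can be built with Python's standard library. This is a minimal sketch that mirrors the `curl` call above; the helper name `build_diagnostics_request` is hypothetical, and the base URL and token are the same placeholders used in the `curl` example.

```python
import urllib.request


def build_diagnostics_request(base_url, api_token):
    """Build a POST request for the diagnostics endpoint (nothing is sent yet)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/admin/v1/diagnostics",
        data=b"",  # empty body, matching the curl call
        headers={"Authorization": f"Bearer {api_token}"},
        method="POST",
    )


request = build_diagnostics_request("http://localhost:8080", "YOUR_API_TOKEN")
# To send it: urllib.request.urlopen(request, timeout=10), then parse the JSON body.
```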

### Response

The response includes connection details, WAL replication status, and any active errors or warnings. For Postgres connections, each entry in `active_sync_rules.connections[]` includes these fields related to WAL health:

| Field | Description |
| --- | --- |
| `slot_name` | The name of the Postgres replication slot used by this sync rules version. |
| `initial_replication_done` | Whether the initial snapshot has completed. |
| `replication_lag_bytes` | Replication lag in bytes. |
| `wal_status` | The WAL status of the replication slot (`reserved`, `extended`, `unreserved`, or `lost`). |
| `safe_wal_size` | Remaining WAL budget in bytes before the slot risks invalidation. |
| `max_slot_wal_keep_size` | The configured `max_slot_wal_keep_size` value on the source Postgres database. |
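
A minimal sketch of reading these fields from a parsed response follows. The function name `summarize_wal_health` is hypothetical, and the function takes the `active_sync_rules` object directly because the outer envelope of the response is not described here.

```python
# Field names are taken from the table above; everything else is illustrative.
WAL_FIELDS = (
    "slot_name",
    "initial_replication_done",
    "replication_lag_bytes",
    "wal_status",
    "safe_wal_size",
    "max_slot_wal_keep_size",
)


def summarize_wal_health(active_sync_rules):
    """Return one summary dict per entry in active_sync_rules['connections']."""
    summaries = []
    for connection in active_sync_rules.get("connections", []):
        # Missing fields become None rather than raising KeyError.
        summary = {name: connection.get(name) for name in WAL_FIELDS}
        # A slot counts as invalidated once its wal_status is 'lost'.
        summary["invalidated"] = summary["wal_status"] == "lost"
        summaries.append(summary)
    return summaries
```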

### WAL budget warnings

The Diagnostics API monitors the WAL budget for Postgres replication slots. When the remaining WAL budget drops to 50% or below, a warning appears in the `active_sync_rules.errors[]` array:

```json
{
"level": "warning",
"message": "WAL budget is low: 25% remaining. The replication slot may be invalidated if WAL consumption continues at this rate. Consider increasing max_slot_wal_keep_size.",
"ts": "2025-08-26T15:51:49.746Z"
}
```
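
To make the threshold concrete, the sketch below estimates the remaining budget as `safe_wal_size` divided by `max_slot_wal_keep_size`. That formula is an assumption inferred from the field descriptions, not a confirmed description of how the PowerSync Service computes the percentage; both values are assumed to be in bytes, and the function names are hypothetical.

```python
WARNING_THRESHOLD_PCT = 50.0  # warning appears at 50% remaining or below


def wal_budget_remaining_pct(safe_wal_size, max_slot_wal_keep_size):
    """Estimate the remaining WAL budget as a percentage (assumed formula)."""
    # In Postgres, max_slot_wal_keep_size = -1 means no limit, so there is
    # no finite budget to measure against.
    if max_slot_wal_keep_size is None or max_slot_wal_keep_size <= 0:
        return None
    pct = 100.0 * safe_wal_size / max_slot_wal_keep_size
    # Clamp to the 0..100 range.
    return max(0.0, min(100.0, pct))


def is_wal_budget_low(safe_wal_size, max_slot_wal_keep_size):
    """True when the estimated remaining budget is at or below the threshold."""
    pct = wal_budget_remaining_pct(safe_wal_size, max_slot_wal_keep_size)
    return pct is not None and pct <= WARNING_THRESHOLD_PCT
```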

If the replication slot is invalidated (i.e. `wal_status` is `lost`), the error is reported through the `last_fatal_error` field on the sync rules status. This means you should monitor both the `errors` array and the sync rules status for replication issues.
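
A monitoring script that checks both places might look like the following sketch. It assumes `last_fatal_error` sits on the same `active_sync_rules` object as `errors` and holds a message string; the exact location and type are not confirmed here, and the function name is hypothetical.

```python
def collect_replication_issues(active_sync_rules):
    """Gather (level, message) pairs from errors[] and last_fatal_error."""
    issues = []
    # Warnings such as the low WAL budget message arrive in the errors array.
    for entry in active_sync_rules.get("errors", []):
        issues.append((entry.get("level", "unknown"), entry.get("message", "")))
    # Assumption: an invalidated slot surfaces here as a message string.
    fatal = active_sync_rules.get("last_fatal_error")
    if fatal:
        issues.append(("fatal", str(fatal)))
    return issues
```

Checking only the `errors` array would miss an invalidated slot, so an alerting rule should treat a non-empty result from either source as actionable.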

<Tip>
For guidance on configuring `max_slot_wal_keep_size` and managing replication slots, see [Postgres maintenance](/configuration/source-db/postgres-maintenance).
</Tip>

### Replication lag warnings

The Diagnostics API also checks replication lag based on the last checkpoint or keepalive timestamp:

- A **warning** is raised if no replicated commit has been received in more than 5 minutes.
- A **fatal** error is raised if no replicated commit has been received in more than 15 minutes.
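
The two thresholds can be expressed as a small classifier. This sketch only restates the rules above; it is not the PowerSync Service implementation, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

WARNING_AFTER = timedelta(minutes=5)
FATAL_AFTER = timedelta(minutes=15)


def classify_replication_lag(last_activity, now):
    """Return 'fatal', 'warning', or None from the time since the last
    checkpoint or keepalive."""
    elapsed = now - last_activity
    # Check the larger threshold first so fatal takes precedence.
    if elapsed > FATAL_AFTER:
        return "fatal"
    if elapsed > WARNING_AFTER:
        return "warning"
    return None


# Example: the last keepalive arrived 6 minutes ago.
now = datetime(2025, 8, 26, 16, 0, tzinfo=timezone.utc)
level = classify_replication_lag(now - timedelta(minutes=6), now)
```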