Make transaction log read paths lock-free under concurrency #545
harper-joseph wants to merge 2 commits into
Eliminates dataSetsMutex and weak_ptr::lock() contention from the steady-state
read path, unlocking high-QPS workloads with many concurrent subscribers
(e.g., CDC replication).
Read-path changes (TransactionLogStore + TransactionLogHandle):
- Per-handle file snapshot cache (cachedFiles + filesVersion). Handles refresh
the snapshot only when the store's filesVersion atomic advances (rotation,
registration, purge); steady-state reads walk the local snapshot lock-free.
- Lock-free findPosition: walks cachedFiles newest-to-oldest, skipping files
by their stored timestamp. Falls back to the store-level slow path only
when the snapshot is empty.
- Lock-free getLogFileSize fast path: reads logFile->size atomic via the
cached snapshot. Falls through to the slow path only when sequenceNumber=0
or the file is in the snapshot but not yet opened.
- Cached shared_ptr<TransactionLogStore> on the handle replaces the per-call
weak_ptr::lock() CAS. Each read method now does an isClosing.load() check
instead. addEntry re-resolves to a fresh store and clears cachedFiles when
isClosing is observed.
- currentSequenceNumber is now atomic (was plain uint32_t) so handle
fast-path readers can compare without acquiring dataSetsMutex.
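The per-handle snapshot cache and the newest-to-oldest findPosition walk described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `Store`, `Handle`, `LogFile`, `snapshot()`, `purgeOldest()` and `refreshCount` are hypothetical stand-ins. The sketch assumes the store bumps `filesVersion` while holding the same mutex that guards its file list, and that each handle is used by one thread at a time, so the handle's own cache needs no synchronization.

```cpp
#include <atomic>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical log file: sequence number plus the timestamp of its first entry.
struct LogFile {
    uint32_t sequenceNumber;
    double startTimestamp;
};

using FileList = std::vector<std::shared_ptr<LogFile>>;  // oldest first

// Hypothetical store: every mutation of the file list bumps filesVersion
// while still holding filesMutex, so version and list always agree.
class Store {
public:
    void addFile(uint32_t seq, double ts) {  // e.g. rotation or registration
        std::lock_guard<std::mutex> lock(filesMutex);
        files.push_back(std::make_shared<LogFile>(LogFile{seq, ts}));
        filesVersion.fetch_add(1, std::memory_order_release);
    }

    void purgeOldest() {  // e.g. purge
        std::lock_guard<std::mutex> lock(filesMutex);
        if (!files.empty()) files.erase(files.begin());
        filesVersion.fetch_add(1, std::memory_order_release);
    }

    // Slow path: copy the list and its matching version under the lock.
    uint64_t snapshot(FileList& out) {
        std::lock_guard<std::mutex> lock(filesMutex);
        out = files;
        return filesVersion.load(std::memory_order_relaxed);
    }

    std::atomic<uint64_t> filesVersion{0};

private:
    std::mutex filesMutex;
    FileList files;
};

// Hypothetical handle holding a private snapshot of the store's file list.
class Handle {
public:
    explicit Handle(Store& s) : store(s) {}

    // Sequence number of the newest file whose first entry is at or before
    // `timestamp`, or nullopt if no file in the snapshot qualifies.
    std::optional<uint32_t> findPosition(double timestamp) {
        refreshIfStale();
        for (auto it = cachedFiles.rbegin(); it != cachedFiles.rend(); ++it) {
            if ((*it)->startTimestamp <= timestamp) return (*it)->sequenceNumber;
        }
        return std::nullopt;
    }

    uint64_t refreshCount = 0;  // illustration only: how often the slow path ran

private:
    void refreshIfStale() {
        // Fast path: a single atomic load, no mutex, when nothing changed.
        if (store.filesVersion.load(std::memory_order_acquire) == cachedVersion) return;
        cachedVersion = store.snapshot(cachedFiles);
        ++refreshCount;
    }

    Store& store;
    FileList cachedFiles;
    uint64_t cachedVersion = 0;
};
```

In this sketch the mutex is taken only when the version has moved, so a handle polling a stable set of files never contends with other handles. One side effect of holding `shared_ptr<LogFile>` in the snapshot is that a purged file object stays alive until every handle that cached it refreshes.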
Per-file index changes (TransactionLogFile):
- Lock-free in-file timestamp index using a stable buffer + packed atomic
state (low 32 bits = entry count, high 32 bits = position indexed up to).
Replaces the std::map + indexMutex serialization. Acquire/release ordering
on indexState publishes new entries safely to lock-free readers.
- Slow-path index extension serializes only on indexExtendMutex (per-file),
not the global dataSetsMutex.
- Removed eager ensureIndexUpToDate at registerLogFile — saves up to
maxFileSize/13 × 16 bytes per recovered file at startup. The lazy slow
path handles the first reader instead.
- Inlined extendIndexLocked into findPositionByTimestamp (only caller).
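A minimal sketch of an index with a stable buffer and packed atomic state, as described above, is shown below. It assumes a fixed-capacity buffer allocated up front (so readers never observe a reallocation) and a 16-byte entry made of an 8-byte timestamp and a 4-byte offset plus padding. `FileIndex`, `IndexEntry`, `extend()` and `find()` are hypothetical names for illustration, not the PR's actual types.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <memory>
#include <mutex>
#include <optional>

// Hypothetical index entry: timestamp of a log entry and its byte offset.
struct IndexEntry {
    double timestamp;
    uint32_t offset;
};

class FileIndex {
public:
    explicit FileIndex(uint32_t cap) : entries(new IndexEntry[cap]), capacity(cap) {}

    // Packed state: low 32 bits = entry count, high 32 bits = position indexed up to.
    static uint64_t pack(uint32_t count, uint32_t indexedUpTo) {
        return (static_cast<uint64_t>(indexedUpTo) << 32) | count;
    }
    static uint32_t countOf(uint64_t s) { return static_cast<uint32_t>(s); }
    static uint32_t indexedUpToOf(uint64_t s) { return static_cast<uint32_t>(s >> 32); }

    // Slow path: writers serialize on a per-file mutex only. New entries go
    // past the published count, then one release store publishes them.
    bool extend(const IndexEntry* newEntries, uint32_t n, uint32_t newIndexedUpTo) {
        std::lock_guard<std::mutex> lock(extendMutex);
        uint64_t s = state.load(std::memory_order_relaxed);
        uint32_t count = countOf(s);
        if (n > capacity - count) return false;  // buffer full (hypothetical policy)
        std::copy(newEntries, newEntries + n, entries.get() + count);
        state.store(pack(count + n, newIndexedUpTo), std::memory_order_release);
        return true;
    }

    // Fast path: one acquire load, then a binary search over the published
    // prefix. Returns the offset of the first entry with timestamp >= ts.
    std::optional<uint32_t> find(double ts) const {
        uint64_t s = state.load(std::memory_order_acquire);
        const IndexEntry* begin = entries.get();
        const IndexEntry* end = begin + countOf(s);
        const IndexEntry* it = std::lower_bound(
            begin, end, ts, [](const IndexEntry& e, double t) { return e.timestamp < t; });
        if (it == end) return std::nullopt;
        return it->offset;
    }

    uint32_t indexedUpTo() const {
        return indexedUpToOf(state.load(std::memory_order_acquire));
    }

private:
    std::unique_ptr<IndexEntry[]> entries;  // never reallocated
    uint32_t capacity;
    std::atomic<uint64_t> state{0};
    std::mutex extendMutex;
};
```

Readers only touch entries below the count they loaded, and a writer only writes at or beyond the published count, so the two never access the same slot concurrently. The release store on `state` paired with the acquire load in `find()` is what makes the newly copied entries visible before the larger count is.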
Test coverage:
Adds 9 new regression tests covering the cases the changes affected:
- per-handle cache invalidation across many rotations
- concurrent first-time readers on a freshly-opened (unindexed) log
- multiple handles on the same log staying consistent
- iterator resume across rotations
- crash-free behavior under concurrent reads + purgeLogs(destroy:true)
- queries after attempting to purge earlier files
- ...and more
Bench:
Adds benchmark/worker-transaction-log-read.bench.ts — 4 access patterns
(bulk forward scan, bulk forward scan with concurrent writer, high-frequency
short-range queries, cursor-advance tail scan with writer) at 8 workers,
comparing rocksdb-js against lmdb with matched lazy-durability semantics
(noSync) and Harper-realistic LMDB options (snapshot:false, numeric keys).
Headline impact (8 workers, 3-run averages):
Short-range queries (1k iterators × 8 workers):
baseline: 114 hz | optimized: 2,576 hz (22.6× speedup, 5.0× faster than lmdb)
Bulk forward scan with concurrent writer:
baseline: 2,248 hz | optimized: 2,502 hz (+11%, 26.6× faster than lmdb)
Bulk forward scan, no writes:
baseline: 3,136 hz | optimized: 3,168 hz (+1%, 38× faster than lmdb)
The high-QPS short-range pattern is most representative of CDC subscriber
polling — that workload moves from a regression vs lmdb to a 5× lead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📊 Benchmark Results
get-sync.bench.ts
- getSync() > random keys - small key size (100 records)
- getSync() > sequential keys - small key size (100 records)
ranges.bench.ts
- getRange() > small range (100 records, 50 range)
realistic-load.bench.ts
- Realistic write load with workers > write variable records with transaction log
transaction-log.bench.ts
- Transaction log > read 100 iterators while write log with 100 byte records
- Transaction log > read one entry from random position from log with 1000 100 byte records
worker-put-sync.bench.ts
- putSync() > random keys - small key size (100 records, 10 workers)
worker-transaction-log-read.bench.ts
- Transaction log read access patterns (8 workers) > Bulk forward scan, no writes: 8 workers each scan ~8000 entries
- Transaction log read access patterns (8 workers) > Bulk forward scan with 1 concurrent writer (7 readers)
- Transaction log read access patterns (8 workers) > Short-range queries: 8 workers each open 1000 iterators per tick
- Transaction log read access patterns (8 workers) > Cursor-advance tail scan with 1 concurrent writer (7 readers, 50 writes/tick)
worker-transaction-log.bench.ts
- Transaction log with workers > write log with 100 byte records
Results from commit 6ba95df
 * to a fresh store if this one has been marked closing.
 */
-std::weak_ptr<TransactionLogStore> store;
+std::shared_ptr<TransactionLogStore> store;
I'm skeptical that this will work. You are now tying the lifetime of the TransactionLogStore to the lifetime of the TransactionLogHandle. When you call db.purgeLogs({ destroy: true }), it won't be able to fully close a TransactionLogStore, and the files will almost certainly fail to be cleaned up on Windows.
The current design allows the TransactionLogStore to be destroyed while TransactionLogHandles still reference it. The TransactionLogHandle's lifetime is bound to V8's garbage collection; the TransactionLogStore's is not.
With that said, there were some recent race conditions that were fixed by adding isClosing checks. I suppose it's possible the original reason for the weak_ptr no longer exists.
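The lifetime difference raised in this comment can be illustrated with a minimal, hypothetical sketch. `ClosableStore`, `WeakHandle` and `SharedHandle` are made-up stand-ins: the destructor plays the role of closing the store's files, and the `owner` pointer in the usage stands for whatever owning reference purgeLogs({ destroy: true }) releases.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical store; its destructor stands in for closing log files.
struct ClosableStore {
    explicit ClosableStore(std::vector<std::string>& events) : events(events) {}
    ~ClosableStore() { events.push_back("store closed"); }
    std::vector<std::string>& events;
};

// Original design: the handle observes the store but does not own it.
struct WeakHandle {
    std::weak_ptr<ClosableStore> store;
    bool storeAvailable() const { return !store.expired(); }
};

// Design in this diff: the handle co-owns the store.
struct SharedHandle {
    std::shared_ptr<ClosableStore> store;
};
```

With `WeakHandle`, releasing `owner` destroys the store immediately and the handle simply sees it as gone. With `SharedHandle`, the store (and, in the real code, its open files) survives until the last handle is released, which for handles tied to V8 objects means whenever garbage collection runs.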
Ignore the failed tests. They are failing because pnpm 11 dropped Node 20 and Bun/Deno don't support sqlite. I fixed it in #548, so it should fix your tests when it lands.
I am curious what is meant by "read path". The most common "read path" in Harper for most applications is long-lived iterators, used by replication. The dominant iteration involves zero native/C++ calls; it is entirely in JavaScript. This is intentional, because it basically guarantees that the dominant read path not only does not use locks, but cannot use locks. That's why we get many millions of iterations per second. Iteration is faster than is even possible with native calls, much less mutexes, I believe.
So which read path are we talking about? Is this more for random-access queries on the transaction log (which are more common in MQTT applications)?