Make transaction log read paths lock-free under concurrency by harper-joseph · Pull Request #545 · HarperFast/rocksdb-js

harper-joseph · 2026-05-08T01:26:35Z

Eliminates dataSetsMutex and weak_ptr::lock() contention from the steady-state read path, unlocking high-QPS workloads with many concurrent subscribers (e.g., CDC replication).

Read-path changes (TransactionLogStore + TransactionLogHandle):

Per-handle file snapshot cache (cachedFiles + filesVersion). Handles refresh the snapshot only when the store's filesVersion atomic advances (rotation, registration, purge); steady-state reads walk the local snapshot lock-free.
Lock-free findPosition: walks cachedFiles newest-to-oldest, skipping files by their stored timestamp. Falls back to the store-level slow path only when the snapshot is empty.
Lock-free getLogFileSize fast path: reads logFile->size atomic via the cached snapshot. Falls through to the slow path only when sequenceNumber=0 or the file is in the snapshot but not yet opened.
Cached shared_ptr<TransactionLogStore> on the handle replaces the per-call weak_ptr::lock() CAS. Each read method now does an isClosing.load() check instead. addEntry re-resolves to a fresh store and clears cachedFiles when isClosing is observed.
currentSequenceNumber is now atomic (was plain uint32_t) so handle fast-path readers can compare without acquiring dataSetsMutex.
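
The version-gated snapshot idea above can be sketched in isolation. This is a minimal illustration, not the PR's code: `Store`, `Handle`, `LogFile`, `snapshot()`, `files()`, and `refreshCount` are hypothetical stand-ins, and only the names `filesVersion` and `cachedFiles` come from the description. It assumes each handle is used by one thread at a time.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the PR's actual classes.
struct LogFile {
    uint32_t sequenceNumber;
    explicit LogFile(uint32_t seq) : sequenceNumber(seq) {}
};

using FileList = std::vector<std::shared_ptr<LogFile>>;

class Store {
public:
    // Advanced after every change to the file list (rotation, registration, purge).
    std::atomic<uint64_t> filesVersion{0};

    void registerFile(uint32_t seq) {
        std::lock_guard<std::mutex> guard(filesMutex);
        files.push_back(std::make_shared<LogFile>(seq));
        filesVersion.fetch_add(1, std::memory_order_release);
    }

    // Slow path: copy the list under the mutex and return the version it matches.
    uint64_t snapshot(FileList& out) {
        std::lock_guard<std::mutex> guard(filesMutex);
        out = files;
        return filesVersion.load(std::memory_order_relaxed);
    }

private:
    std::mutex filesMutex;
    FileList files;
};

class Handle {
public:
    explicit Handle(std::shared_ptr<Store> s) : store(std::move(s)) {
        assert(store != nullptr);
    }

    // Steady state costs one atomic load; the mutex is taken only after a change.
    const FileList& files() {
        uint64_t current = store->filesVersion.load(std::memory_order_acquire);
        if (current != cachedVersion) {
            cachedVersion = store->snapshot(cachedFiles);
            ++refreshCount;
        }
        return cachedFiles;
    }

    int refreshCount = 0; // instrumentation for the example only

private:
    std::shared_ptr<Store> store;
    FileList cachedFiles;
    uint64_t cachedVersion = 0; // matches an empty store, so no refresh is needed yet
};
```

In this sketch, repeated calls to `files()` between registrations touch only the atomic; the copy under the mutex happens once per version change. Because the version is bumped while the mutex is held, the version returned by `snapshot()` always corresponds to the copied list.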

Per-file index changes (TransactionLogFile):

Lock-free in-file timestamp index using a stable buffer + packed atomic state (low 32 bits = entry count, high 32 bits = position indexed up to). Replaces the std::map + indexMutex serialization. Acquire/release ordering on indexState publishes new entries safely to lock-free readers.
Slow-path index extension serializes only on indexExtendMutex (per-file), not the global dataSetsMutex.
Removed eager ensureIndexUpToDate at registerLogFile — saves up to maxFileSize/13 × 16 bytes per recovered file at startup. The lazy slow path handles the first reader instead.
Inlined extendIndexLocked into findPositionByTimestamp (only caller).
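
The packed-state publication scheme can be sketched roughly as follows. This is an assumption-laden illustration rather than the PR's implementation: `TimestampIndex`, `IndexEntry`, `append()`, `lookup()`, and `extendMutex` are hypothetical names (the last standing in for `indexExtendMutex`), the buffer has a fixed capacity so entries never move, and timestamps are assumed to be appended in non-decreasing order.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>

// Hypothetical entry layout; the real one is not shown in this thread.
struct IndexEntry {
    double timestamp;
    uint32_t position;
};

class TimestampIndex {
public:
    explicit TimestampIndex(uint32_t cap)
        : capacity(cap), entries(std::make_unique<IndexEntry[]>(cap)) {}

    // Low 32 bits = entry count, high 32 bits = position indexed up to.
    static uint64_t pack(uint32_t count, uint32_t indexedUpTo) {
        return (static_cast<uint64_t>(indexedUpTo) << 32) | count;
    }

    // Writer side: extenders serialize on a per-index mutex.
    bool append(double timestamp, uint32_t position, uint32_t indexedUpTo) {
        std::lock_guard<std::mutex> guard(extendMutex);
        uint32_t count = static_cast<uint32_t>(indexState.load(std::memory_order_relaxed));
        if (count == capacity) return false;
        assert(count == 0 || entries[count - 1].timestamp <= timestamp);
        entries[count] = IndexEntry{timestamp, position};
        // Release: the entry write above is visible to any reader that sees the new count.
        indexState.store(pack(count + 1, indexedUpTo), std::memory_order_release);
        return true;
    }

    // Reader side, no lock: position of the last entry with timestamp <= the
    // argument, or 0 when no such entry has been published.
    uint32_t lookup(double timestamp) const {
        uint32_t count = entryCount();
        uint32_t lo = 0, hi = count;
        while (lo < hi) {
            uint32_t mid = lo + (hi - lo) / 2;
            if (entries[mid].timestamp <= timestamp) lo = mid + 1; else hi = mid;
        }
        return lo == 0 ? 0 : entries[lo - 1].position;
    }

    uint32_t entryCount() const {
        return static_cast<uint32_t>(indexState.load(std::memory_order_acquire));
    }

    uint32_t indexedUpTo() const {
        return static_cast<uint32_t>(indexState.load(std::memory_order_acquire) >> 32);
    }

private:
    const uint32_t capacity;
    std::unique_ptr<IndexEntry[]> entries;
    std::atomic<uint64_t> indexState{0};
    std::mutex extendMutex;
};
```

Readers only ever touch slots below the count they loaded, and published slots are never rewritten, so a reader racing with `append()` sees either the old count or the new count together with the fully written entry.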

Test coverage:

Adds 9 new regression tests covering the cases the changes affected:

per-handle cache invalidation across many rotations
concurrent first-time readers on a freshly-opened (unindexed) log
multiple handles on the same log staying consistent
iterator resume across rotations
crash-free behavior under concurrent reads + purgeLogs(destroy:true)
queries after attempting to purge earlier files
...and more

Bench:

Adds benchmark/worker-transaction-log-read.bench.ts — 4 access patterns (bulk forward scan, bulk forward scan with concurrent writer, high-frequency short-range queries, cursor-advance tail scan with writer) at 8 workers, comparing rocksdb-js against lmdb with matched lazy-durability semantics (noSync) and Harper-realistic LMDB options (snapshot:false, numeric keys).

Headline impact (8 workers, 3-run averages):

Short-range queries (1k iterators × 8 workers): baseline: 114 hz | optimized: 2,576 hz (22.6× speedup, 5.0× faster than lmdb)
Bulk forward scan with concurrent writer: baseline: 2,248 hz | optimized: 2,502 hz (+11%, 26.6× faster than lmdb)
Bulk forward scan, no writes: baseline: 3,136 hz | optimized: 3,168 hz (+1%, 38× faster than lmdb)

The high-QPS short-range pattern is most representative of CDC subscriber polling; that workload moves from a regression vs lmdb to a 5× lead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-08T01:31:31Z

📊 Benchmark Results

get-sync.bench.ts

getSync() > random keys - small key size (100 records)

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 lmdb	1	24.38K ops/sec	41.02	39.74	694.525	0.116	121,888
🥈 rocksdb	2	12.54K ops/sec	79.77	77.37	22,482.244	0.890	62,679

getSync() > sequential keys - small key size (100 records)

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 lmdb	1	28.38K ops/sec	35.23	34.16	754.692	0.106	141,912
🥈 rocksdb	2	13.18K ops/sec	75.87	74.78	567.205	0.048	65,900

ranges.bench.ts

getRange() > small range (100 records, 50 range)

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 lmdb	1	26.42K ops/sec	37.84	35.23	1,628.831	0.281	132,123
🥈 rocksdb	2	17.16K ops/sec	58.27	51.77	2,132.248	0.144	85,811

realistic-load.bench.ts

Realistic write load with workers > write variable records with transaction log

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	197.30 ops/sec	5,068.55	71.28	141,022.184	38.61	395
🥈 lmdb	2	26.43 ops/sec	37,831.511	423.412	1,191,509.07	136.374	64.00

transaction-log.bench.ts

Transaction log > read 100 iterators while write log with 100 byte records

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	35.38K ops/sec	28.26	12.96	14,199.917	0.601	176,903
🥈 lmdb	2	439.76 ops/sec	2,273.972	175.537	28,497.843	1.65	2,199

Transaction log > read one entry from random position from log with 1000 100 byte records

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	741.86K ops/sec	1.35	1.18	3,290.04	0.146	3,709,315
🥈 lmdb	2	423.72K ops/sec	2.36	1.22	2,841.055	0.316	2,118,582

worker-put-sync.bench.ts

putSync() > random keys - small key size (100 records, 10 workers)

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	862.62 ops/sec	1,159.252	992.12	1,806.368	0.295	1,726
🥈 lmdb	2	1.17 ops/sec	856,476.561	812,851.925	883,536.324	1.75	10.00

worker-transaction-log-read.bench.ts

Transaction log read access patterns (8 workers) > Bulk forward scan, no writes: 8 workers each scan ~8000 entries

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	966.66 ops/sec	1,034.493	933.384	2,499.365	0.337	1,934
🥈 lmdb	2	100.10 ops/sec	9,990.446	7,142.306	14,021.624	1.61	201

Transaction log read access patterns (8 workers) > Bulk forward scan with 1 concurrent writer (7 readers)

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	1.01K ops/sec	990.958	815.75	1,474.379	0.472	2,019
🥈 lmdb	2	107.45 ops/sec	9,306.826	6,715.124	12,684.923	1.67	215

Transaction log read access patterns (8 workers) > Short-range queries: 8 workers each open 1000 iterators per tick

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	1.16K ops/sec	861.938	679.434	1,581.451	1.11	2,321
🥈 lmdb	2	434.90 ops/sec	2,299.392	1,406.1	5,185.222	1.43	870

Transaction log read access patterns (8 workers) > Cursor-advance tail scan with 1 concurrent writer (7 readers, 50 writes/tick)

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 lmdb	1	801.67 ops/sec	1,247.402	1,041.197	17,238.003	2.90	1,604
🥈 rocksdb	2	501.69 ops/sec	1,993.266	1,542.796	3,387.585	0.725	1,004

worker-transaction-log.bench.ts

Transaction log with workers > write log with 100 byte records

Implementation	Rank	Operations/sec	Mean (µs)	Min (µs)	Max (µs)	RME (%)	Samples
🥇 rocksdb	1	18.19K ops/sec	54.98	30.20	558.912	0.499	36,377
🥈 lmdb	2	817.20 ops/sec	1,223.685	293.444	11,213.584	5.34	1,635

Results from commit 6ba95df

cb1kenobi · 2026-05-08T04:56:52Z

+	 * to a fresh store if this one has been marked closing.
 	 */
-	std::weak_ptr<TransactionLogStore> store;
+	std::shared_ptr<TransactionLogStore> store;


I'm skeptical that this will work. You are now binding the lifetime of the TransactionLogStore to the lifetime of the TransactionLogHandle. When you call db.purgeLogs({ destroy: true }), it won't be able to fully close a TransactionLogStore, and the files will almost certainly fail to be cleaned up on Windows.

The current design allows the TransactionLogStore to be destroyed while TransactionLogHandles still reference it. A TransactionLogHandle's lifetime is bound to V8's garbage collection; a TransactionLogStore's is not.

That said, some recent race conditions were fixed by adding isClosing checks. I suppose it's possible the original reason for the weak_ptr no longer exists.
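
The ownership difference behind this concern can be shown in isolation. The sketch below is illustrative only: `Store`, `WeakHandle`, `StrongHandle`, and both functions are hypothetical stand-ins, and `owner.reset()` stands in for whatever drops the store's owning reference during a destroying purge. It does not model the PR's isClosing re-resolution in addEntry.

```cpp
#include <cassert>
#include <memory>

struct Store {
    explicit Store(bool& flag) : destroyed(flag) {}
    ~Store() { destroyed = true; } // stands in for closing and deleting log files
    bool& destroyed;
};

struct WeakHandle { std::weak_ptr<Store> store; };
struct StrongHandle { std::shared_ptr<Store> store; };

// With a weak_ptr handle, dropping the owner destroys the store immediately.
bool destroyedWhileWeakHandleAlive() {
    bool destroyed = false;
    auto owner = std::make_shared<Store>(destroyed);
    WeakHandle handle{owner};
    owner.reset();
    assert(handle.store.expired());
    return destroyed;
}

// With a shared_ptr handle, the store outlives the owner until the handle goes away.
bool destroyedWhileStrongHandleAlive() {
    bool destroyed = false;
    auto owner = std::make_shared<Store>(destroyed);
    StrongHandle handle{owner};
    owner.reset();
    assert(handle.store.use_count() == 1);
    return destroyed; // the handle is still in scope when this value is read
}
```

Under these assumptions the first function observes the destructor having run and the second does not, which is the behavior that would delay file cleanup until the handle is garbage collected or releases its reference.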

cb1kenobi · 2026-05-08T04:57:58Z

Ignore the failed tests. They are failing because pnpm 11 dropped Node 20 and Bun/Deno don't support sqlite. I fixed this in #548, so your tests should pass once it lands.

kriszyp

I am curious what is meant by "read path"? The most common "read path" in Harper for most applications is long-lived iterators, used by replication. The dominant iteration involves zero native/C++ calls; it is entirely in JavaScript. This is intentional, because it basically guarantees that the dominant read path not only does not use locks, but cannot use locks. That's why we get many millions of iterations per second. I believe iteration is faster than is even possible with native calls, much less mutexes.
So what read path are we talking about? Is this more for random-access queries on the transaction log (which is more common with MQTT applications)?

harper-joseph linked an issue May 8, 2026 that may be closed by this pull request

Explore using atomics to reduce mutex locks #462

Open

Merge branch 'main' into transaction-log-lock-free-reads

20c7c0d

cb1kenobi reviewed May 8, 2026

kriszyp reviewed May 10, 2026

harper-joseph wants to merge 2 commits into main from transaction-log-lock-free-reads
