fix: use unique partition keys in QueryReturnTypesIT to avoid LWT contention #882
Two questions:
This is actually funny: it happened because the queries run on the same PK and in parallel, so it is regular LWT congestion.
Contention and a timeout from just 2 queries? Wow. I did not expect LWT to be THAT slow.
@Lorak-mmk, I've reworked the fix following @dkropachev's feedback.
Pull request overview
This PR updates the QueryReturnTypesIT integration test to avoid LWT (Paxos) contention when tests are executed in parallel, by ensuring each test instance operates on a distinct partition key.
Changes:
- Introduced a per-test unique partition key generated from a static `AtomicInteger`.
- Updated all DAO calls/assertions to use the per-test `testId` instead of hardcoded IDs.
- Updated “not found” probes to use a guaranteed-unassigned negative ID derived from `testId`.
Why only two?
Actually, that is not what is happening: tests within a test suite run one by one, but different test suites run in parallel.
Note
Corresponds to scylladb/scylla-java-driver-matrix#142.
Problem
`QueryReturnTypesIT` is annotated `@Category(ParallelizableTests.class)` and all test methods were using the same hardcoded partition key (`id=1`). On Scylla with tablets, initial LWT queries to the same partition key can be routed to random nodes, causing Paxos contention across parallel test threads and resulting in `WriteTimeoutException`.
Fix
Assign each test method instance a unique partition key via a static `AtomicInteger` counter incremented in `@Before`, so no two concurrently running tests contend on the same partition. The "not found" probe previously using `id=2` now uses `-(testId + 1)`, which is always negative and therefore guaranteed to never be assigned by the counter.
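For illustration, the key-allocation idea described above might look roughly like the sketch below. It is not the actual diff: the class name `UniqueKeySketch`, the field `ID_COUNTER`, and the helper `missingId()` are hypothetical, the counter's starting value is an assumption, and the JUnit `@Before` annotation is only mentioned in a comment so the snippet compiles without test dependencies.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the test class; only testId and the use of
// AtomicInteger come from the PR description.
class UniqueKeySketch {
    // Static, so every instance of the class draws from the same sequence
    // and no two instances are handed the same partition key.
    private static final AtomicInteger ID_COUNTER = new AtomicInteger();

    private int testId;

    // In the real test this method would be annotated with JUnit's @Before.
    void setUp() {
        // Assumption: the counter starts at 0, so testId is never negative.
        testId = ID_COUNTER.getAndIncrement();
    }

    int testId() {
        return testId;
    }

    // Key for "not found" probes: strictly negative for any non-negative
    // testId, so the counter never hands it out as a real key.
    int missingId() {
        return -(testId + 1);
    }
}
```

Because the counter only produces non-negative values, a row written under `testId()` can never collide with a lookup on `missingId()`, regardless of how many tests have run before.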