feat: optimize system table queries with column projection (DRIVER-368) #862
@nikagra please check why CI/CD is failing
Pull request overview
This PR backports the 4.x “system table column projection” optimization to the 3.x driver’s ControlConnection, caching discovered system table columns (via initial SELECT *) and using projected SELECT col1, col2... for subsequent queries to reduce bytes/deserialization work.
Changes:
- Add *_COLUMNS_OF_INTEREST constants, projected-column caches, and helper methods (intersectWithNeeded, buildProjectedQuery) in ControlConnection.
- Update control connection system table queries to use projections once caches are warmed, and reset caches on reconnect / peers_v2 fallback.
- Extend Scassandra priming to include projected-query primes and add unit tests covering the new helpers/caches.
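The two helpers named in the list above could look roughly like the following. This is a minimal sketch using plain java.util collections instead of the driver's Guava types; the signatures, the class name, and the contents of PEERS_COLUMNS_OF_INTEREST are assumptions for illustration and are not taken from the PR diff.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class ProjectionHelpersSketch {

  // Hypothetical stand-in for a *_COLUMNS_OF_INTEREST constant (contents assumed).
  static final Set<String> PEERS_COLUMNS_OF_INTEREST =
      new TreeSet<>(Arrays.asList("peer", "rpc_address", "data_center", "rack", "host_id", "tokens"));

  // Keep only the server-reported columns that the driver actually reads.
  // A TreeSet is used so the resulting column order is deterministic.
  static SortedSet<String> intersectWithNeeded(Collection<String> serverColumns, Set<String> needed) {
    SortedSet<String> result = new TreeSet<>(serverColumns);
    result.retainAll(needed);
    return result;
  }

  // Build "SELECT a, b FROM table"; fall back to SELECT * when no columns are cached yet.
  static String buildProjectedQuery(String table, Collection<String> columns) {
    if (columns == null || columns.isEmpty()) {
      return "SELECT * FROM " + table;
    }
    return "SELECT " + String.join(", ", columns) + " FROM " + table;
  }

  public static void main(String[] args) {
    // Columns as they might come back from the initial SELECT * (assumed names).
    SortedSet<String> cached =
        intersectWithNeeded(Arrays.asList("rack", "peer", "some_new_column"), PEERS_COLUMNS_OF_INTEREST);
    System.out.println(buildProjectedQuery("system.peers", cached));
  }
}
```

Sorting the projected columns keeps the generated query string stable, which matters when a test harness has to prime the exact query text.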
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| driver-core/src/main/java/com/datastax/driver/core/ControlConnection.java | Adds column-interest sets, cache fields, cache reset, and switches system table queries to projected form after discovery. |
| driver-core/src/test/java/com/datastax/driver/core/ScassandraCluster.java | Re-primes after restart and primes projected queries to match the driver’s new projected system-table queries. |
| driver-core/src/test/java/com/datastax/driver/core/ControlConnectionUnitTest.java | New pure unit tests for projection helpers/constants and cache-field declarations. |
| driver-core/src/test/java/com/datastax/driver/core/ControlConnectionTest.java | Resets column caches when Scassandra primes are cleared to avoid projected-query misses. |
0bf9092 to 4f9a052
All CI failures have been resolved. The root issues were: projected WHERE-clause queries not primed in Scassandra for non-restarted nodes (fixed by using |
c37c98f to 4f9a052
ab6f00c to 367ff74
dkropachev left a comment
Looks great. Can I ask you to run one experiment: could you please ask AI to extract this new logic and its variables into a separate class? It should make the code cleaner.
35d66c5 to 2a7e9d7
@nikagra, looks great, squash commits please
On the first query to each system table (system.local, system.peers, system.peers_v2) the driver sends SELECT * to discover the server's schema. The result is intersected with the set of columns the driver actually reads and cached in SystemColumnProjection. Subsequent queries project only those columns, reducing bytes on the wire and deserialization work.

Key design decisions:
- SystemColumnProjection owns a SystemTable enum (LOCAL, PEERS, PEERS_V2) and three unified methods: query(SystemTable), populate(SystemTable, ResultSet), and hook(SystemTable, DefaultResultSetFuture).
- populate() is called inside if (row != null) guards for WHERE-clause single-row lookups: an empty result still carries ColumnDefinitions in the metadata, so the cache must not be warmed from it.
- hook() is used for the async system.peers full-scan path where the result always reflects the server schema regardless of row count.
- Column caches are reset on reconnection and on InvalidQueryException so a server schema change causes the next query to re-discover columns via SELECT *.
- Projected column lists are sorted alphabetically for deterministic query strings; ScassandraCluster primes matching projected queries alongside SELECT * primes.
- Unit tests in ControlConnectionUnitTest cover intersectWithNeeded(), buildProjectedQuery(), hook() success/failure, and cache field modifiers.
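The shape of SystemColumnProjection described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the class from the PR: the driver's ResultSet and DefaultResultSetFuture are replaced by a plain list of column names and a CompletableFuture, the per-table column sets are invented, the queries omit any WHERE clause, and reset() is a hypothetical name for the cache reset.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class SystemColumnProjectionSketch {

  enum SystemTable {
    // Column sets below are assumptions, not the driver's real lists.
    LOCAL("system.local", "cluster_name", "data_center", "rack", "tokens"),
    PEERS("system.peers", "peer", "rpc_address", "data_center", "rack", "tokens"),
    PEERS_V2("system.peers_v2", "peer", "peer_port", "native_address", "native_port", "tokens");

    final String tableName;
    final Set<String> needed;

    SystemTable(String tableName, String... needed) {
      this.tableName = tableName;
      this.needed = new HashSet<>(Arrays.asList(needed));
    }
  }

  // An absent entry means "columns not discovered yet".
  private final Map<SystemTable, SortedSet<String>> cache = new ConcurrentHashMap<>();

  // SELECT * until the cache is warm, then an alphabetically sorted projection.
  String query(SystemTable table) {
    SortedSet<String> columns = cache.get(table);
    if (columns == null) {
      return "SELECT * FROM " + table.tableName;
    }
    return "SELECT " + String.join(", ", columns) + " FROM " + table.tableName;
  }

  // Warm the cache from the column names of a SELECT * result.
  void populate(SystemTable table, Collection<String> serverColumns) {
    SortedSet<String> projected = new TreeSet<>(serverColumns);
    projected.retainAll(table.needed);
    if (!projected.isEmpty()) {
      cache.put(table, projected);
    }
  }

  // Async path: populate only if the future completes successfully.
  void hook(SystemTable table, CompletableFuture<List<String>> future) {
    future.thenAccept(columns -> populate(table, columns));
  }

  // Forget everything so the next query re-discovers columns via SELECT *.
  void reset() {
    cache.clear();
  }
}
```

In this sketch a caller would invoke reset() wherever the commit message says caches are reset (reconnection, InvalidQueryException), and would skip populate() for an empty single-row lookup, mirroring the if (row != null) guard.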
2a7e9d7 to 09dc403
Closes https://scylladb.atlassian.net/browse/DRIVER-368
Backport of the same optimization done for the 4.x driver (DefaultTopologyMonitor), applied here to ControlConnection in the 3.x driver.
On the first query to each system table (system.local, system.peers, system.peers_v2) the driver issues SELECT * to discover which columns the server exposes. It caches the intersection of those columns with a driver-internal *_COLUMNS_OF_INTEREST set. Subsequent queries project only those columns, reducing bytes on the wire and deserialization work.

Changes:
- LOCAL/PEERS/PEERS_V2_COLUMNS_OF_INTEREST ImmutableSet<String> constants
- volatile Set<String> cache fields (null = uninitialized sentinel)
- intersectWithNeeded() and buildProjectedQuery() static helpers (@VisibleForTesting)
- setNewConnection() and peers_v2 fallback path
- refreshNodeListAndTokenMap(), selectPeersFuture(), and fetchNodeInfo() (system.local only)
- fetchNodeInfo() uses SELECT * for peer WHERE-clause lookups (single-row reconnection path) to stay compatible with Scassandra primes on non-restarted nodes
- ControlConnectionUnitTest.java: 16 pure unit tests, no cluster required
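The "volatile field with a null sentinel" pattern mentioned above could be sketched as below. The class, field, and method names are hypothetical; the one detail worth showing is that the volatile field is read once into a local variable, so a concurrent reset cannot slip in between the null check and the use.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

class VolatileProjectionCacheSketch {

  // null means "not discovered yet"; any non-null value is an immutable, sorted column set.
  private volatile Set<String> localColumns;

  String localQuery() {
    Set<String> columns = localColumns; // single volatile read
    if (columns == null) {
      return "SELECT * FROM system.local";
    }
    return "SELECT " + String.join(", ", columns) + " FROM system.local";
  }

  // Publish a fresh immutable copy; readers see either the old value or the new one.
  void onColumnsDiscovered(Set<String> columns) {
    localColumns = Collections.unmodifiableSet(new TreeSet<>(columns));
  }

  // Hypothetical reset, e.g. when a new control connection is established.
  void reset() {
    localColumns = null;
  }
}
```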