fix: connect via all DNS IPs when first contact point is non-responsive (DRIVER-201)#865
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses DRIVER-201 by improving initial contact point connection behavior when RESOLVE_CONTACT_POINTS=false, ensuring that if a hostname resolves to multiple IPs and the first is unreachable, the driver can fall back to subsequent IPs instead of failing immediately.
Changes:
- Refactors
ChannelFactoryconnection logic to expand unresolved hostnames to all DNS IPs at connection time and attempt them sequentially. - Introduces a
ConnectRequestparameter object and decomposes connection steps into smaller methods. - Adds an integration test covering the “first DNS A record is unreachable” scenario.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
core/src/main/java/com/datastax/oss/driver/internal/core/channel/ChannelFactory.java |
Expands unresolved hostnames to all resolved IPs at connect time and adds sequential fallback across addresses. |
integration-tests/src/test/java/com/datastax/oss/driver/core/resolver/MockResolverIT.java |
Adds an integration test to validate successful connection when the first DNS entry is non-responsive. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ve (DRIVER-201)
When RESOLVE_CONTACT_POINTS=false (the default), a hostname is stored as a
single unresolved InetSocketAddress. At connection time Netty's bootstrap
called InetAddress.getByName(), returning only the first IP. If that IP was
non-responsive the driver raised AllNodesFailedException with no fallback.
Refactor ChannelFactory to:
- Introduce ConnectRequest parameter object, eliminating the 9-parameter
private connect() method.
- Decompose the connection logic into three single-responsibility methods:
connect(ConnectRequest) – resolves hostname via getAllByName() and
builds a candidate list of all IPs;
tryNextAddress(...) – iterates candidates, falling back to the
next IP on failure;
connectToAddress(...) – performs a single Netty connect and handles
protocol-version negotiation on the same IP.
- DNS resolution is still deferred to connection time, preserving the
dynamic-DNS semantics of RESOLVE_CONTACT_POINTS=false.
Add integration test should_connect_when_first_dns_entry_is_non_responsive
to MockResolverIT that maps the first DNS entry to a non-existent IP and
asserts the session opens successfully via the subsequent entries.
89e53ea to
7da0919
Compare
|
@nikagra , you solve it too deep in the driver, probably the best place is |
|
@dkropachev Thanks for the suggestion. I considered your approach, but the issue description identifies the root cause specifically at |
There are two instances for this issue and I would recomended them fixed separately:
Here is what should be done to address
In regards of the And implement resolveAll() for endpoints that resolved from DNS, such as ClientRoutesEndPoint and SniEndPoint. |
Problem
When
RESOLVE_CONTACT_POINTS=false(the default), a contact point hostname is stored as a single unresolvedInetSocketAddress. At connection time,ChannelFactorypassed this address directly to Netty'sbootstrap.connect(), which internally calledInetAddress.getByName()— returning only the first IP for the hostname. If that IP was non-responsive, the driver raisedAllNodesFailedExceptionwith no fallback to other IPs the hostname might resolve to.This is particularly impactful in dynamic DNS environments where a hostname can map to multiple nodes and the first one may be temporarily unavailable.
Fixes DRIVER-201.
Changes
ChannelFactory.javaRefactored the connection logic using the Parameter Object pattern and SRP decomposition:
ConnectRequestinner class: bundles all per-connection-attempt state, eliminating the unwieldy 9-parameter privateconnect()method.connect(ConnectRequest): resolves the endpoint address. For unresolved hostnames, callsInetAddress.getAllByName()to expand to all known IPs, building a candidate list. DNS resolution is still deferred to connection time — theEndPointcontinues to hold the unresolved hostname — preserving the dynamic-DNS semantics ofRESOLVE_CONTACT_POINTS=false.tryNextAddress(request, candidates, index): iterates the candidate list. On per-address failure it tries the next candidate; only when all are exhausted does it fail the overallresultFuture.connectToAddress(request, address): performs a single Netty bootstrap connect to one resolved IP. Protocol-version negotiation (downgrade retries) is scoped to the same IP, which is semantically correct. Returns its ownCompletableFuture<DriverChannel>sotryNextAddresscan distinguish a per-address failure from the final outcome.tryNextAddressRaw: unchanged pass-through for non-InetSocketAddresstypes (e.g. Unix domain sockets).MockResolverIT.javaNew integration test
should_connect_when_first_dns_entry_is_non_responsive:127.0.1.xtest.cluster.fakeresolves to127.0.1.11(non-existent) first, then nodes 1 and 2RESOLVE_CONTACT_POINTS=false,RECONNECT_ON_INIT=false