
Retry sidecar connect before probe install #3883

Draft: Leiyks wants to merge 1 commit into master from dd/fix-sidecar-rinit-retry-di-probes


Leiyks commented May 15, 2026

Description


Motivation (WHY)

  • test_extension_ci has recurring flaky failures in the live debugger / dynamic instrumentation tests, where probes are not installed in time.
  • The failures are consistent with a startup race: the sidecar transport can still be unavailable when request initialization runs, which delays Remote Configuration (RC) polling and therefore probe application.
  • This has been a major source of instability in the master pipeline (for example, pipeline 113108177).

Changes (WHAT)

  • Added a sidecar readiness retry in ext/sidecar.c during ddtrace_sidecar_rinit():
    • If DDTRACE_G(sidecar) is not yet connected but ddtrace_endpoint is available, call ddtrace_sidecar_ensure_active() before submitting root span data.
  • This keeps the existing startup path intact and only adds a targeted second-chance connection at request init, reducing the likelihood of the probe-install race without resorting to broad timeout increases.

Testing (HOW verified)

  • Environment limitations prevented full local CI reproduction:
    • podman compose networking is unavailable in this sandbox (missing CNI plugins), so request-replayer/test-agent services could not be started.
    • Fetching the libdatadog submodule is blocked (403, allowlist restriction), so the full extension could not be built or tested here.
  • Validation performed:
    • Reviewed the affected probe-install and test-helper code paths and the sidecar/RC initialization flow in the source.
    • Verified that the patch is minimal and scoped to sidecar reconnect timing in ddtrace_sidecar_rinit().
    • Confirmed that the final diff changes only ext/sidecar.c.

Relevant failing pipeline example:
https://gitlab.ddbuild.io/DataDog/apm-reliability/dd-trace-php/-/pipelines/113108177

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

PR by Bits



datadog-prod-us1-4 Bot commented May 15, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 100.00%
Overall Coverage: 60.72% (+0.04%)

🔗 Commit SHA: 4cd2cdc


pr-commenter Bot commented May 15, 2026

Benchmarks [ tracer ]

Benchmark execution time: 2026-05-15 14:33:44

Comparing candidate commit 4cd2cdc in PR branch dd/fix-sidecar-rinit-retry-di-probes with baseline commit 4352865 in branch master.

Found 1 performance improvement and 0 performance regressions! Performance is the same for 192 metrics; 1 metric was unstable.

scenario:SamplingRuleMatchingBench/benchGlobMatching3-opcache

  • 🟩 execution_time [-631.615ns; -285.585ns] or [-4.891%; -2.211%]
