Skip to content

fix(ci): patch run-tests.php getTimer() to return float instead of comma-formatted string#3856

Open
Leiyks wants to merge 10 commits into
masterfrom
leiyks/fix-ci-run-tests-number-format
Open

fix(ci): patch run-tests.php getTimer() to return float instead of comma-formatted string#3856
Leiyks wants to merge 10 commits into
masterfrom
leiyks/fix-ci-run-tests-number-format

Conversation

@Leiyks
Copy link
Copy Markdown
Contributor

@Leiyks Leiyks commented May 5, 2026

What

  • Add a second sed patch to $(BUILD_DIR)/run-tests.php at build time, replacing number_format( with round( inside getTimer()

Why

PHP's run-tests.php getTimer() returns number_format($elapsed, 4), which produces a comma-formatted string (e.g. "1,500.0000") when a test takes ≥1000 seconds. PHP 8.0+ raises E_WARNING: A non-numeric value encountered when that string is used in += arithmetic inside record(). The main test runner process treats this as a fatal worker error and exits with code 2.

This started surfacing ~1 week ago because commit fdbad29 moved DD_EXPERIMENTAL_PROPAGATE_PROCESS_TAGS_ENABLED=0 from ALL_TEST_ENV_OVERRIDE (covers all test targets, including valgrind) into ENV_OVERRIDE (covers only the non-valgrind run). The valgrind run now executes with process tags enabled, making live-debugger and sidecar-related tests 5–20× slower under valgrind's instrumentation, pushing some past the 1000s threshold.

How

We already patch run-tests.php at build time for an unrelated dl() issue. This adds a second patch in the same $(BUILD_DIR)/run-tests.php Makefile target:

sed -i 's/return number_format($$this->rootSuite/return round($$this->rootSuite/' $(BUILD_DIR)/run-tests.php

round($elapsed, 4) returns a float (e.g. 1999.5678) instead of a comma-formatted string, which is safe for += arithmetic. The JUnit XML time='...' attribute also benefits — 1999.5678 is more spec-compliant than 1,999.5678.

Verified the sed pattern matches both PHP 8.4.19 (line 3441) and PHP 8.5.0 (line 3433).

Testing

  • CI: test_extension_ci: [8.4] and test_extension_ci: [8.5] valgrind runs no longer abort with E_WARNING: A non-numeric value encountered in run-tests.php

@datadog-datadog-prod-us1-2
Copy link
Copy Markdown

datadog-datadog-prod-us1-2 Bot commented May 5, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 60.72% (+0.01%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: fdf1556 | Docs | Datadog PR Page | Give us feedback!

@Leiyks Leiyks force-pushed the leiyks/fix-ci-run-tests-number-format branch 3 times, most recently from a52010c to fe39d11 Compare May 7, 2026 13:55
@Leiyks Leiyks marked this pull request as ready for review May 7, 2026 15:01
@Leiyks Leiyks requested a review from a team as a code owner May 7, 2026 15:01
@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented May 7, 2026

Benchmarks [ tracer ]

Benchmark execution time: 2026-05-15 13:24:37

Comparing candidate commit fdf1556 in PR branch leiyks/fix-ci-run-tests-number-format with baseline commit 4352865 in branch master.

Found 0 performance improvements and 13 performance regressions! Performance is the same for 180 metrics, 1 unstable metrics.

scenario:ContextPropagationBench/benchInject64Bit

  • 🟥 execution_time [+1.500µs; +2.008µs] or [+11.447%; +15.332%]

scenario:PDOBench/benchPDOOverhead

  • 🟥 execution_time [+8.389µs; +11.001µs] or [+3.243%; +4.253%]

scenario:PDOBench/benchPDOOverheadWithDBM

  • 🟥 execution_time [+8.935µs; +11.614µs] or [+3.442%; +4.474%]

scenario:PHPRedisBench/benchRedisOverhead

  • 🟥 execution_time [+36.921µs; +49.355µs] or [+3.581%; +4.787%]

scenario:SamplingRuleMatchingBench/benchGlobMatching1

  • 🟥 execution_time [+1.587µs; +1.819µs] or [+12.235%; +14.024%]

scenario:SamplingRuleMatchingBench/benchGlobMatching2

  • 🟥 execution_time [+1.636µs; +2.013µs] or [+12.601%; +15.507%]

scenario:SamplingRuleMatchingBench/benchGlobMatching3

  • 🟥 execution_time [+1.440µs; +1.725µs] or [+10.929%; +13.090%]

scenario:SamplingRuleMatchingBench/benchGlobMatching4

  • 🟥 execution_time [+1.609µs; +1.944µs] or [+12.301%; +14.863%]

scenario:SamplingRuleMatchingBench/benchRegexMatching1

  • 🟥 execution_time [+1.570µs; +1.879µs] or [+13.199%; +15.791%]

scenario:SamplingRuleMatchingBench/benchRegexMatching2

  • 🟥 execution_time [+1.551µs; +1.822µs] or [+13.093%; +15.377%]

scenario:SamplingRuleMatchingBench/benchRegexMatching3

  • 🟥 execution_time [+1.560µs; +1.888µs] or [+13.136%; +15.902%]

scenario:SamplingRuleMatchingBench/benchRegexMatching4

  • 🟥 execution_time [+1.561µs; +1.828µs] or [+13.172%; +15.429%]

scenario:SpanBench/benchOpenTelemetryInteroperability

  • 🟥 execution_time [+15.451µs; +18.822µs] or [+5.129%; +6.248%]

@Leiyks Leiyks force-pushed the leiyks/fix-ci-run-tests-number-format branch 4 times, most recently from e63b1f6 to cce2c18 Compare May 13, 2026 12:39
Leiyks added 9 commits May 15, 2026 13:00
…= 7.2)

Lists tests slower than 10s with their durations at the end of the run.
The --show-slow flag only exists in PHP 7.2+, so it's gated by a runtime
PHP_VERSION_ID check to avoid breaking PHP 7.0/7.1 jobs.

Used to verify the hypothesis that recent CI failures are caused by tests
exceeding 1000s under valgrind (triggering the number_format E_WARNING
in run-tests.php's record() method).
…lgrind tests

PHP's run-tests.php getTimer() returns number_format($elapsed, 4) which
produces a comma-formatted string (e.g. "1,500.0000") when a test takes
>=1000 seconds. PHP 8.0+ raises E_WARNING when that string is used in +=
arithmetic inside record(), causing the test runner to abort with exit code 2.

Patch getTimer() at build time so it returns a float (via round()) instead.
The CI .base_test before_script runs wait-for-service-ready.sh, but
WAIT_FOR only listed test-agent — request-replayer was never gated on,
even though it is in the agent_httpbin_service() service block used by
test_extension_ci, ASAN test_c, and ASAN test_c with multiple observers.

Add a request-replayer case to detect_service_type (probing /replay, a
read-only endpoint that returns valid JSON when php -S is up), and add
request-replayer:80 to WAIT_FOR for the three affected job blocks.

PHP Language Tests is unchanged: it only declares test-agent in services.
curl -sf fails on 4xx, which would falsely report not-ready if /replay
returns 4xx when no dump exists yet (the natural startup state). Drop
-f so any HTTP response proves php -S is up and executing index.php;
connection failure remains the only signal of an unhealthy service.
Filters the SLOW TEST SUMMARY to tests that meaningfully approach
the per-test 300s timeout, trimming noise from the long tail.
On PHP 8.4 the test sandbox integration registers early enough that the
worker bundles it into the app-started payload's `integrations` field
(see libdd-telemetry build_app_started), so no separate
app-integrations-change event is emitted and the existing assertion
loop times out.

Accept both delivery paths: the dedicated app-integrations-change event,
and the integrations array embedded in app-started. The expected output
shape is identical because Rust Integration struct field order matches.
…ix flaky PHP 8.0 valgrind

The test guarded a sleep(20) with `if (ini_get(...) != 0.5)`, but the
wrapped sleep() blocks SIGVTALRM during the call. Under valgrind on
PHP 8.0, the heavier signal traffic can cut the sleep short before the
sidecar fetches the config from the request-replayer and propagates it
to INI globals, leaving every dynamic value at its default.

Replace the single sleep() with a 100ms polling loop bounded at 20s
total. Each usleep() iteration goes through the same SIGVTALRM-blocking
wrapper, which calls ddtrace_check_for_new_config_now() on unblock under
valgrind, giving the test repeated, well-defined opportunities to apply
the config while still finishing early on the happy path.
The previous warmup was a no-op:

  $start = microtime(true);
  ...
  usleep(floor(microtime(true) - $start) * 100000);

floor() of a sub-second float yields 0, so usleep(0) did nothing. Also,
the sleep happened *after* the span had already been sampled with an
empty env, so even a real delay wouldn't have helped.

Replace it with dd_trace_internal_fn('await_agent_info') called before
opening the span — same pattern that client_side_stats*.phpt already
uses to gate on the sidecar having read /info from the agent.
The request-replayer runs php -S, which is single-threaded by default.
With DD_REMOTE_CONFIG_POLL_INTERVAL_SECONDS=0.01 in many live-debugger
tests and parallel test workers under valgrind, the sidecar's RC polling
saturates the replayer's accept loop. Inbound connections from the test
process (e.g. file_get_contents('/replay')) hit the kernel listen backlog
and time out with 'Operation now in progress' (EINPROGRESS), poisoning
the EXPECTF output of any test that retries.

PHP 7.4+ recognizes PHP_CLI_SERVER_WORKERS to fork N worker processes,
restoring real concurrency for the built-in server.
@Leiyks Leiyks force-pushed the leiyks/fix-ci-run-tests-number-format branch from cb473bd to 67144f9 Compare May 15, 2026 11:00
The previous PHP_CLI_SERVER_WORKERS=4 change broke
InstrumentationTest::testInstrumentation across many test_web_custom
jobs: the request-replayer image writes shared state to a single
dump.json file with no inter-process coordination, so multiple php -S
workers race and overwrite telemetry events (e.g. drop the
'logs_created' metric the test asserts on).

Revert the workers var and instead reduce the per-test RC poll rate
10x (0.01s -> 0.1s) in the 12 affected tests. This cuts the sidecar's
load on the single-process replayer enough to relieve the listen
backlog under valgrind without breaking shared-state assumptions.
@Leiyks Leiyks force-pushed the leiyks/fix-ci-run-tests-number-format branch from 0e715c5 to fdf1556 Compare May 15, 2026 12:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant