[DX-4017] [DX-3716] Core Test Timeout Fix by kalverra · Pull Request #22281 · smartcontractkit/chainlink

kalverra · 2026-05-01T20:32:18Z

`chainlink-test-diagnosis` Skill

Introduces an AI agent skill chainlink-test-diagnosis that wraps the diagnose command to narrow down flakes, slow tests, and hangs. I used this skill to fix the many flakes in the github.com/smartcontractkit/chainlink/v2/core/services/vrf/v2 package and included these in the PR. This includes a timeout that plagues CI Core and makes a normally 5-6 minute test run timeout after 15m+.

To verify the flakes were fixed, I re-ran all tests in the package 200 times and encountered no issues! The first runs were showing flake rates of 10%+ for the package.

Other Improvements

Improved the performance and UX of the tools/test harness, especially the diagnose command.
Some minor linting and modernization fixes.
Moving tests from gomega to testify where possible.

Extra Review Attention

Please pay attention to this change as it stops some race-conditions in test panics, but I fear it could be masking errors?

…chainlink into coreTestTimeoutFix

github-actions · 2026-05-01T20:33:33Z

I see you updated files related to core. Please run make gocs in the root directory to add a changeset as well as in the text include at least one of the following tags:

#added For any new functionality added.
#breaking_change For any functionality that requires manual action for the node to boot.
#bugfix For bug fixes.
#changed For any change to the existing functionality.
#db_update For any feature that introduces updates to database schema.
#deprecation_notice For any upcoming deprecation functionality.
#internal For changesets that need to be excluded from the final changelog.
#nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
#removed For any functionality/config that is removed.
#updated For any functionality that is updated.
#wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

github-actions · 2026-05-01T20:34:14Z

✅ No conflicts with other open PRs targeting develop

trunk-io · 2026-05-01T20:50:10Z

_{View Full Report ↗︎ ⋅ Docs}

Copilot

Pull request overview

Risk Rating: MEDIUM

This PR aims to reduce intermittent CI timeouts/flakiness by improving the tools/test diagnose harness (parallel iterations, smarter fail-fast, better output/logging metadata) and by hardening several flaky/timeout-prone Core tests (notably VRF v2 integration tests and LLO telemetry sampling).

Changes:

Extend tools/test diagnose with parallel iteration execution, richer progress/output plumbing, fail-fast categories, and improved report/log metadata.
Add persistent test DB lifecycle commands (setup-testdb, remove-testdb) and database pooling to support parallel diagnose safely.
Update multiple Core tests to avoid LogPoller indexing races and to use more robust timeouts/synchronization patterns.

Areas requiring scrupulous human review

tools/test/internal/runner/runner.go: parallel iteration scheduling/cancellation and progress rendering interplay.
tools/test/internal/db/db.go: pool creation semantics (env propagation, global CL_DATABASE_URL behavior, cleanup on partial failure).
VRF v2 integration test changes around filtering/indexing boundaries (commitRequestAndFilterIndexBlock, indexedFilterOpts) to ensure they’re correct under simulated backend semantics and don’t mask real regressions.
tools/test/main.go --sync-skills: filesystem safety (symlink handling) given RemoveAll + recursive copy.

Reviewed changes

Copilot reviewed 49 out of 50 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
.gitignore	Ignore root `.claude/*` while allowing tools/test skills directory pattern.
README.md	Document new `tools/test` flow and link to the AI skill.
core/internal/testutils/testutils.go	Add `WaitTimeoutCustom` and change `WaitTimeout` to cap against requested duration.
core/internal/testutils/testutils_test.go	Add tests for new timeout helpers.
core/logger/internal/colortest/prettyconsole_test.go	Skip color assertions when ANSI is disabled.
core/services/llo/telem/sampling.go	Use `wg.Go` pattern for pruning loop.
core/services/llo/telem/sampling_test.go	Stabilize time-based behavior; use `synctest` for deterministic goroutine exit.
core/services/llo/telem/telemetry.go	Switch goroutine spawn to `wg.Go` for channel handlers.
core/services/llo/telem/telemetry_test.go	Update pointer construction and loops in telemetry tests.
core/services/vrf/v2/bhs_feeder_test.go	Make heartbeat assertions robust under parallel load by polling tip-256 dynamically.
core/services/vrf/v2/integration_helpers_test.go	Replace Gomega polling with `require.Eventually*`; improve receipt/log parsing patterns.
core/services/vrf/v2/integration_v2_plus_test.go	Parse `SubscriptionCreated` via tx receipt instead of log filters.
core/services/vrf/v2/integration_v2_reverted_txns_test.go	Increase timeout for force-fulfillment; use bounded filter opts.
core/services/vrf/v2/integration_v2_test.go	Reduce LogPoller race by retrying filters with bounded opts; add helper filter opts builders.
core/services/vrf/v2/listener_v2_log_processor.go	Switch goroutine management to `wg.Go` for request processing.
core/services/vrf/v2/reverted_txns.go	Avoid `Fatalw` on context cancellation for reverted-tx polling.
tools/test/AGENTS.md	Document harness output policy and skill location.
tools/test/.agents/skills/chainlink-test-diagnosis/SKILL.md	Add/define the new diagnose AI skill content and CLI reference.
tools/test/README.md	Document `--fail-fast-on` examples and update skill link.
tools/test/internal/cmd/diagnose.go	Add `--parallel-iterations`, `--fail-fast-on`, config validation, and DB pool wiring.
tools/test/internal/cmd/gotestsum.go	Route cleanup errors through output printer.
tools/test/internal/cmd/root.go	Skip DB pre-run for diagnose/setup/remove-testdb; wire output into DB ensure.
tools/test/internal/cmd/root_test.go	Update command path expectations; add validation tests for diagnose config.
tools/test/internal/cmd/run.go	Use output printer for cleanup error reporting.
tools/test/internal/cmd/testdb.go	Add `setup-testdb` / `remove-testdb` commands for persistent Postgres.
tools/test/internal/config/config.go	Add parallel iterations + fail-fast-on config and normalization.
tools/test/internal/config/config_test.go	Ensure new flags bind and normalize as expected.
tools/test/internal/db/db.go	Add DB pool + resource model; integrate output printer.
tools/test/internal/db/db_test.go	Update Ensure signature usage; add tests for pool behaviors and env.
tools/test/internal/db/persistent.go	Start/remove persistent Postgres containers; shell export helpers.
tools/test/internal/db/persistent_test.go	Test label filter and shell export quoting helpers.
tools/test/internal/output/output.go	Introduce `output.Printer` to centralize human vs AI output and TTY progress rules.
tools/test/internal/output/output_test.go	Validate printer behavior (AI/human gating, live inline detection).
tools/test/internal/runner/analyze.go	Add run metadata, per-problem log references, iteration digest, and log filename truncation logic.
tools/test/internal/runner/analyze_bench_test.go	Add benchmarks using real testdata JSONL payloads.
tools/test/internal/runner/analyze_files_test.go	Update log-file expectations for new `ProblemLog` model and “only problem iterations” behavior.
tools/test/internal/runner/analyze_test.go	Add digest tests; adjust expectations to ignore internal-only fields.
tools/test/internal/runner/diagnose_progress.go	Add parallel progress tracking and updated inline progress formatting.
tools/test/internal/runner/diagnose_progress_test.go	Update expectations for new inline format; test parallel progress rendering.
tools/test/internal/runner/diagnose_results_dir.go	Simplify results dir naming to slug+timestamp (full args moved into report metadata).
tools/test/internal/runner/diagnose_table.go	Add streaming iteration summary table formatting.
tools/test/internal/runner/diagnose_table_test.go	Add tests for table header/row formatting.
tools/test/internal/runner/runner.go	Major refactor: parallel iterations, live progress, fail-fast-on categories, run meta injection, result dir creation.
tools/test/internal/runner/runner_test.go	Update tests for new diagnose signature and new behaviors (parallelism, fail-fast-on, results dir uniqueness).
tools/test/internal/runner/testdata/iteration-0.log.jsonl	Add benchmark/analyze fixture log.
tools/test/internal/runner/testdata/iteration-2.log.jsonl	Add benchmark/analyze fixture log.
tools/test/main.go	Add `go:generate` sync to copy skills into `.claude/skills`; implement copy logic.
tools/test/main_test.go	Add tests for skill sync safety and mode preservation.

jmank88 · 2026-05-05T19:44:21Z

+
+// WaitTimeoutCustom for longer timeouts, up to the test's Deadline, or the requested timeout, whichever is shorter.
+// Use when DefaultWaitTimeout is too short for your specific test.
+func WaitTimeoutCustom(t *testing.T, requested time.Duration) time.Duration {


Doesn't this change the logic of WaitTimeout? It used to alway prefer 90% no matter how that compared to the default. Now it would return the default instead if that were shorter, but that will artificially limit tests that we want to use 90%.

Ya. My thought was "let's force people to use cleaner timeouts" but I'm realizing that's pretty heavy-handed and unrealistic in this case.

I think it would re-introduce a lot of chaos. Switching from static, artificially short timeouts over to these dynamic extended ones was a big win originally. I think we've only increased concurrency and non-determinism since then.

I see that logic is still in this version. Why is it better to quit early, when the suite is configured to run longer? Scaling to that limit is a feature - e.g. What if I have an older machine? Am I expected to edit the code to get tests to pass?

Copilot

Pull request overview

Copilot reviewed 49 out of 51 changed files in this pull request and generated 7 comments.

Comments suppressed due to low confidence (1)

core/services/llo/telem/telemetry_test.go:363

new("test error") is not valid Go (the built-in new takes a type, not a value). Use a pointer helper (e.g., ptr("test error")) or take the address of a local string variable instead.

		ch <- &LLOObservationTelemetry{
			StreamId:              135,
			StreamValueType:       1,
			StreamValueBinary:     []byte{0x01, 0x02, 0x03},
			StreamValueText:       "stream value text",
			ObservationError:      new("test error"),
			ObservationTimestamp:  time.Unix(1, 1).UnixNano(),
			ObservationFinishedAt: time.Unix(2, 1).UnixNano(),
			SeqNr:                 42,
			ConfigDigest:          []byte{0x01, 0x02, 0x03},
		}

Copilot

Pull request overview

Copilot reviewed 49 out of 51 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 49 out of 51 changed files in this pull request and generated 2 comments.

cl-sonarqube-production · 2026-05-05T21:34:36Z

Quality Gate passed

Issues
1 New issue
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

kalverra added 14 commits May 1, 2026 13:46

Add skill and improve test harness

a44d16a

Simplify

5269871

claude skills

94ff75a

comments

6ad0195

lint

91e3898

README

704bbca

Fix file paths

e7261b2

Fix paths

6816278

Protect timeout

e91fb01

Fix color tests for AI terminals

9976c73

More flexible --fail-fast

23c139c

Update skill

d16e814

Comments

e4174f9

Merge branch 'chainlinkTestDiagnosis' of github.com:smartcontractkit/…

763dc05

…chainlink into coreTestTimeoutFix

kalverra mentioned this pull request May 1, 2026

[DX-4017] Skill chainlink-test-diagnosis + Improved Harness #22280

Open

kalverra added 12 commits May 1, 2026 21:42

Fix TestPruningLoop_Exits

dcbe745

lint

1e26ae5

more focused fix

18f12bc

lint

d74cc4d

skill

d7bc17e

test fix

3217a2d

restore deadline timeout

4fe2fbf

custom timeout

60b90c2

timeout test

5834061

Fix heartbeat

b46a73c

Fix vrf/v2 package flakes

9c3596f

symlink

191f4df

product-security-plaid-production Bot requested review from denis-chernov-smartcontract and se3000 May 5, 2026 19:11

kalverra requested review from Tofel, jmank88 and matYang and removed request for Atrax1 and denis-chernov-smartcontract May 5, 2026 19:11

Copilot started reviewing on behalf of kalverra May 5, 2026 19:11 View session

Copilot AI reviewed May 5, 2026

View reviewed changes

kalverra commented May 5, 2026

View reviewed changes

Comment thread core/services/vrf/v2/reverted_txns.go Outdated

kalverra added 3 commits May 5, 2026 15:31

Comments

eac3f11

Prevent masking errors

4253fcf

std errors

4fc6ab4

kalverra requested a review from Copilot May 5, 2026 19:40

Copilot started reviewing on behalf of kalverra May 5, 2026 19:41 View session

jmank88 reviewed May 5, 2026

View reviewed changes

Copilot AI reviewed May 5, 2026

View reviewed changes

jmank88 reviewed May 5, 2026

View reviewed changes

Comment thread tools/test/main.go Outdated

kalverra changed the title ~~[DX-4017] Core Test Timeout Fix~~ [DX-4017] [DX-3716] Core Test Timeout Fix May 5, 2026

Comments

46af379

kalverra requested review from Copilot and jmank88 May 5, 2026 20:44

Copilot started reviewing on behalf of kalverra May 5, 2026 20:45 View session

Copilot AI reviewed May 5, 2026

View reviewed changes

Comment thread core/services/vrf/v2/integration_v2_plus_test.go

Comment thread core/services/vrf/v2/integration_helpers_test.go

Fix contract parsing

c6a621e

kalverra requested a review from Copilot May 5, 2026 21:08

Copilot started reviewing on behalf of kalverra May 5, 2026 21:09 View session

Revert contract checks

712758f

Copilot AI reviewed May 5, 2026

View reviewed changes

Comment thread core/services/vrf/v2/integration_helpers_test.go

Comment thread core/services/vrf/v2/integration_helpers_test.go

Conversation

kalverra commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

chainlink-test-diagnosis Skill

Other Improvements

Extra Review Attention

Uh oh!

github-actions Bot commented May 1, 2026

Uh oh!

github-actions Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

trunk-io Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Areas requiring scrupulous human review

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jmank88 May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kalverra May 5, 2026

Choose a reason for hiding this comment

Uh oh!

jmank88 May 5, 2026

Choose a reason for hiding this comment

Uh oh!

jmank88 May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

cl-sonarqube-production Bot commented May 5, 2026

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kalverra commented May 1, 2026 •

edited

Loading

`chainlink-test-diagnosis` Skill

github-actions Bot commented May 1, 2026 •

edited

Loading

trunk-io Bot commented May 1, 2026 •

edited

Loading

jmank88 May 5, 2026 •

edited

Loading