Skip to content

TransactionTracer concurrency for Maui Transactions#5248

Merged
jamescrosswell merged 8 commits into
maui-trace-ui-5116from
spike/maui-trace-ui-5116-concurrency-redesign
May 21, 2026
Merged

TransactionTracer concurrency for Maui Transactions#5248
jamescrosswell merged 8 commits into
maui-trace-ui-5116from
spike/maui-trace-ui-5116-concurrency-redesign

Conversation

@jamescrosswell
Copy link
Copy Markdown
Collaborator

@jamescrosswell jamescrosswell commented May 20, 2026

Playing around with a concurrency solution for the three concurrency threads that keep getting re-flagged on #5138:

Thread Status on #5138 head Resolved here by
IsFinished / EndTimestamp torn view (sentry-bot, deferred to #5174) Open IsFinished reads a Volatile bool, not the nullable struct
OnIdleTimeout TOCTOU — Warden flagged 3× across 3 commits (b74a6c3, f5d222e, e13985b) Open Single TryBeginFinish critical section; PeekActive re-check inside the lock
_idleTimer.Cancel() treated as authoritative Open Cancellation is explicitly advisory; correctness comes from the in-lock re-check

Solution

Treat TransactionTracer as a one-shot state machine (Running → Finished). The transition, the EndTimestamp write, and the idle-timer disposal happen in one critical section, and IsFinished reads a separate bool flag so observers can never see an inconsistent intermediate view.

What changed

TransactionTracer

  • IsFinished reads bool _hasFinished via Volatile.Read, not the Nullable<DateTimeOffset> EndTimestamp. EndTimestamp is gated on the same flag so lock-free readers see either null or the final stable value, never an in-progress write.
  • TryBeginFinish(fromIdleTimer, out shouldDiscard) is the only path from Running → Finished. Inside one lock acquisition it: checks the one-shot guard, re-checks PeekActive() == null for idle-timer callers (the authoritative defense against a stale firing — Timer.Change(Infinite) cannot stop an in-flight callback), computes the trimmed EndTimestamp, disposes the timer, and flips _hasFinished last via Volatile.Write.
  • Finish() and OnIdleTimeout() are thin wrappers around TryBeginFinish + CompleteCapture. Capture work (profiler stop, scope reset, CaptureTransaction, ReleaseSpans) runs outside the lock but only after the flag flip; concurrent AddChildSpan / ChildSpanFinished see _hasFinished and bail.
  • ChildSpanFinished keeps its fast-path no-op for non-idle transactions — no lock acquisition in the common non-MAUI case.
  • LastActiveSpanTracker: inner lock removed (all callers hold _lock); PeekActive is now non-destructive so SpanTracer.Unfinish() can resurrect a finished span without it falling out of the tracker.

SpanTracer

  • Same Volatile-gated EndTimestamp / IsFinished pattern. The setter sequences writes in opposite orders for null vs non-null so the gate stays consistent through both Finish and Unfinish.

MauiEventsBinder

  • StartUiTransaction and OnWindowOnStopped use the (scope, state) ConfigureScope overload with static lambdas — no closure allocations, no risk of the lambda reading a different CurrentUiTx than the caller intended.

Note

Although every internal access to _spans is now protected by the lock, we still need a ConcurrentBagLite for this since external consumers can fetch a read-only reference to the collection, perform some operations on the transaction that would mutate the underlying collection, and then check/expect those changes to be reflected on the reference that they obtained earlier. This behaviour needs to be preserved.

…e machine

Replaces the implicit "IsFinished derived from EndTimestamp" model with a
single int state field that transitions Running -> Finished atomically,
inside the same critical section that writes EndTimestamp and disposes
the idle timer. Closes the three open races flagged on PR #5138:

1. IsFinished/EndTimestamp torn view (deferred to #5174)
2. OnIdleTimeout TOCTOU between active-span check and Finish() call
3. _idleTimer.Cancel() being treated as authoritative

Key design points:

- IsFinished reads _state via Volatile.Read, not EndTimestamp. Anyone who
  observes IsFinished == true is guaranteed EndTimestamp is set and _spans
  is frozen.

- One primitive: TryBeginFinish(fromIdleTimer, out shouldDiscard). All
  finish paths (Finish, OnIdleTimeout, Dispose) go through it. The
  Running -> Finished flip, the EndTimestamp write, and the idle-timer
  disposal happen inside one lock acquisition.

- Idle-timer cancellation is now explicitly advisory. The authoritative
  defense against a stale firing is the PeekActive() re-check inside the
  lock: if a span arrived after the timer fired, the firing is rejected.

- ChildSpanFinished keeps its fast-path no-op for non-idle transactions,
  so non-MAUI users pay no lock-acquisition cost on span finish.

- _spans LINQ scan to compute trim-to-last-span EndTimestamp now happens
  inside the lock, so it can no longer race with span mutations.

- External EndTimestamp setter routes through the lock for back-compat
  with tests and serialization round-trip.

No tests regress (Sentry.Tests: 2363/2363, Sentry.Maui.Tests: 96/96).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 96.72131% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.10%. Comparing base (f5d222e) to head (beaa188).
⚠️ Report is 1 commits behind head on maui-trace-ui-5116.

Files with missing lines Patch % Lines
src/Sentry/TransactionTracer.cs 96.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@                  Coverage Diff                   @@
##           maui-trace-ui-5116    #5248      +/-   ##
======================================================
- Coverage               74.11%   74.10%   -0.02%     
======================================================
  Files                     509      502       -7     
  Lines                   18433    18268     -165     
  Branches                 3624     3580      -44     
======================================================
- Hits                    13662    13537     -125     
+ Misses                   3886     3860      -26     
+ Partials                  885      871      -14     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jamescrosswell jamescrosswell changed the title spike: redesign TransactionTracer concurrency around a state machine spike: state machine for TransactionTracer concurrency May 21, 2026
@jamescrosswell jamescrosswell changed the title spike: state machine for TransactionTracer concurrency spike: TransactionTracer concurrency May 21, 2026
jamescrosswell and others added 7 commits May 21, 2026 14:24
Two states don't need an int + named constants. `Volatile.Read`/`Write`
work on bool in C#, so the memory-ordering property is preserved.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All internal callers (AddChildSpan, ChildSpanFinished, TryBeginFinish,
ReleaseSpans) already hold the enclosing TransactionTracer._lock when
they touch the tracker. The public GetLastActiveSpan acquires _lock
explicitly so it gets the same guarantee.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two changes that share a root cause:

- SpanTracer.EndTimestamp: gate the getter on a Volatile bool so lock-free
  readers see a consistent (timestamp, finished) pair. Mirror of the
  pattern already used on TransactionTracer; toggles via Unfinish so the
  setter sequences writes in opposite orders for null vs non-null values.

- LastActiveSpanTracker.PeekActive: stop popping finished spans off the
  stack. The destructive pop combined with SpanTracer.Unfinish() created
  a hole where an unfinished span could no longer be tracked, letting
  the idle timer finish the transaction while a logically-live span was
  still in flight. Now non-destructive; an unfinished span stays in the
  tracker and is re-discoverable after Unfinish().

The 6-step reproduction is documented on the regression test
(IdleTimeout_AfterUnfinishedChildSpan_DoesNotCaptureTransaction) so the
implementation stays clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Commit 4b482a7 added OrgId and StrictTraceContinuation to the Net4_8
verified file, but the underlying properties live in source that exists
on main and not on this branch. The Windows CI is now flagging the
mismatch on every run; removing the two stale lines restores parity
with what PublicApiGenerator actually produces for Sentry.dll on this
branch.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jamescrosswell jamescrosswell changed the title spike: TransactionTracer concurrency TransactionTracer concurrency for Maui Transactions May 21, 2026
@jamescrosswell jamescrosswell marked this pull request as ready for review May 21, 2026 08:00
@jamescrosswell jamescrosswell requested a review from Flash0ver as a code owner May 21, 2026 08:00
@jamescrosswell jamescrosswell merged commit f5d5bb4 into maui-trace-ui-5116 May 21, 2026
30 checks passed
@jamescrosswell jamescrosswell deleted the spike/maui-trace-ui-5116-concurrency-redesign branch May 21, 2026 08:01
Comment on lines +85 to +95
{
// get => _endTimestamp;
get => Volatile.Read(ref _hasFinished) ? _endTimestamp : null;
internal set
{
lock (_lock)
{
_endTimestamp = value;
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Setting TransactionTracer.EndTimestamp directly no longer works as expected because the setter doesn't update the _hasFinished flag, causing the getter to return null.
Severity: MEDIUM

Suggested Fix

Update the EndTimestamp setter in TransactionTracer.cs to also set the _hasFinished flag to true via Volatile.Write(ref _hasFinished, true). This will restore the previous behavior where setting the end timestamp also marks the transaction as finished, ensuring the getter returns the correct value.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: src/Sentry/TransactionTracer.cs#L84-L95

Potential issue: The `EndTimestamp` property setter in `TransactionTracer` was changed
to only update the internal `_endTimestamp` field without setting the `_hasFinished`
flag to `true`. The corresponding getter is conditional on `_hasFinished` and returns
`null` if it's false. Code that relies on setting `EndTimestamp` directly to mark a
transaction as finished, such as in tests, will now find that `EndTimestamp` returns
`null`. This causes issues when creating a `SentryTransaction` from such a tracer, as
the resulting transaction will incorrectly have a `null` timestamp and an `IsFinished`
status of `false`.

Also affects:

  • src/Sentry.Testing/SentryClientTests.cs:1380
  • src/Sentry.Testing/SentryClientTests.cs:1500

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit beaa188. Configure here.

/// </summary>
private void CompleteCapture()
{
TransactionProfiler?.Finish();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_activeSpanTracker.Clear() races with PeekActive() after lock removal

Medium Severity

The inner lock was removed from LastActiveSpanTracker with the rationale that "all callers hold _lock," but ReleaseSpans() calls _activeSpanTracker.Clear() without holding _lock. ReleaseSpans is invoked from CompleteCapture, which runs outside the lock. A concurrent call to GetLastActiveSpan() (which does hold _lock) iterates the underlying Stack<ISpan> via foreach in PeekActive(). If Clear() modifies the stack during that iteration, this can throw InvalidOperationException or produce undefined behavior. The old inner lock on LastActiveSpanTracker protected against exactly this scenario.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit beaa188. Configure here.

public DateTimeOffset? EndTimestamp { get; internal set; }
public DateTimeOffset? EndTimestamp
{
// get => _endTimestamp;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out getter left in production code

Low Severity

The line // get => _endTimestamp; is a commented-out alternative implementation left over from development. It serves no purpose and could confuse future readers about the intended behavior of the EndTimestamp getter.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit beaa188. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant