Skip to content

fix: bail from EnsureComponentByHash.Handle when newObj signals requeue#87

Merged
vroldanbet merged 1 commit into
mainfrom
ensure-component-nil-fix
May 8, 2026
Merged

fix: bail from EnsureComponentByHash.Handle when newObj signals requeue#87
vroldanbet merged 1 commit into
mainfrom
ensure-component-nil-fix

Conversation

@vroldanbet
Copy link
Copy Markdown
Contributor

@vroldanbet vroldanbet commented May 8, 2026

What

EnsureComponentByHash.Handle calls the user-supplied newObj(ctx) factory and then immediately dereferences the result via e.Hash(newObj) and newObj.WithAnnotations(...). If the factory both signals a requeue (e.g. RequeueAPIErr) and returns a zero-value apply configuration, the framework segfaults on the deref.

This change adds a ctx.Err() != nil short-circuit immediately after newObj(ctx) and returns early. The check uses the canonical context-cancellation pattern already documented in queue/handlers.go and used by state.Continue, queue.OnError, and parallel execution.

newObj := e.newObj(ctx)
// newObj may have signaled a requeue (e.g. RequeueAPIErr) which cancels
// the handler context; bail out before dereferencing the result.
if ctx.Err() != nil {
    return
}
hash := e.Hash(newObj)
newObj = newObj.WithAnnotations(map[string]string{e.HashAnnotationKey: hash})

A regression test (TestEnsureComponentByHashStopsWhenNewObjSignalsRequeue) plumbs a real queue.Operations whose cancel func cancels the handler context — mirroring the manager's plumbing — and asserts the handler does not panic, does not call applyObject/deleteObject, records the requeue error, and exits cleanly.

Why

We hit this in production downstream. Stack trace from the operator:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x1612a4b]

k8s.io/client-go/applyconfigurations/core/v1.(*SecretApplyConfiguration).WithAnnotations(0x0, ...)
    .../client-go/applyconfigurations/core/v1/secret.go:225 +0x2b
github.com/authzed/controller-idioms/component.(*EnsureComponentByHash[...]).Handle(...)
    .../controller-idioms/component/ensure_component.go:59 +0x1b9

The receiver in WithAnnotations(0x0, ...) confirmed the apply config was nil. The downstream factory had a path that called RequeueAPIErr (when a status patch failed) and returned nil. RequeueAPIErr cancels the handler context via the Operations.cancel func but does not unwind the goroutine, so the nil reached Handle and crashed the controller.

Fixing the contract violation in the caller is necessary regardless, but the framework should also not segfault on a misbehaving factory. The ctx.Err() check makes the framework honor the same "stop when the queue says stop" contract that state.Continue and OnError already use.

Test plan

  • Regression test fails before the fix with the exact production stack ((*ServiceApplyConfiguration).WithAnnotations(0x0, ...) at ensure_component.go:59) and passes after
  • go test -race ./... passes across all packages
  • golangci-lint run ./component/... reports 0 issues

A newObj factory may both signal a requeue (RequeueAPIErr/RequeueErr)
and return a zero-value apply config, e.g. when a status patch fails
and the caller wants the next reconciliation to retry. RequeueAPIErr
cancels the handler context but does not unwind the goroutine, so the
framework would dereference the zero value to attach a hash annotation
and segfault.

Check ctx.Err() after newObj(ctx) and return early. This matches the
documented context-cancellation pattern used by state.Continue and
queue.OnError.

Adds a regression test that plumbs a real queue.Operations whose cancel
func cancels the handler context, mimicking the manager's plumbing.
Copy link
Copy Markdown
Contributor

@ecordell ecordell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

this is a fine addition, though generally the idea is to just not write newObj in a way that can fail. it should generally be a stateless fn that builds the object you want to apply

@vroldanbet vroldanbet marked this pull request as ready for review May 8, 2026 18:04
@vroldanbet vroldanbet merged commit 1e52ede into main May 8, 2026
9 checks passed
@vroldanbet vroldanbet deleted the ensure-component-nil-fix branch May 8, 2026 18:30
@github-actions github-actions Bot locked and limited conversation to collaborators May 8, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants