Add createJob(unique: true) for fan-out dedup#498
Merged
Conversation
When a fan-out dispatcher enqueues per-tenant work on every cron tick,
plain `createJob()` happily inserts a duplicate row even if the previous
tick's job for the same tenant is still pending or stuck. The result is
a slow accumulation of identical pending jobs -- in one observed case,
5876 stuck `VolunteerCheckOutReminder` rows from a single tenant whose
DB connection had been decommissioned.
`isQueued()` already exists for this, but every caller has to wrap each
`createJob()` in a two-line manual guard. This adds the guard to
`createJob()` itself as an opt-in `unique` flag:
$queuedJobsTable->createJob(
'VolunteerCheckOutReminder',
['account_uuid' => $accountUuid],
[
'reference' => 'volunteer_check_out:' . $accountUuid,
'unique' => true,
],
);
When `unique` is set and a pending (`completed IS NULL`) job exists for
the same `(reference, resolved job_task)` pair, the existing entity is
returned and no new row is inserted. The dedup hit is logged at info
level. `unique` without a `reference` throws `InvalidArgumentException`
at the call site -- failing fast beats silently inserting an undeduped
row.
The flag lives on `JobConfig` as a request-time property outside
`_keyMap`, so it never leaks into `toArray()` output or the
`queued_jobs` row. Default is `false`, fully BC.
Race window: two ticks landing simultaneously can both pass the
`isQueued()` check and both insert. A DB-level unique constraint would
close that, but it requires a migration and a decision on how callers
should opt in per-table -- out of scope for this PR. The 99%
effectiveness already kills the slow-buildup scenario this is built
for.
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #498 +/- ##
============================================
+ Coverage 78.11% 78.34% +0.23%
- Complexity 975 981 +6
============================================
Files 45 45
Lines 3303 3330 +27
============================================
+ Hits 2580 2609 +29
+ Misses 723 721 -2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
createJob()has no built-in dedup. When a cron-driven dispatcher enqueues per-tenant work on every tick, it cheerfully inserts a new row each time — even if last tick's job for the same tenant is still pending or stuck. The result is a slow accumulation of identical pending jobs.In one observed case (multi-tenant CakePHP app), a single decommissioned tenant's per-account task threw on every cron tick, leaving 5876 stuck pending rows in
queued_jobs. The admin dashboard couldn't even render the queue list anymore — the OOM that's just been capped in #497.isQueued()already exists for exactly this case. But every caller currently has to wrap eachcreateJob()call in two lines of manual guard. This PR folds that guard intocreateJob()itself as an opt-in flag.Usage
Design choices
QueuedJobreturn type. Returning the live entity keeps the signature stable and lets callers log/use the existing job id. Same mental model asfindOrCreate.unique: truewithoutreferencethrowsInvalidArgumentException. Programming error — fail fast at the call site rather than silently insert.JobConfigoutside_keyMap. It's a request-time concern, not persisted state — won't leak intotoArray()output or thequeued_jobsrow.false, fully BC. Existing callers see no behavior change.Race window
Two ticks landing in the same millisecond can both pass the
isQueued()check and both insert. A DB-level unique constraint would close that race, but it requires a migration and a decision on how callers opt in per-table — out of scope here. The 99% effectiveness already kills the slow-buildup scenario this is built for, and the existingisQueued()doc page has the same caveat.Changes
Queue\Config\JobConfig::setUnique() / isUnique()— new request-time flag.JobConfig::fromArray()— accepts'unique'key, plucks it out before the_keyMaploop so it doesn't trip the strict field lookup.QueuedJobsTable::createJob()— whenuniqueis set, runs the dedup query and returns the existing entity if found.docs/guide/queueing-jobs.md— new section under "Avoiding parallel (re)queueing".How verified locally
vendor/bin/phpunit— 193 tests, all green (pre-existing deprecations fromExecuteTaskTestonly).vendor/bin/phpstan analyze— no errors.vendor/bin/phpcs --parallel=16— no errors.