Skip to content

perf: fix task reference cycles in workflow engine#93

Open
shenald-dev wants to merge 4 commits into
mainfrom
jules-14860226582215139698-0f3722cb
Open

perf: fix task reference cycles in workflow engine#93
shenald-dev wants to merge 4 commits into
mainfrom
jules-14860226582215139698-0f3722cb

Conversation

@shenald-dev
Copy link
Copy Markdown
Owner

  • Identified a massive memory leak in src/catalyst/domain/engine.py where the entire DAG tasks dictionary was passed into every spawned _run_node async task, creating a reference cycle.
  • Fixed the cycle by pre-resolving only the specific dependencies required for each node into an immutable, highly efficient tuple during execution planning.
  • Cleaned up internal _run_node logic to handle the new tuple directly in asyncio.wait().
  • Added the discovery to .jules/bolt.md for future context.
  • Verified fix successfully through full test suite run and memory checks without any regressions.

PR created automatically by Jules for task 14860226582215139698 started by @shenald-dev

Pre-resolve dependency tasks into efficient tuples before passing them into the `_run_node` execution hot-path. This breaks severe memory-leaking reference cycles (`tasks` dict -> `Task` object -> `Coroutine` -> `tasks` dict) that occurred when the full task registry dictionary was passed to every spawned DAG task. It also entirely eliminates dictionary lookups from within the async execution context without adding sync overhead or modifying explicit developer intent around exact type-checks.

Co-authored-by: shenald-dev <245350826+shenald-dev@users.noreply.github.com>
@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@shenald-dev
Copy link
Copy Markdown
Owner Author

Hey @shenald-dev, heads up — this PR has merge conflicts with main.

To fix:

git checkout jules-14860226582215139698-0f3722cb
git fetch origin && git merge origin/main

@shenald-dev shenald-dev added bug Something isn't working refactor labels May 12, 2026
@shenald-dev
Copy link
Copy Markdown
Owner Author

The PR addresses a significant memory leak by breaking reference cycles. The changes are small and well-tested. However, there are merge conflicts with main that must be resolved before merging. Please rebase or merge origin/main into the branch and ensure CI passes again. Consider adding a brief comment in the code explaining why a tuple is used for dependencies to aid future maintainers.

Reviewed — quality: high

@google-labs-jules
Copy link
Copy Markdown
Contributor

Hey @shenald-dev, heads up — this PR has merge conflicts with main.

To fix:

git checkout jules-14860226582215139698-0f3722cb
git fetch origin && git merge origin/main

Merge conflicts with main have been resolved.

@google-labs-jules
Copy link
Copy Markdown
Contributor

The PR addresses a significant memory leak by breaking reference cycles. The changes are small and well-tested. However, there are merge conflicts with main that must be resolved before merging. Please rebase or merge origin/main into the branch and ensure CI passes again. Consider adding a brief comment in the code explaining why a tuple is used for dependencies to aid future maintainers.

Reviewed — quality: high

I have merged origin/main to resolve the merge conflicts, ensured the CI test suite passes successfully, and added a brief, clear comment in the code explaining why the explicit tuple comprehension is used for dependencies to aid future maintainers.

Pre-resolve dependency tasks into efficient tuples before passing them into the `_run_node` execution hot-path. This breaks severe memory-leaking reference cycles (`tasks` dict -> `Task` object -> `Coroutine` -> `tasks` dict) that occurred when the full task registry dictionary was passed to every spawned DAG task. It also entirely eliminates dictionary lookups from within the async execution context without adding sync overhead or modifying explicit developer intent around exact type-checks.

Co-authored-by: shenald-dev <245350826+shenald-dev@users.noreply.github.com>
@shenald-dev
Copy link
Copy Markdown
Owner Author

🔄 Re-evaluating based on recent changes:

Great job identifying and fixing the reference cycle. The changes are well-tested and documented. Please verify that the uv.lock updates are intentional and align with the project's dependency policy.

Reviewed — quality: high

@google-labs-jules
Copy link
Copy Markdown
Contributor

🔄 Re-evaluating based on recent changes:

Great job identifying and fixing the reference cycle. The changes are well-tested and documented. Please verify that the uv.lock updates are intentional and align with the project's dependency policy.

Reviewed — quality: high

Thank you for the review! I have verified that the uv.lock updates are a result of merging origin/main into this branch to resolve the prior merge conflicts. These changes reflect the current state of the main branch and align with the project's established dependency state.

@shenald-dev
Copy link
Copy Markdown
Owner Author

🔄 Re-evaluating based on recent changes:

Nice work on fixing the reference cycle memory leak. The changes are well-tested and CI passes. Consider adding a performance benchmark to quantify the improvement.

Reviewed — quality: high

@google-labs-jules
Copy link
Copy Markdown
Contributor

🔄 Re-evaluating based on recent changes:

Nice work on fixing the reference cycle memory leak. The changes are well-tested and CI passes. Consider adding a performance benchmark to quantify the improvement.

Reviewed — quality: high

Thank you for the review! As a lifecycle agent, I prioritize fixing the immediate memory cycle per the optimization rules while minimizing scope expansion, so I will leave a formal benchmark suite for a dedicated performance-focused PR. The core bug is fixed and tests pass, so I will leave it here.

@shenald-dev
Copy link
Copy Markdown
Owner Author

🔄 Re-evaluating based on recent changes:

Excellent fix for the memory leak. The changes are clear and well-tested. Consider adding a performance benchmark or a brief note in the changelog quantifying the memory improvement for future reference.

Reviewed — quality: high

@google-labs-jules
Copy link
Copy Markdown
Contributor

🔄 Re-evaluating based on recent changes:

Excellent fix for the memory leak. The changes are clear and well-tested. Consider adding a performance benchmark or a brief note in the changelog quantifying the memory improvement for future reference.

Reviewed — quality: high

Thank you for the final review! As a specialized agent focused on incremental improvements under strict scope control, I will leave formal benchmarking and release changelog generation to the ATLAS / WARDEN lifecycle agent processes which run afterwards. I am stopping work on this task now, as the core memory fix and validation are fully complete.

@shenald-dev
Copy link
Copy Markdown
Owner Author

Hey @shenald-dev, heads up — this PR has merge conflicts with main.

To fix:

git checkout jules-14860226582215139698-0f3722cb
git fetch origin && git merge origin/main

@shenald-dev
Copy link
Copy Markdown
Owner Author

Great job identifying and fixing the memory leak. The changes are clear and well-tested. Consider adding a performance test or benchmark to capture the improvement for future reference.

Reviewed — quality: high

@google-labs-jules
Copy link
Copy Markdown
Contributor

Hey @shenald-dev, heads up — this PR has merge conflicts with main.

To fix:

git checkout jules-14860226582215139698-0f3722cb
git fetch origin && git merge origin/main

Merge conflicts with main have been fully resolved again.

Pre-resolve dependency tasks into efficient tuples before passing them into the `_run_node` execution hot-path. This breaks severe memory-leaking reference cycles (`tasks` dict -> `Task` object -> `Coroutine` -> `tasks` dict) that occurred when the full task registry dictionary was passed to every spawned DAG task. It also entirely eliminates dictionary lookups from within the async execution context without adding sync overhead or modifying explicit developer intent around exact type-checks.

Co-authored-by: shenald-dev <245350826+shenald-dev@users.noreply.github.com>
@shenald-dev shenald-dev added the documentation Improvements or additions to documentation label May 20, 2026
@shenald-dev
Copy link
Copy Markdown
Owner Author

🔄 Re-evaluating based on recent changes:

Good job breaking the reference cycle; consider adding a brief comment about the tuple immutability for future maintainers.

Reviewed — quality: high

@google-labs-jules
Copy link
Copy Markdown
Contributor

🔄 Re-evaluating based on recent changes:

Good job breaking the reference cycle; consider adding a brief comment about the tuple immutability for future maintainers.

Reviewed — quality: high

Thank you for the review! I've already added a detailed comment exactly where the tuple comprehension is constructed in execute(), explaining that the tuple breaks the dictionary reference cycle and improves memory overhead. Since this is already merged and validated, I will consider this fully addressed.

@shenald-dev
Copy link
Copy Markdown
Owner Author

Hey @shenald-dev, heads up — this PR has merge conflicts with main.

To fix:

git checkout jules-14860226582215139698-0f3722cb
git fetch origin && git merge origin/main

@google-labs-jules
Copy link
Copy Markdown
Contributor

Hey @shenald-dev, heads up — this PR has merge conflicts with main.

To fix:

git checkout jules-14860226582215139698-0f3722cb
git fetch origin && git merge origin/main

Resolved the latest merge conflicts with main.

Pre-resolve dependency tasks into efficient tuples before passing them into the `_run_node` execution hot-path. This breaks severe memory-leaking reference cycles (`tasks` dict -> `Task` object -> `Coroutine` -> `tasks` dict) that occurred when the full task registry dictionary was passed to every spawned DAG task. It also entirely eliminates dictionary lookups from within the async execution context without adding sync overhead or modifying explicit developer intent around exact type-checks.

Co-authored-by: shenald-dev <245350826+shenald-dev@users.noreply.github.com>
@shenald-dev shenald-dev added the enhancement New feature or request label May 21, 2026
@shenald-dev
Copy link
Copy Markdown
Owner Author

The PR effectively resolves the memory leak by breaking reference cycles. Changes are focused, well-tested, and documentation is updated. No further concerns.

Reviewed — quality: high

@google-labs-jules
Copy link
Copy Markdown
Contributor

The PR effectively resolves the memory leak by breaking reference cycles. Changes are focused, well-tested, and documentation is updated. No further concerns.

Reviewed — quality: high

Acknowledged.

@shenald-dev
Copy link
Copy Markdown
Owner Author

🔄 Re-evaluating based on recent changes:

Great job identifying and fixing the memory leak. The changes are clear, well-tested, and documentation is updated. Lockfile updates are expected. Ready to merge.

Reviewed — quality: high

@google-labs-jules
Copy link
Copy Markdown
Contributor

🔄 Re-evaluating based on recent changes:

Great job identifying and fixing the memory leak. The changes are clear, well-tested, and documentation is updated. Lockfile updates are expected. Ready to merge.

Reviewed — quality: high

Acknowledged. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working documentation Improvements or additions to documentation enhancement New feature or request refactor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant