
Clean up orphaned multiprocessing workers #192

Open

dll-create wants to merge 1 commit into AgentFlocks:main from dll-create:fix-orphaned-multiprocessing-workers

Conversation

@dll-create (Contributor) commented Apr 25, 2026

Summary

  • Detect orphaned Python multiprocessing workers that belong to the current Flocks install.
  • Include those orphaned workers in flocks stop target discovery and force-kill refreshes.
  • Add regression coverage for matching only Flocks-owned orphan workers and stopping when no pid file or port listener remains.

Evidence

On macOS, multiple stale workers were observed with PPID 1 and sustained CPU usage after the Flocks service was no longer reachable through the tracked backend pid tree:

```
/opt/homebrew/.../Python -c from multiprocessing.spawn import spawn_main; spawn_main(...) --multiprocessing-fork
```

`lsof -p <pid>` showed those processes had cwd/open files under a Flocks checkout and its virtualenv. Because they were reparented to PID 1 and did not own the backend port, the existing stop logic could miss them: it only gathered the tracked pid tree, runtime record pids, process group, and port listeners.

This change keeps the cleanup narrow: a candidate must be a Python multiprocessing spawn process reparented to PID 1 on a Unix host, and lsof must show open files under the current Flocks repo, its .venv, or the state directory.
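A minimal sketch of that heuristic, using the helper name `orphaned_flocks_worker_pids()` that the review below refers to. The root paths and the exact ps/lsof plumbing here are illustrative assumptions, not the PR's literal code:

```python
import subprocess
from pathlib import Path

# Hypothetical roots; the real PR derives these from the active install.
FLOCKS_ROOTS = (Path.cwd(), Path.cwd() / ".venv", Path.home() / ".flocks")

def orphaned_flocks_worker_pids() -> list[int]:
    """Find PPID=1 multiprocessing spawn workers whose open files
    fall under the current Flocks checkout, venv, or state dir."""
    pids: list[int] = []
    ps = subprocess.run(
        ["ps", "-axo", "pid=,ppid=,command="],
        capture_output=True, text=True, check=False,
    )
    for line in ps.stdout.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, ppid, command = int(parts[0]), int(parts[1]), parts[2]
        # Candidate: reparented to PID 1 and a multiprocessing spawn worker.
        if ppid != 1 or "multiprocessing.spawn" not in command:
            continue
        # Ownership: lsof must show open files under a Flocks root.
        lsof = subprocess.run(
            ["lsof", "-p", str(pid)],
            capture_output=True, text=True, check=False,
        )
        if any(str(root) in lsof.stdout for root in FLOCKS_ROOTS):
            pids.append(pid)
    return pids
```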

Tests

```
python -m pytest tests/cli/test_service_manager.py
66 passed
```

dll-create force-pushed the fix-orphaned-multiprocessing-workers branch from 4a26e3e to 1309c0e on April 25, 2026 12:00
dll-create force-pushed the fix-orphaned-multiprocessing-workers branch from 1309c0e to e1b4247 on April 25, 2026 12:01
xiami762 self-requested a review on April 28, 2026 10:17
@xiami762 (Contributor) left a comment

Thank you for sharing the PR. Here is the review.

Findings

  1. High: This change wires backend orphan-worker cleanup into the generic stop_one() helper, so the WebUI stop path can also kill backend-owned Python workers. That is too broad: stop_one() is a shared helper, stop_all() stops the WebUI first, and the upgrade flow also calls stop_one(..., "WebUI", ...) directly. With orphaned_flocks_worker_pids() added unconditionally to both the initial target list and the refreshed stop targets, "stop WebUI" can now clean up orphaned backend workers and attribute that work to the WebUI shutdown path. At minimum, this logic should be limited to the backend-specific path, not the generic stop helper; a minimal gating sketch follows the excerpts below.
```python
def stop_one(port: int, pid_file: Path, name: str, console) -> None:
    """Stop a single service by tracked pid and/or listening port."""
    cleanup_stale_pid_file(pid_file)
    runtime_record = read_runtime_record(pid_file)
    tracked_pid = runtime_record.pid if runtime_record else None
    listeners = port_owner_pids(port)

    target_pids: list[int] = []
    if tracked_pid is not None:
        target_pids = append_unique_pids(target_pids, collect_process_tree_pids(tracked_pid))
    target_pids = append_unique_pids(target_pids, listeners)
```

Call sites that route WebUI shutdown through the same helper:

```python
"""Stop frontend then backend while reusing the caller's lock."""
fe_port, be_port = _resolve_stop_ports(paths, config)
_resolve_upgrade_runtime(console, frontend_port=fe_port, attempt_recover=False)
stop_one(fe_port, paths.frontend_pid, "WebUI", console)
stop_one(be_port, paths.backend_pid, "后端", console)
```

```python
console = _NullConsole()
paths = service_manager.ensure_runtime_dirs()
frontend_port = service_manager._recorded_port(paths.frontend_pid, config.frontend_port)
service_manager.stop_one(frontend_port, paths.frontend_pid, "WebUI", console)
```
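A minimal sketch of the suggested narrowing, assuming a hypothetical include_orphans keyword that only the backend stop call opts into (the helper names are the ones quoted above):

```python
def stop_one(port: int, pid_file: Path, name: str, console,
             *, include_orphans: bool = False) -> None:
    """Stop one service; orphan-worker cleanup is opt-in, not automatic."""
    cleanup_stale_pid_file(pid_file)
    runtime_record = read_runtime_record(pid_file)
    tracked_pid = runtime_record.pid if runtime_record else None
    listeners = port_owner_pids(port)

    target_pids: list[int] = []
    if tracked_pid is not None:
        target_pids = append_unique_pids(target_pids, collect_process_tree_pids(tracked_pid))
    target_pids = append_unique_pids(target_pids, listeners)
    if include_orphans:
        # Only the backend path should sweep reparented workers.
        target_pids = append_unique_pids(target_pids, orphaned_flocks_worker_pids())

# WebUI keeps the old behavior; only the backend opts in:
# stop_one(fe_port, paths.frontend_pid, "WebUI", console)
# stop_one(be_port, paths.backend_pid, "后端", console, include_orphans=True)
```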
  2. Medium: The new ownership heuristic is too broad and can kill unrelated Python multiprocessing jobs started from the same checkout or virtualenv. Based on the PR, a process is treated as Flocks-owned if it has PPID=1, its command contains multiprocessing.spawn, and lsof output mentions the current repo, the current .venv, or the Flocks state directory. That does not prove the process came from the Flocks backend: any local script, test run, or other daemon started from the same repo/venv could match and be killed by flocks stop. The ownership check needs a stronger Flocks-specific signal, such as a backend-specific command signature, a runtime artifact association, or another marker that ties the worker to the active Flocks service rather than to the checkout in general; one possible realization is sketched below.
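One way to realize the "runtime artifact association" this finding asks for; a sketch only, where record_worker_pid, recorded_orphan_pids, and the JSON record layout are assumptions rather than existing Flocks APIs. The backend records each worker pid it spawns, so flocks stop only ever considers pids the backend itself wrote:

```python
import json
import os
from pathlib import Path

RUNTIME_RECORD = Path.home() / ".flocks" / "backend.json"  # assumed location

def record_worker_pid(pid: int) -> None:
    """Called by the backend when it spawns a worker (sketch)."""
    record = json.loads(RUNTIME_RECORD.read_text()) if RUNTIME_RECORD.exists() else {}
    workers = set(record.get("worker_pids", []))
    workers.add(pid)
    record["worker_pids"] = sorted(workers)
    RUNTIME_RECORD.write_text(json.dumps(record))

def recorded_orphan_pids() -> list[int]:
    """Stop path: only pids the backend itself recorded are candidates."""
    if not RUNTIME_RECORD.exists():
        return []
    alive: list[int] = []
    for pid in json.loads(RUNTIME_RECORD.read_text()).get("worker_pids", []):
        try:
            os.kill(pid, 0)  # signal 0: existence check, kills nothing
            alive.append(pid)
        except (ProcessLookupError, PermissionError):
            continue  # gone, or no longer ours: never a kill target
    return alive
```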

