Clean up orphaned multiprocessing workers#192
Open
dll-create wants to merge 1 commit intoAgentFlocks:mainfrom
Open
Clean up orphaned multiprocessing workers#192dll-create wants to merge 1 commit intoAgentFlocks:mainfrom
dll-create wants to merge 1 commit intoAgentFlocks:mainfrom
Conversation
4a26e3e to
1309c0e
Compare
1309c0e to
e1b4247
Compare
xiami762
reviewed
Apr 28, 2026
Contributor
xiami762
left a comment
There was a problem hiding this comment.
Thank you for sharing the PR. Here is the review.
Findings
High: This change wires backend orphan-worker cleanup into the genericstop_one()helper, so theWebUIstop path can also kill backend-owned Python workers. That is too broad.stop_one()is a shared helper, andstop_all()stopsWebUIfirst; the upgrade flow also callsstop_one(..., "WebUI", ...)directly. Withorphaned_flocks_worker_pids()added unconditionally to both the initial target list and the refreshed stop targets, “stop WebUI” can now clean up orphaned backend workers and attribute that work to theWebUIshutdown path. At minimum, this logic should be limited to the backend-specific path, not the generic stop helper.
def stop_one(port: int, pid_file: Path, name: str, console) -> None:
"""Stop a single service by tracked pid and/or listening port."""
cleanup_stale_pid_file(pid_file)
runtime_record = read_runtime_record(pid_file)
tracked_pid = runtime_record.pid if runtime_record else None
listeners = port_owner_pids(port)
target_pids: list[int] = []
if tracked_pid is not None:
target_pids = append_unique_pids(target_pids, collect_process_tree_pids(tracked_pid))
target_pids = append_unique_pids(target_pids, listeners)"""Stop frontend then backend while reusing the caller's lock."""
fe_port, be_port = _resolve_stop_ports(paths, config)
_resolve_upgrade_runtime(console, frontend_port=fe_port, attempt_recover=False)
stop_one(fe_port, paths.frontend_pid, "WebUI", console)
stop_one(be_port, paths.backend_pid, "后端", console)console = _NullConsole()
paths = service_manager.ensure_runtime_dirs()
frontend_port = service_manager._recorded_port(paths.frontend_pid, config.frontend_port)
service_manager.stop_one(frontend_port, paths.frontend_pid, "WebUI", console)Medium: The new ownership heuristic is too broad and can kill unrelated Python multiprocessing jobs started from the same checkout or virtualenv. Based on the PR, a process is treated as Flocks-owned if it hasPPID=1, its command containsmultiprocessing.spawn, andlsofoutput mentions the current repo, current.venv, or the Flocks state directory. That does not prove the process came from the Flocks backend. Any local script, test run, or other daemon started from the same repo/venv could match and be killed byflocks stop. The ownership check needs a stronger Flocks-specific signal, such as a backend-specific command signature, runtime artifact association, or another marker that ties the worker to the active Flocks service rather than to the checkout in general.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
flocks stoptarget discovery and force-kill refreshes.Evidence
On macOS, multiple stale workers were observed with PPID 1 and sustained CPU usage after the Flocks service was no longer reachable through the tracked backend pid tree:
lsof -p <pid>showed those processes had cwd/open files under a Flocks checkout and its virtualenv. Because they were reparented to PID 1 and did not own the backend port, the existing stop logic could miss them: it only gathered the tracked pid tree, runtime record pids, process group, and port listeners.This change keeps the cleanup narrow: a candidate must be a Unix PPID=1 Python multiprocessing spawn process, and
lsofmust show open files under the current Flocks repo,.venv, or state directory.Tests