Fix: Resolve watcher service deadlock during self-restart on folder deletion#1269
Fix: Resolve watcher service deadlock during self-restart on folder deletion#1269DeveloperAmrit wants to merge 1 commit into
Conversation
WalkthroughThis PR addresses a deadlock in the watcher service thread synchronization and updates OpenAPI schema documentation. The watcher changes prevent the worker thread from joining itself by detecting caller context and delegating internal restarts to daemon threads. Schema definitions are reordered for consistency. ChangesWatcher Thread Self-Restart Deadlock Prevention
OpenAPI Schema Documentation Updates
🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
sync-microservice/app/utils/watcher.py (1)
334-351: ⚡ Quick winClarify restart return semantics for the async path.
The function returns
Truewith two different meanings: immediately after scheduling (in-thread path) versus after actual completion (external path). The docstring and main caller inroutes/watcher.pytreat the return value as completion status ("Folder watcher restarted successfully"), but the in-thread path only schedules the restart asynchronously. This semantic mismatch should be resolved by updating the docstring to distinguish between scheduled and completed states, or add a note clarifying the return value is "True if restart completed (external) or scheduled (in-thread)".Proposed patch
def watcher_util_restart_folder_watcher() -> bool: """ Restart the folder watcher by stopping the current one and starting fresh. Returns: - True if restart was successful, False otherwise + True if restart completed (external call) or was successfully scheduled + (in-thread call), False otherwise """ logger.info("Restarting folder watcher...") if watcher_thread and threading.current_thread() == watcher_thread: # We are inside the watcher thread, so we can't join ourselves. # Launch a background thread to do the restart. def restart_worker(): watcher_util_stop_folder_watcher() watcher_util_start_folder_watcher() threading.Thread(target=restart_worker, daemon=True).start() + logger.info("Watcher restart scheduled on background thread") return True🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@sync-microservice/app/utils/watcher.py` around lines 334 - 351, The function watcher_util_restart_folder_watcher currently returns True both when the restart is scheduled asynchronously from inside the watcher thread and when it has completed from an external caller, causing ambiguous semantics for callers (e.g., routes/watcher.py). Update the watcher_util_restart_folder_watcher docstring to explicitly state the two return meanings: "True if restart completed when called externally, or True if restart was scheduled when called from inside the watcher thread (async)"; also mention that callers who need confirmation of completion should call watcher_util_stop_folder_watcher/watcher_util_start_folder_watcher directly or implement a synchronization mechanism. Reference watcher_util_restart_folder_watcher, watcher_util_stop_folder_watcher, watcher_util_start_folder_watcher and the caller routes/watcher.py in your change so readers can find the relevant logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@sync-microservice/app/utils/watcher.py`:
- Around line 343-351: Add an automated regression test that exercises the
self-restart path inside the watcher worker so we never hang on a self-join:
spawn a real watcher_thread (or simulate it via running the watcher worker
function on a new Thread), ensure threading.current_thread() == watcher_thread
when calling the restart logic, and invoke the code path that creates
restart_worker (the branch referencing watcher_thread and
threading.current_thread()). In the test, mock or patch
watcher_util_stop_folder_watcher and watcher_util_start_folder_watcher to record
calls, start the restart path from inside that watcher thread, and assert within
a short timeout that (a) the background restart thread was started, (b) mocked
stop/start functions were called, and (c) the test thread does not block (use a
join with timeout or an event to fail on hang). Target functions/symbols to
locate code: watcher_thread, restart_worker, watcher_util_stop_folder_watcher,
watcher_util_start_folder_watcher, and the branch that checks
threading.current_thread() == watcher_thread.
---
Nitpick comments:
In `@sync-microservice/app/utils/watcher.py`:
- Around line 334-351: The function watcher_util_restart_folder_watcher
currently returns True both when the restart is scheduled asynchronously from
inside the watcher thread and when it has completed from an external caller,
causing ambiguous semantics for callers (e.g., routes/watcher.py). Update the
watcher_util_restart_folder_watcher docstring to explicitly state the two return
meanings: "True if restart completed when called externally, or True if restart
was scheduled when called from inside the watcher thread (async)"; also mention
that callers who need confirmation of completion should call
watcher_util_stop_folder_watcher/watcher_util_start_folder_watcher directly or
implement a synchronization mechanism. Reference
watcher_util_restart_folder_watcher, watcher_util_stop_folder_watcher,
watcher_util_start_folder_watcher and the caller routes/watcher.py in your
change so readers can find the relevant logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a3f3ca87-e0f6-4f81-b9e1-9fafe54e788d
📒 Files selected for processing (2)
docs/backend/backend_python/openapi.jsonsync-microservice/app/utils/watcher.py
| if watcher_thread and threading.current_thread() == watcher_thread: | ||
| # We are inside the watcher thread, so we can't join ourselves. | ||
| # Launch a background thread to do the restart. | ||
| def restart_worker(): | ||
| watcher_util_stop_folder_watcher() | ||
| watcher_util_start_folder_watcher() | ||
|
|
||
| threading.Thread(target=restart_worker, daemon=True).start() | ||
| return True |
There was a problem hiding this comment.
Add an automated regression test for the self-restart path.
This is critical thread-lifecycle logic. Please add a test that triggers restart from inside the watcher worker path and verifies there is no self-join hang and restart completes.
As per coding guidelines, "Ensure that test code is automated, comprehensive, and follows testing best practices" and "Verify that all critical functionality is covered by tests."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@sync-microservice/app/utils/watcher.py` around lines 343 - 351, Add an
automated regression test that exercises the self-restart path inside the
watcher worker so we never hang on a self-join: spawn a real watcher_thread (or
simulate it via running the watcher worker function on a new Thread), ensure
threading.current_thread() == watcher_thread when calling the restart logic, and
invoke the code path that creates restart_worker (the branch referencing
watcher_thread and threading.current_thread()). In the test, mock or patch
watcher_util_stop_folder_watcher and watcher_util_start_folder_watcher to record
calls, start the restart path from inside that watcher thread, and assert within
a short timeout that (a) the background restart thread was started, (b) mocked
stop/start functions were called, and (c) the test thread does not block (use a
join with timeout or an event to fail on hang). Target functions/symbols to
locate code: watcher_thread, restart_worker, watcher_util_stop_folder_watcher,
watcher_util_start_folder_watcher, and the branch that checks
threading.current_thread() == watcher_thread.
Addressed Issues:
Fixes #1031
Screenshots/Recordings:
Additional Notes:
Description
This PR fixes a bug where the sync-microservice watcher service would freeze and eventually fail when attempting to restart itself after detecting a deleted folder.
Previously, when a folder was deleted, the background thread (watcher_thread) triggered a restart sequence that called watcher_util_stop_folder_watcher(). Inside that stop function, the thread attempted to execute watcher_thread.join(). Since it was joining itself, this caused a deadlock that hung for the 5-second timeout, preventing a clean restart and throwing a runtime error.
Changes made
How to test
AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Check one of the checkboxes below:
I have used the following AI models and tools: TODO
Checklist
Summary by CodeRabbit
Documentation
Bug Fixes