<fix>[ha]: defer skip-trace list cleanup on MN departure to prevent split-brain#3845
<fix>[ha]: defer skip-trace list cleanup on MN departure to prevent split-brain#3845MatheMatrix wants to merge 3 commits into5.4.8from
Conversation
…plit-brain When a management node departs, its VM skip-trace entries were immediately removed. If VMs were still being started by kvmagent, the next VM sync would falsely detect them as Stopped and trigger HA, causing split-brain. Fix: transfer departed MN skip-trace entries to an orphaned set with 10-minute TTL instead of immediate deletion. VMs in the orphaned set remain skip-traced until the TTL expires or they are explicitly continued, preventing false HA triggers during MN restart scenarios. Resolves: ZSTAC-80821 Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34 Conflicts: plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java Conflicts: plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java
…anup Resolves: ZSTAC-80821 Change-Id: I59284c4e69f5d2ee357b1836b7c243200e30949a
Resolves: ZSTAC-80821 Change-Id: Ia9a9597feceb96b3e6e22259e2d0be7bde8ae499
概述为 KVM 虚拟机管理引入了孤立虚拟机跳过超时配置参数,并实现了基于 TTL 的孤立虚拟机跟踪机制,在管理节点离开时保留跳过状态,待超时后自动清理并恢复追踪。 更改内容
序列图sequenceDiagram
participant MN as 管理节点
participant Task as KvmVmSyncPingTask
participant Host as KVM 主机
participant State as 孤立虚拟机状态<br/>(orphanedSkipVms)
MN->>Task: 节点离开事件 nodeLeft()
Task->>State: 转移跳过的 VM UUID<br/>+ 时间戳
State-->>Task: 保存孤立 VM 映射
loop 每个 syncVm 周期
Task->>State: 检查过期条目
State->>State: 清理 TTL 已过期的项
Task->>Task: 获取 vmsToSkip + orphanedSkipVms
Task->>Host: 构建 vmsToSkipSetHostSide<br/>(包含孤立 VM)
Host-->>Task: 同步响应
Task->>Task: 检查 isVmDoNotNeedToTrace()
Task-->>Task: 孤立 VM 返回 true<br/>(不追踪)
end
Note over State,Task: TTL 过期后
State->>State: 移除过期孤立 VM
Task->>Task: 恢复对 VM 的追踪<br/>(isVmDoNotNeedToTrace 返回 false)
代码审查工作量🎯 3 (中等) | ⏱️ ~20 分钟 庆祝诗
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (3 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java`:
- Around line 468-475: The orphanedSkipVms entries are never removed on an
explicit "continue trace" path, so add code in the VM_CONTINUE_TRACE_PATH
handler (the continuation-trace callback used by KvmVmSyncPingTask) to also
remove the VM from orphanedSkipVms in addition to clearing vmsToSkip;
specifically update the continuation handling logic that currently clears
vmsToSkip (and/or vmsToSkipSetHostSide) to call orphanedSkipVms.remove(vmUuid)
so syncVm()/vmsToSkipSetHostSide won't re-add or re-skip the VM after a manual
continuation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: 45cfccc4-3e1e-45ef-80cd-5cf119c86705
📒 Files selected for processing (2)
plugin/kvm/src/main/java/org/zstack/kvm/KVMGlobalConfig.javaplugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java
| Set<String> skippedVms = vmsToSkip.remove(inv.getUuid()); | ||
| if (skippedVms != null && !skippedVms.isEmpty()) { | ||
| long now = System.currentTimeMillis(); | ||
| for (String vmUuid : skippedVms) { | ||
| orphanedSkipVms.put(vmUuid, now); | ||
| logger.info(String.format("moved VM[uuid:%s] from departed MN[uuid:%s] skip list to orphaned set" + | ||
| " (will expire in %d minutes)", vmUuid, inv.getUuid(), getOrphanTtlMs() / 60000)); | ||
| } |
There was a problem hiding this comment.
孤儿 skip 条目缺少“显式继续追踪”的回收路径。
这里把 VM 转入 orphanedSkipVms 之后,当前实现只会在 TTL 到期时删除它;VM_CONTINUE_TRACE_PATH 的处理逻辑仍然只清 vmsToSkip。这样即使启动流程已经完成,后续 syncVm() 仍会把该 VM 放进 vmsToSkipSetHostSide,最多额外屏蔽一个 vm.orphanedSkipTimeout 窗口的 HA/状态收敛,这和 PR 目标里的“TTL 到期或显式 continuation”不一致。建议在 continue-trace 回调里同步 orphanedSkipVms.remove(vmUuid)。
💡 建议修改
evtf.on(VmTracerCanonicalEvents.VM_CONTINUE_TRACE_PATH, new EventCallback() {
`@Override`
protected void run(Map tokens, Object data) {
VmTracerCanonicalEvents.VmContinueTraceData data1
= (VmTracerCanonicalEvents.VmContinueTraceData) data;
+ String continuedVmUuid = data1.getVmUuid();
+
if (data1.getApiId() != null && vmApis.containsKey(data1.getManagementNodeId()) && vmApis.get(data1.getManagementNodeId()).contains(data1.getApiId())) {
- String vmUuid = vmApis.get(data1.getManagementNodeId()).remove(data1.getApiId());
- logger.info("Continuing tracing VM: " + vmUuid);
- vmsToSkip.get(data1.getManagementNodeId()).remove(vmUuid);
- return;
+ continuedVmUuid = vmApis.get(data1.getManagementNodeId()).remove(data1.getApiId());
}
- if (data1.getVmUuid() != null) {
- logger.info("Continuing tracing VM: " + data1.getVmUuid());
- vmsToSkip.get(data1.getManagementNodeId()).remove(data1.getVmUuid());
+ if (continuedVmUuid != null) {
+ logger.info("Continuing tracing VM: " + continuedVmUuid);
+ orphanedSkipVms.remove(continuedVmUuid);
+ Set<String> skipped = vmsToSkip.get(data1.getManagementNodeId());
+ if (skipped != null) {
+ skipped.remove(continuedVmUuid);
+ }
}
}
});🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java` around lines
468 - 475, The orphanedSkipVms entries are never removed on an explicit
"continue trace" path, so add code in the VM_CONTINUE_TRACE_PATH handler (the
continuation-trace callback used by KvmVmSyncPingTask) to also remove the VM
from orphanedSkipVms in addition to clearing vmsToSkip; specifically update the
continuation handling logic that currently clears vmsToSkip (and/or
vmsToSkipSetHostSide) to call orphanedSkipVms.remove(vmUuid) so
syncVm()/vmsToSkipSetHostSide won't re-add or re-skip the VM after a manual
continuation.
When a management node departs, its VM skip-trace entries were
immediately removed. If VMs were still being started by kvmagent,
the next VM sync would falsely detect them as Stopped and trigger
HA, causing split-brain.
Fix: transfer departed MN skip-trace entries to an orphaned set with
10-minute TTL instead of immediate deletion. VMs in the orphaned set
remain skip-traced until the TTL expires or they are explicitly
continued, preventing false HA triggers during MN restart scenarios.
Resolves: ZSTAC-80821
Change-Id: I3222e260b2d7b33dc43aba0431ce59a788566b34
Conflicts:
plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java
Conflicts:
plugin/kvm/src/main/java/org/zstack/kvm/KvmVmSyncPingTask.java
sync from gitlab !9716