Skip to content

Preserve snapshot fork restore paths#239

Merged
sjmiller609 merged 7 commits into
mainfrom
hypeship/fix-fc-snapshot-forks
May 22, 2026
Merged

Preserve snapshot fork restore paths#239
sjmiller609 merged 7 commits into
mainfrom
hypeship/fix-fc-snapshot-forks

Conversation

@sjmiller609
Copy link
Copy Markdown
Collaborator

@sjmiller609 sjmiller609 commented May 22, 2026

Summary

  • add a Firecracker regression for snapshot fork -> running fork after deleting the original snapshot source
  • retain Firecracker snapshot source aliases when the original source dir is already gone
  • preserve retained upstream aliases across later snapshot forks and network rewrites
  • add the same warm fork chain coverage for QEMU and Cloud Hypervisor
  • persist QEMU restore configs so later standby snapshots do not copy stale source paths
  • avoid rotating the restored source CID when the fork target is stopped and no live/snapshotted fork can collide
  • refresh stopped fork CIDs after removing their standby snapshot
  • return Firecracker source stat errors instead of treating them as usable existing paths

Test plan

  • go test ./lib/hypervisor/firecracker -run 'TestPrepareFork|TestWithSnapshotSourceDirAlias' -count=1 -v\n- go test ./lib/hypervisor/qemu -count=1\n- go test ./lib/hypervisor/qemu ./lib/hypervisor/firecracker -run 'TestPrepareFork|TestWithSnapshotSourceDirAlias' -count=1\n- go test ./lib/instances -run 'TestRotateSourceVsockForRestore|TestResolveForkTargetState' -count=1\n- go test ./lib/instances -run TestApplyForkTargetStateStoppedRefreshesSnapshotForkCID -count=1 -v\n- go test ./lib/instances -run TestFirecrackerWarmForkChain -count=1 -v\n- go test ./lib/instances -run 'TestQEMUWarmForkChain|TestCloudHypervisorWarmForkChain' -count=1 -v (local environment reached Docker Hub anonymous pull limits before guest creation)\n- go test ./lib/instances -run '^$' -count=1\n- go test ./lib/hypervisor/firecracker ./lib/instances -count=1 (local environment did not complete: bridge creation is not permitted here, local disk is below the reservation some tests need, and Docker Hub anonymous pulls hit rate limits)

namePrefix string
}

func runWarmForkChain(t *testing.T, mgr *manager, tmpDir string, cfg warmForkChainConfig) {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is testing the flow:

Fork from snapshot -> Fork again

@sjmiller609 sjmiller609 changed the title Preserve Firecracker snapshot source aliases Preserve snapshot fork restore paths May 22, 2026
@sjmiller609 sjmiller609 requested a review from rgarcia May 22, 2026 14:28
@sjmiller609 sjmiller609 marked this pull request as ready for review May 22, 2026 14:28
@sjmiller609 sjmiller609 requested a review from hiroTamada May 22, 2026 14:28
@firetiger-agent
Copy link
Copy Markdown

Firetiger deploy monitoring skipped

This PR didn't match the auto-monitor filter configured on your GitHub connection:

Any PR that changes the kernel API. Monitor changes to API endpoints (packages/api/cmd/api/) and Temporal workflows (packages/api/lib/temporal) in the kernel repo

Reason: PR changes hypervisor packages (firecracker, qemu, cloud hypervisor) and instances library, not the kernel API endpoints or Temporal workflows specified in the filter.

To monitor this PR anyway, reply with @firetiger monitor this.

Copy link
Copy Markdown
Contributor

@rgarcia rgarcia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved — implementation direction looks good. a couple things worth checking:

  • lib/instances/fork.go:71 — skipping rotateSourceVsockForRestore for targetState=Stopped leaves the stopped fork with the same VsockCID as the restored source. forkInstanceFromStoppedOrStandby copies the standby source CID into the fork metadata, and the StateStandby -> StateStopped transition only deletes the snapshot, so starting that stopped fork later can collide with the still-running source CID. either keep rotating the source or assign the stopped fork a fresh CID before returning it.
  • lib/hypervisor/firecracker/fork.go:57 — consider handling non-IsNotExist os.Stat errors explicitly; right now permission/io errors are treated as “source exists”, which can set retainAlias=false and later clear alias metadata even though the source path may not actually be usable.

@sjmiller609 sjmiller609 merged commit 2b58e51 into main May 22, 2026
11 checks passed
@sjmiller609 sjmiller609 deleted the hypeship/fix-fc-snapshot-forks branch May 22, 2026 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants