Hot snapshot clone fails when baseimage annotation is a bare OCI ref #38

@tonicmuroq

Summary

Cloning a hot snapshot can fail when the snapshot manifest's cocoonstack.snapshot.baseimage annotation contains a bare OCI ref such as simular/win10:22h2-20260510.

In our case both the hot snapshot and the referenced base image exist in epoch, but the cocoon node does not have the base qcow2 blob locally. During clone, cocoon tries to auto-pull the bare ref directly and fails with unsupported protocol scheme "". Restore then fails because the backing qcow2 file is missing locally.

Environment

  • Cocoon cluster: cocoonset-gke, node pool running vk-cocoon
  • vm-service env: testing
  • Requested hot snapshot: simular/win10-hot-testing:v1-20260511
  • Hot snapshot manifest: https://epoch.simular.cloud/v2/simular/win10-hot-testing/manifests/v1-20260511 returns HTTP 200
  • Referenced base image manifest: https://epoch.simular.cloud/v2/simular/win10/manifests/22h2-20260510 returns HTTP 200
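For reference, the two manifest URLs above follow the standard OCI distribution layout, /v2/<name>/manifests/<reference>. A minimal sketch of how a bare repo:tag ref maps to that URL once a registry host is known; the helper name manifest_url is hypothetical and is not part of cocoon:

```python
def manifest_url(registry_host: str, ref: str) -> str:
    """Build the OCI distribution manifest URL for a bare repo:tag ref.

    Hypothetical helper for illustration only; not cocoon's code.
    """
    repo, sep, tag = ref.rpartition(":")
    # Reject refs without a tag, e.g. "simular/win10", or with an empty repo.
    if not sep or not repo or "/" in tag:
        raise ValueError(f"expected repo:tag, got {ref!r}")
    return f"https://{registry_host}/v2/{repo}/manifests/{tag}"


if __name__ == "__main__":
    print(manifest_url("epoch.simular.cloud", "simular/win10-hot-testing:v1-20260511"))
    print(manifest_url("epoch.simular.cloud", "simular/win10:22h2-20260510"))
```

The point is that the bare ref alone is not enough to build either URL; the registry host has to come from somewhere.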

Hot snapshot manifest annotation:

{
  "cocoonstack.snapshot.baseimage": "simular/win10:22h2-20260510",
  "org.opencontainers.image.created": "2026-05-11T01:42:12Z"
}

Repro

Create/clone a VM from this hot snapshot on a node where the referenced base image is not already present in the local cocoon image cache:

simular/win10-hot-testing:v1-20260511

This was triggered through vm-service's cocoon provider, which created a CocoonSet for a VM named vm-3110f674.

Actual Result

Clone fails during restore:

base image not found locally, pulling simular/win10:22h2-20260510 ...
auto-pull simular/win10:22h2-20260510 failed (imported image?): http get simular/win10:22h2-20260510: Get "simular/win10:22h2-20260510": unsupported protocol scheme "" — clone may fail if base layers are missing
Error: clone VM: vm.restore: PUT http://localhost/api/v1/vm.restore → 500: ["Error from API","The VM could not be restored","Error from device manager","Failed to create QcowDiskAsync","I/O error (path=/var/lib/cocoon/run/cloudhypervisor/4TYF3AE7W2UW7HYC2KNJMC3MBF/overlay.qcow2 op=open)","Backing file I/O error: /var/lib/cocoon/cloudimg/blobs/5c305d620264d10bafd4fb806d3025a146c95a03cffe3372d9a3982229277f55.qcow2","No such file or directory (os error 2)"]

The base image manifest is present in epoch, so this does not appear to be a missing registry tag. It looks like the auto-pull path cannot resolve/pull the bare baseimage ref, and then restore proceeds far enough to require the local backing blob.
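The error text is consistent with the bare ref being handed to an HTTP client as if it were a URL. That is an inference from the log, not something confirmed from cocoon's source. As an illustration only, Python's standard URL parser shows the same thing: the bare ref has no scheme and no host, while the manifest URL has both.

```python
from urllib.parse import urlsplit

# The annotation value exactly as stored in the hot snapshot manifest.
bare = urlsplit("simular/win10:22h2-20260510")

# The manifest URL that is known to return HTTP 200.
qualified = urlsplit(
    "https://epoch.simular.cloud/v2/simular/win10/manifests/22h2-20260510"
)

# The bare ref parses as a path only: empty scheme, empty host.
print(repr(bare.scheme), repr(bare.netloc), repr(bare.path))
# The manifest URL carries both a scheme and a host.
print(repr(qualified.scheme), repr(qualified.netloc))
```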

Expected Result

One of these should happen:

  1. Cocoon resolves bare repo:tag baseimage annotations using the same registry context as the hot snapshot being cloned, or
  2. Cocoon requires/stores fully-qualified baseimage annotations and emits a clear validation error before restore, or
  3. Clone fails early with an actionable error explaining that the base image must be pre-pulled or fully qualified.
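A rough sketch of what options 1 and 3 could look like together, assuming the registry host of the hot snapshot being cloned is available at clone time. The function names and the host-detection heuristic (first path component contains "." or ":" or equals "localhost", as in the common Docker reference convention) are assumptions for illustration and are not cocoon's actual API:

```python
from typing import Optional


def has_registry_host(ref: str) -> bool:
    """Heuristic: does the first path component look like a registry host?"""
    first, sep, _ = ref.partition("/")
    return bool(sep) and ("." in first or ":" in first or first == "localhost")


def resolve_base_ref(base_ref: str, snapshot_registry: Optional[str]) -> str:
    """Qualify a bare baseimage annotation with the snapshot's registry host.

    Raises ValueError early, before restore, when the ref cannot be resolved.
    """
    if has_registry_host(base_ref):
        return base_ref  # already fully qualified
    if not snapshot_registry:
        raise ValueError(
            f"baseimage {base_ref!r} is a bare ref and no registry context is "
            "available; pre-pull the base image or use a fully-qualified ref"
        )
    return f"{snapshot_registry}/{base_ref}"


if __name__ == "__main__":
    print(resolve_base_ref("simular/win10:22h2-20260510", "epoch.simular.cloud"))
```

With the values from this report, the bare annotation would resolve to epoch.simular.cloud/simular/win10:22h2-20260510, and a clone without registry context would stop with an actionable message instead of reaching vm.restore.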

Notes

A fully-qualified base image ref such as epoch.simular.cloud/simular/win10:22h2-20260510 may avoid this, but the current hot snapshot was produced with the bare annotation simular/win10:22h2-20260510.
