feat: implement config backup controller #321

Open
elinalin wants to merge 1 commit into main from feat/implement-config-backup-controller

Conversation

@elinalin
Contributor

@elinalin elinalin commented Apr 21, 2026

Summary

This PR introduces the ConfigBackup controller for on-device configuration backups.

The controller supports:

  • type: Local for timestamped backups written to device-local storage
  • type: Startup for persisting the running configuration as startup-config
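The two modes map naturally onto a string enum in the spec; in a kubebuilder project this is usually enforced with a `+kubebuilder:validation:Enum=Local;Startup` marker. A stdlib-only sketch, assuming hypothetical names (`BackupType`, `ConfigBackupSpec`, a `Retention` field, and a `Validate` helper) that are not taken from the PR's `configbackup_types.go`:

```go
package main

import (
	"errors"
	"fmt"
)

// BackupType selects where a backup is written (hypothetical name).
type BackupType string

const (
	// BackupTypeLocal writes a timestamped copy to device-local storage.
	BackupTypeLocal BackupType = "Local"
	// BackupTypeStartup persists the running configuration as startup-config.
	BackupTypeStartup BackupType = "Startup"
)

// ConfigBackupSpec is a hypothetical, trimmed-down spec for illustration only.
type ConfigBackupSpec struct {
	Type BackupType
	// Retention is the number of Local backups to keep; ignored for Startup.
	Retention int
}

// Validate mimics what an enum marker plus a CEL rule or webhook might enforce.
func (s ConfigBackupSpec) Validate() error {
	switch s.Type {
	case BackupTypeLocal:
		if s.Retention < 1 {
			return errors.New("retention must be at least 1 for type Local")
		}
		return nil
	case BackupTypeStartup:
		// Startup overwrites a single startup-config, so no retention applies.
		return nil
	default:
		return fmt.Errorf("unsupported backup type %q", s.Type)
	}
}

func main() {
	fmt.Println(ConfigBackupSpec{Type: BackupTypeLocal, Retention: 5}.Validate()) // <nil>
	fmt.Println(ConfigBackupSpec{Type: "Remote"}.Validate())                      // unsupported backup type "Remote"
}
```

Keeping retention on the spec but ignoring it for Startup is one possible design; the PR may model the two types with separate sub-structs instead.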

The initial provider implementation targets Cisco NX-OS.

What Is Included

  • Added the ConfigBackup CRD with spec and status
  • Implemented the ConfigBackup controller reconcile flow
  • Added provider interfaces for config backup operations
  • Implemented NX-OS provider support for:
    • local filesystem backups
    • startup-config backups via NX-API
    • backup inventory discovery
    • storage statistics collection
    • retention-based rotation for local backups
  • Added status handling, conditions, metrics, and event recording
  • Added documentation and usage examples
  • Added a reproducible local E2E example for startup backup validation

Validation Performed

I validated this change at multiple levels.

1. Controller and provider tests

I ran the controller and provider-side tests to validate:

  • reconciliation flow
  • scheduling behavior
  • retention/rotation behavior
  • storage-threshold handling
  • provider integration
  • startup and local backup logic
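For the storage-threshold case, one plausible rule is to skip a Local backup when writing it would push filesystem usage above a configured percentage. A minimal sketch, assuming a hypothetical `ShouldSkipLocalBackup` helper and that the estimated backup size is counted toward projected usage; the PR's actual threshold semantics are not shown here.

```go
package main

import "fmt"

// ShouldSkipLocalBackup reports whether writing a backup of backupSize bytes
// would push usage above thresholdPercent of total capacity (hypothetical rule).
func ShouldSkipLocalBackup(usedBytes, totalBytes, backupSize uint64, thresholdPercent float64) (bool, error) {
	if totalBytes == 0 {
		return false, fmt.Errorf("total capacity is zero")
	}
	if thresholdPercent <= 0 || thresholdPercent > 100 {
		return false, fmt.Errorf("threshold %.1f%% out of range (0, 100]", thresholdPercent)
	}
	projected := float64(usedBytes+backupSize) / float64(totalBytes) * 100
	return projected > thresholdPercent, nil
}

func main() {
	// 850 MiB used of 1000 MiB plus a 60 MiB backup projects to 91%, above 90%.
	const mib = 1 << 20
	skip, err := ShouldSkipLocalBackup(850*mib, 1000*mib, 60*mib, 90)
	fmt.Println(skip, err) // true <nil>
}
```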

2. Live device validation for local backup behavior

On the lab device, I verified the local backup path directly:

  • creating a backup file on device-local storage
  • listing the backup from the device filesystem
  • deleting the backup successfully

This validated the device-side behavior for type: Local.

3. Full local E2E validation for startup backup

I also validated the full local end-to-end flow for type: Startup:

  • ran the operator locally
  • used a local kind cluster
  • applied Kubernetes resources including:
    • Device
    • ConfigBackupConfig
    • ConfigBackup
  • forwarded gNMI and NX-API ports from the lab device to the local machine
  • verified that:
    • Device reconciled to Running / Reachable=True
    • ConfigBackup reconciled to Ready=True
    • status.lastBackup.location was set to startup-config
    • a BackupCompleted event was recorded
  • read back the device startup-config over NX-API and confirmed the saved timestamp

This addresses the previously missing local operator E2E validation for the startup backup path.
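The E2E checks above reduce to reading a condition and `status.lastBackup.location` from the ConfigBackup object. Inside the operator this would typically use `metav1.Condition` and `meta.IsStatusConditionTrue` from k8s.io/apimachinery; the sketch below re-declares minimal stand-ins so it runs with the standard library only. The `ConfigBackupStatus` and `LastBackup` shapes are assumptions inferred from the field path quoted in the PR description.

```go
package main

import "fmt"

// Condition is a minimal stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// LastBackup is a hypothetical shape for status.lastBackup.
type LastBackup struct {
	Location string
}

// ConfigBackupStatus is a hypothetical, trimmed-down status.
type ConfigBackupStatus struct {
	Conditions []Condition
	LastBackup *LastBackup
}

// isConditionTrue mirrors meta.IsStatusConditionTrue: the first condition
// with a matching type decides, and a missing condition counts as false.
func isConditionTrue(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

// StartupBackupSucceeded encodes the E2E assertions for type: Startup.
func StartupBackupSucceeded(s ConfigBackupStatus) error {
	if !isConditionTrue(s.Conditions, "Ready") {
		return fmt.Errorf("condition Ready is not True")
	}
	if s.LastBackup == nil || s.LastBackup.Location != "startup-config" {
		return fmt.Errorf("status.lastBackup.location is not startup-config")
	}
	return nil
}

func main() {
	st := ConfigBackupStatus{
		Conditions: []Condition{{Type: "Ready", Status: "True"}},
		LastBackup: &LastBackup{Location: "startup-config"},
	}
	fmt.Println(StartupBackupSucceeded(st)) // <nil>
}
```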

Notes

  • For NX-OS, type: Startup requires NX-API because gNMI cannot execute the copy running-config startup-config command.
  • To support local E2E validation where gNMI and NX-API are exposed on different ports, this PR adds an NX-OS-specific ConfigBackupConfig that can override the NX-API address for ConfigBackup.

Example E2E Assets

  • examples/configbackup-startup-local-e2e/manifest.yaml
  • examples/configbackup-startup-local-e2e/README.md

@elinalin elinalin self-assigned this Apr 21, 2026
@elinalin elinalin requested a review from a team as a code owner April 21, 2026 08:11
@elinalin elinalin changed the title Feat/implement config backup controller feat: implement config backup controller Apr 21, 2026
@elinalin elinalin force-pushed the feat/implement-config-backup-controller branch 2 times, most recently from 0a41b4c to 75b5112 April 21, 2026 09:02
@elinalin elinalin requested a review from nikatza April 21, 2026 09:09
@hardikdr hardikdr added the area/switch-automation label Apr 22, 2026
@hardikdr hardikdr added this to Roadmap Apr 22, 2026
Contributor

@felix-kaestner felix-kaestner left a comment

@elinalin did you even test your changes at all? If so, how? When I run the operator with these changes I get a clear error:

ConfigBackup.networking.metal.ironcore.dev", "error": "no matches for kind \"ConfigBackup\" in version \"networking.metal.ironcore.dev/v1alpha1\""

This means the operator either won't start or doesn't know this API type, again because the kubebuilder scaffolding is missing.

Comment thread api/core/v1alpha1/configbackup_types.go
@elinalin
Contributor Author

I did test the controller/provider logic, including real NX-OS Local backup behavior (on the lab container), but you are right that I missed the fresh deployment/install path.

I checked the code after your comment:

  • the Go type is registered in the scheme
  • the controller is wired into the manager
  • the CRD base exists in the repo

So the Go-side kubebuilder scaffolding is not what is missing.

The real issue is that the ConfigBackup CRD is not being delivered/installed in the deployment path you used, so the API server does not recognize the kind and the operator cannot start cleanly against that cluster.

I’ll fix the install artifacts and validate again from a fresh deployment.

@elinalin elinalin force-pushed the feat/implement-config-backup-controller branch 4 times, most recently from 19072ee to 7c639df April 24, 2026 01:01
@elinalin elinalin force-pushed the feat/implement-config-backup-controller branch from 7c639df to eef68d9 April 24, 2026 01:54
@github-actions

Merging this branch changes the coverage (1 decrease, 2 increases)

| Impacted Packages | Coverage Δ | 🤖 |
| --- | --- | --- |
| github.com/ironcore-dev/network-operator/api/cisco/nx/v1alpha1 | 0.00% (ø) | |
| github.com/ironcore-dev/network-operator/api/core/v1alpha1 | 0.00% (ø) | |
| github.com/ironcore-dev/network-operator/cmd | 0.00% (ø) | |
| github.com/ironcore-dev/network-operator/internal/controller/core | 63.65% (-0.03%) | 👎 |
| github.com/ironcore-dev/network-operator/internal/provider | 56.00% (+4.00%) | 👍 |
| github.com/ironcore-dev/network-operator/internal/provider/cisco/nxos | 14.93% (+4.77%) | 👍 |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
| --- | --- | --- | --- | --- | --- |
| github.com/ironcore-dev/network-operator/api/cisco/nx/v1alpha1/configbackupconfig_types.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/ironcore-dev/network-operator/api/cisco/nx/v1alpha1/zz_generated.deepcopy.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/ironcore-dev/network-operator/api/core/v1alpha1/configbackup_types.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/ironcore-dev/network-operator/api/core/v1alpha1/groupversion_info.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/ironcore-dev/network-operator/api/core/v1alpha1/zz_generated.deepcopy.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/ironcore-dev/network-operator/cmd/main.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/ironcore-dev/network-operator/internal/controller/core/configbackup_controller.go | 62.12% (+62.12%) | 293 (+293) | 182 (+182) | 111 (+111) | 🌟 |
| github.com/ironcore-dev/network-operator/internal/controller/core/configbackup_metrics.go | 100.00% (+100.00%) | 1 (+1) | 1 (+1) | 0 | 🌟 |
| github.com/ironcore-dev/network-operator/internal/provider/cisco/nxos/configbackup.go | 76.34% (+76.34%) | 186 (+186) | 142 (+142) | 44 (+44) | 🌟 |
| github.com/ironcore-dev/network-operator/internal/provider/cisco/nxos/provider.go | 0.06% (-0.00%) | 1686 (+3) | 1 | 1685 (+3) | 👎 |
| github.com/ironcore-dev/network-operator/internal/provider/configbackup.go | 0.00% (ø) | 0 | 0 | 0 | |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.
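Because the percentages are statement ratios, they can be checked directly from the Total and Covered columns; for example, configbackup_controller.go covers 182 of 293 statements, which is about 62.12%. A small helper of our own (not part of the coverage action) reproduces the displayed values:

```go
package main

import "fmt"

// statementCoverage returns covered/total as a percentage string with two
// decimals, matching how the report displays coverage.
func statementCoverage(covered, total int) string {
	if total == 0 {
		// Files with no statements are shown as 0.00% in the table.
		return "0.00%"
	}
	return fmt.Sprintf("%.2f%%", float64(covered)/float64(total)*100)
}

func main() {
	fmt.Println(statementCoverage(182, 293)) // 62.12%
	fmt.Println(statementCoverage(142, 186)) // 76.34%
	fmt.Println(statementCoverage(1, 1686))  // 0.06%
}
```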

Changed unit test files

  • github.com/ironcore-dev/network-operator/internal/controller/core/configbackup_controller_test.go
  • github.com/ironcore-dev/network-operator/internal/controller/core/suite_test.go
  • github.com/ironcore-dev/network-operator/internal/provider/cisco/nxos/configbackup_test.go

Labels

area/switch-automation: Automation processes for network switch management and operations.
size/XXL

Projects

Status: No status

3 participants