Add secret rotation tracking with composite hash drift detection#1781
Add secret rotation tracking with composite hash drift detection#1781lmiccini wants to merge 1 commit intoopenstack-k8s-operators:mainfrom
Conversation
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/56ac80bd0e7547ad88350eb0206886b5 ✔️ openstack-k8s-operators-content-provider SUCCESS in 3h 18m 47s |
04cc55a to
1698305
Compare
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/db62c9cd33b34a538c7eccf243769b6a ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 02m 26s |
3885c4a to
c1fe8f8
Compare
|
Build failed (check pipeline). Post https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b5d3972863e64857b2da5055f867ef55 ✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 20m 43s |
|
/retest |
|
recheck |
|
/test openstack-operator-build-deploy-kuttl-4-18 |
cbfbb7c to
f52529a
Compare
f52529a to
017d2ca
Compare
|
/test functional |
97bb482 to
b1d9350
Compare
|
/test openstack-operator-build-deploy-kuttl-4-18 |
|
I think this change actually encompasses several pieces of functionality that could be expressed as different features. Each of which probably need a good design and discussion how we want them to work. A couple I can see right off:
I feel like we'll short-circuited some of the design/acceptance/testing cycle if we merge this. This is a big change to the NodeSet type and controller, which is an area edpm is largely responsible for, but I don't feel like that team has a good handle on taking ownership of this functionality (testing, docs, etc). Can't we just say to users to do a deployment across all nodesets without ansible limit before deleting old credentials? I'm not sure that's something we need to solve in code right away. |
|
Absolutely, right now we are documenting how to check and eventually remove the rotated users, so we are not in a rush to merge this as is. |
cd2cef1 to
490e443
Compare
490e443 to
3157e2b
Compare
OpenStackControlPlane CRD Size Report
Threshold reference
|
8db6b16 to
d974015
Compare
d974015 to
45f39f8
Compare
5b509f3 to
5880a65
Compare
|
/test openstack-operator-build-deploy-kuttl-4-18 |
5880a65 to
187eecc
Compare
Track which nodes in a nodeset have been deployed with the latest secret
versions, enabling safe credential rotation by blocking deletion of old
credentials until all nodes are updated.
Design:
- Compute a composite SHA-256 hash from all tracked secrets, sorted by name
for determinism, using lib-common's secret.Hash() for content-based hashing
- Store the deployed node list in a ConfigMap (<nodeset>-secret-tracking)
to avoid bloating CR status for large nodesets. The ConfigMap contains
{deployedSecretHash, deployedNodes[]}
- Detect drift on every reconcile by fetching tracked secrets from the cluster
and comparing the composite hash against DeployedSecretHash in status. On
mismatch, set AllNodesUpdated=false and UpdatedNodes=0 immediately
- Accumulate nodes across partial deployments using AnsibleLimit: when the
deployment hash matches the stored hash, nodes are appended; when the hash
differs (secrets changed), the node list resets
- Skip stale deployments whose secret hashes no longer match the cluster state
- Prune removed nodes by intersecting DeployedNodes with current Spec.Nodes
SecretDeploymentStatus fields in NodeSet status:
- AllNodesUpdated: all nodes deployed with current secrets (consumed by
infra-operator's RabbitMQUser controller to block credential deletion)
- TotalNodes / UpdatedNodes: node counts for observability
- DeployedSecretHash: composite hash at last successful deployment
- LastUpdateTime: timestamp of last status update
API exports:
- GetSecretTrackingConfigMapName(nodesetName) returns the deterministic
ConfigMap name for a given nodeset, allowing consumers to reference it
Watch configuration:
- Secret watch uses ResourceVersionChangedPredicate to avoid reconciling
on metadata-only updates
- Field index on status.secretHashes.keys enables O(1) lookup of nodesets
tracking a given secret, replacing O(secrets * nodesets) list scans
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
187eecc to
9f7cae7
Compare
|
/test openstack-operator-build-deploy-kuttl-4-18 |
|
@lmiccini: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/test openstack-operator-build-deploy-kuttl-4-18 |
|
Why can't infra-operator block deletion of there is mismatch between latest deployment and nodeset secret hashes? I am not sure if we should go this path as it is othrogonal to the original design. Also, users should be discouraged to do secret rotations using ansible_limits. It is there for scenarios where nodes are not reachable for some reason and users are expected to do full deployment across all nodes of the nodeset eventually to keep things in sync. |
Thanks @rabi . One of the challenges with that approach is that the last deployment may not be representative of the overall status of the nodeset. AFAICU we would have to introspect the deployments to check whether ansible_limits was used and/or if only a subset of the services were deployed, and repeat this pattern not just in infra but other operators that need to know if the dataplane is in sync (keystone for appcred, possibly others?). |
Track which nodes in a nodeset have been deployed with the latest secret
versions, enabling safe credential rotation by blocking deletion of old
credentials until all nodes are updated.
Design:
for determinism, using lib-common's secret.Hash() for content-based hashing
to avoid bloating CR status for large nodesets. The ConfigMap contains
{deployedSecretHash, deployedNodes[]}
and comparing the composite hash against DeployedSecretHash in status. On
mismatch, set AllNodesUpdated=false and UpdatedNodes=0 immediately
deployment hash matches the stored hash, nodes are appended; when the hash
differs (secrets changed), the node list resets
SecretDeploymentStatus fields in NodeSet status:
infra-operator's RabbitMQUser controller to block credential deletion)