-
Notifications
You must be signed in to change notification settings - Fork 103
Implement service name caching for UniquePodNames #1796
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
when updating the operators on an existing/deployed env, the ctplane CR gets updated to reflect the existing serviceName via the webhook: |
**Problem:** When UniquePodNames is enabled, GetServiceName() uses instance.UID[:5] to generate unique service names. During CREATE webhook, UID is empty (assigned after mutating webhooks), causing panic: "slice bounds out of range [:5] with length 0". **Solution - Hybrid Approach:** 1. **Webhook handles CREATE and UPDATE**: - CREATE: Generate random 5-char hex ID (UID not available yet) - UPDATE: Look up existing service CR by owner reference, cache its name - Uses lib-common's object.CheckOwnerRefExist() for robust ownership checking 2. **Controller triggers webhook for operator upgrades**: - When UniquePodNames=true but ServiceName="", controller adds annotation - Generic EnsureWebhookTrigger() function sets annotation with timestamp - Defer block saves annotation change, which triggers UPDATE webhook - Webhook caches service name and removes annotation - Controller NEVER directly updates spec (follows Kubernetes best practices) **Key Implementation Details:** **GetServiceName() fix** (api/core/v1beta1/openstackcontrolplane_types.go): - Check if len(instance.UID) >= 5 before slicing - Return base name if UID unavailable (no panic) **Webhook caching** (api/core/v1beta1/openstackcontrolplane_webhook.go): - CacheServiceNameForCreate(): Uses random ID since UID unavailable - CacheServiceNameForUpdate(): Uses object.CheckOwnerRefExist(ownerUID, cr.GetOwnerReferences()) to find existing CR owned by this controlplane - Preserves existing CR name regardless of format (handles flips correctly) - Cleans up reconcile-trigger annotation after caching **Controller annotation trigger** (internal/openstack/cinder.go, glance.go): - If UniquePodNames && ServiceName="", call EnsureWebhookTrigger() - Sets annotation with timestamp value: metav1.Now().Format(time.RFC3339) - Returns Requeue=true, defer block saves annotation - No direct spec mutation (avoids GitOps conflicts, SSA issues) **Generic helper** (internal/openstack/common.go): - EnsureWebhookTrigger(ctx, instance, reason) for reusability - ReconcileTriggerAnnotation constant: "openstack.org/reconcile-trigger" - Can be used for other scenarios needing webhook triggers **Benefits:** - ✅ Consistent naming across reconciliations - ✅ Handles operator upgrades (existing CRs get names cached) - ✅ GitOps friendly (spec only mutated by webhook, not controller) - ✅ No SSA field ownership conflicts, if we'd call update in the controller if setting the spec. - ✅ Robust flip detection via owner references - ✅ Works with existing environments (no manual CR updates needed) - ✅ Generic annotation mechanism for future use cases **Testing:** - Added comprehensive tests for all scenarios: - CREATE with UniquePodNames (uses random ID) - UPDATE with UniquePodNames (caches from existing CR or generates) - Operator upgrade scenario (controller triggers webhook) - Flip scenarios (UniquePodNames false→true, true→false) - Existing CR discovery by owner reference Co-Authored-By: Claude Sonnet 4.5 <[email protected]> Signed-off-by: Martin Schuppert <[email protected]>
875f1ca to
9173588
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: stuggi The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
| altName := fmt.Sprintf("%s-%s", name, instance.UID[:5]) | ||
| // Generate UID suffix only if UID is available and has sufficient length | ||
| var uidSuffix string | ||
| if len(instance.UID) >= 5 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc this check will always be true because:
- k8s generates the uuid at server side [1
- as per [2] it should be wrapped into 32 chars
So this function can be simplified further because L1052 is not necessary.
@stuggi not sure I'm missing something or a corner case I'm not considering though.
[1] https://github.com/kubernetes/apimachinery/blob/master/pkg/util/uuid/uuid.go#L25
[2] https://github.com/google/uuid/blob/master/version4.go#L47
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a new CR gets created, it gets its metadata.uid during admission in the API server, just before it is persisted in etcd, not when the controller first sees it. Thesefore it's not available on ne deployment.
Problem:
When UniquePodNames is enabled, GetServiceName() uses instance.UID[:5] to generate unique service names. During CREATE webhook, UID is empty (assigned after mutating webhooks), causing panic: "slice bounds out of range [:5] with length 0".
Solution - Hybrid Approach:
Webhook handles CREATE and UPDATE:
Controller triggers webhook for operator upgrades:
Key Implementation Details:
GetServiceName() fix (api/core/v1beta1/openstackcontrolplane_types.go):
Webhook caching (api/core/v1beta1/openstackcontrolplane_webhook.go):
Controller annotation trigger (internal/openstack/cinder.go, glance.go):
Generic helper (internal/openstack/common.go):
Benefits:
Testing: