Skip to content

Conversation

@stuggi
Copy link
Contributor

@stuggi stuggi commented Feb 4, 2026

Problem:
When UniquePodNames is enabled, GetServiceName() uses instance.UID[:5] to generate unique service names. During CREATE webhook, UID is empty (assigned after mutating webhooks), causing panic: "slice bounds out of range [:5] with length 0".

Solution - Hybrid Approach:

  1. Webhook handles CREATE and UPDATE:

    • CREATE: Generate random 5-char hex ID (UID not available yet)
    • UPDATE: Look up existing service CR by owner reference, cache its name
    • Uses lib-common's object.CheckOwnerRefExist() for robust ownership checking
  2. Controller triggers webhook for operator upgrades:

    • When UniquePodNames=true but ServiceName="", controller adds annotation
    • Generic EnsureWebhookTrigger() function sets annotation with timestamp
    • Defer block saves annotation change, which triggers UPDATE webhook
    • Webhook caches service name and removes annotation
    • Controller NEVER directly updates spec (follows Kubernetes best practices)

Key Implementation Details:

GetServiceName() fix (api/core/v1beta1/openstackcontrolplane_types.go):

  • Check if len(instance.UID) >= 5 before slicing
  • Return base name if UID unavailable (no panic)

Webhook caching (api/core/v1beta1/openstackcontrolplane_webhook.go):

  • CacheServiceNameForCreate(): Uses random ID since UID unavailable
  • CacheServiceNameForUpdate(): Uses object.CheckOwnerRefExist(ownerUID, cr.GetOwnerReferences()) to find existing CR owned by this controlplane
  • Preserves existing CR name regardless of format (handles flips correctly)
  • Cleans up reconcile-trigger annotation after caching

Controller annotation trigger (internal/openstack/cinder.go, glance.go):

  • If UniquePodNames && ServiceName="", call EnsureWebhookTrigger()
  • Sets annotation with timestamp value: metav1.Now().Format(time.RFC3339)
  • Returns Requeue=true, defer block saves annotation
  • No direct spec mutation (avoids GitOps conflicts, SSA issues)

Generic helper (internal/openstack/common.go):

  • EnsureWebhookTrigger(ctx, instance, reason) for reusability
  • ReconcileTriggerAnnotation constant: "openstack.org/reconcile-trigger"
  • Can be used for other scenarios needing webhook triggers

Benefits:

  • ✅ Consistent naming across reconciliations
  • ✅ Handles operator upgrades (existing CRs get names cached)
  • ✅ GitOps friendly (spec only mutated by webhook, not controller)
  • ✅ No SSA field ownership conflicts, if we'd call update in the controller if setting the spec.
  • ✅ Robust flip detection via owner references
  • ✅ Works with existing environments (no manual CR updates needed)
  • ✅ Generic annotation mechanism for future use cases

Testing:

  • Added comprehensive tests for all scenarios:
    • CREATE with UniquePodNames (uses random ID)
    • UPDATE with UniquePodNames (caches from existing CR or generates)
    • Operator upgrade scenario (controller triggers webhook)
    • Flip scenarios (UniquePodNames false→true, true→false)
    • Existing CR discovery by owner reference

@stuggi
Copy link
Contributor Author

stuggi commented Feb 4, 2026

when updating the operators on an existing/deployed env, the ctplane CR gets updated to reflect the existing serviceName via the webhook:

2026-02-04T13:02:20Z    INFO    openstackcontrolplane-resource  Defaulting for OpenStackControlPlane    {"name": "controlplane"}
2026-02-04T13:02:20Z    INFO    openstackcontrolplane-resource  default {"name": "controlplane"}
2026-02-04T13:02:20Z    INFO    openstackcontrolplane-resource  Cached Cinder service name      {"serviceName": "cinder-3c8d9", "isCreate": false}
2026-02-04T13:02:20Z    INFO    openstackcontrolplane-resource  Cached Glance service name      {"serviceName": "glance-3c8d9", "isCreate": false}
2026-02-04T13:02:20Z    INFO    openstackcontrolplane-resource  Removed reconcile-trigger annotation after caching service name
$ oc get osctlplane -o yaml | grep serviceName:
      serviceName: cinder-3c8d9
      serviceName: glance-3c8d9

**Problem:**
When UniquePodNames is enabled, GetServiceName() uses instance.UID[:5] to generate
unique service names. During CREATE webhook, UID is empty (assigned after mutating
webhooks), causing panic: "slice bounds out of range [:5] with length 0".

**Solution - Hybrid Approach:**

1. **Webhook handles CREATE and UPDATE**:
   - CREATE: Generate random 5-char hex ID (UID not available yet)
   - UPDATE: Look up existing service CR by owner reference, cache its name
   - Uses lib-common's object.CheckOwnerRefExist() for robust ownership checking

2. **Controller triggers webhook for operator upgrades**:
   - When UniquePodNames=true but ServiceName="", controller adds annotation
   - Generic EnsureWebhookTrigger() function sets annotation with timestamp
   - Defer block saves annotation change, which triggers UPDATE webhook
   - Webhook caches service name and removes annotation
   - Controller NEVER directly updates spec (follows Kubernetes best practices)

**Key Implementation Details:**

**GetServiceName() fix** (api/core/v1beta1/openstackcontrolplane_types.go):
- Check if len(instance.UID) >= 5 before slicing
- Return base name if UID unavailable (no panic)

**Webhook caching** (api/core/v1beta1/openstackcontrolplane_webhook.go):
- CacheServiceNameForCreate(): Uses random ID since UID unavailable
- CacheServiceNameForUpdate(): Uses object.CheckOwnerRefExist(ownerUID, cr.GetOwnerReferences())
  to find existing CR owned by this controlplane
- Preserves existing CR name regardless of format (handles flips correctly)
- Cleans up reconcile-trigger annotation after caching

**Controller annotation trigger** (internal/openstack/cinder.go, glance.go):
- If UniquePodNames && ServiceName="", call EnsureWebhookTrigger()
- Sets annotation with timestamp value: metav1.Now().Format(time.RFC3339)
- Returns Requeue=true, defer block saves annotation
- No direct spec mutation (avoids GitOps conflicts, SSA issues)

**Generic helper** (internal/openstack/common.go):
- EnsureWebhookTrigger(ctx, instance, reason) for reusability
- ReconcileTriggerAnnotation constant: "openstack.org/reconcile-trigger"
- Can be used for other scenarios needing webhook triggers

**Benefits:**
- ✅ Consistent naming across reconciliations
- ✅ Handles operator upgrades (existing CRs get names cached)
- ✅ GitOps friendly (spec only mutated by webhook, not controller)
- ✅ No SSA field ownership conflicts, if we'd call update in the
  controller if setting the spec.
- ✅ Robust flip detection via owner references
- ✅ Works with existing environments (no manual CR updates needed)
- ✅ Generic annotation mechanism for future use cases

**Testing:**
- Added comprehensive tests for all scenarios:
  - CREATE with UniquePodNames (uses random ID)
  - UPDATE with UniquePodNames (caches from existing CR or generates)
  - Operator upgrade scenario (controller triggers webhook)
  - Flip scenarios (UniquePodNames false→true, true→false)
  - Existing CR discovery by owner reference

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Signed-off-by: Martin Schuppert <[email protected]>
@stuggi stuggi force-pushed the backup_restore_uid branch from 875f1ca to 9173588 Compare February 4, 2026 13:17
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Feb 4, 2026
@fmount fmount self-requested a review February 4, 2026 13:53
@stuggi stuggi requested review from abays and dprince February 4, 2026 16:06
altName := fmt.Sprintf("%s-%s", name, instance.UID[:5])
// Generate UID suffix only if UID is available and has sufficient length
var uidSuffix string
if len(instance.UID) >= 5 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc this check will always be true because:

  1. k8s generates the uuid at server side [1
  2. as per [2] it should be wrapped into 32 chars

So this function can be simplified further because L1052 is not necessary.

@stuggi not sure I'm missing something or a corner case I'm not considering though.

[1] https://github.com/kubernetes/apimachinery/blob/master/pkg/util/uuid/uuid.go#L25
[2] https://github.com/google/uuid/blob/master/version4.go#L47

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a new CR gets created, it gets its metadata.uid during admission in the API server, just before it is persisted in etcd, not when the controller first sees it. Thesefore it's not available on ne deployment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants