[BUG] Inflated ReadyReplicas During Rolling Updates in k8s #4498

@olamide226

Description

MCPServer status reports an inflated number of ReadyReplicas during rolling updates.

This occurs because the reconciliation logic counts all pods matching the toolhive-name label, including pods that are terminating but still marked as ready.

Environment

  • Operator Version: v0.14.1
  • CRD Version: v0.14.1

Current Behavior

During rollout:

  • Old pods enter Terminating state but may still be counted as Ready
  • New pods start concurrently
  • ReadyReplicas temporarily exceeds desired replica count

Example:

  • Desired replicas: 2
  • Reported ReadyReplicas: 3 or 4 during rollout

Expected Behavior

  • ReadyReplicas should reflect only active, non-terminating, ready pods
  • Should not exceed desired replica count under normal conditions

Root Cause

  • Status logic counts every pod carrying the matching toolhive-name label, including pods with a non-nil DeletionTimestamp (terminating pods)

Proposed Solutions

1. Filter Terminating Pods

Update updateMCPServerStatus:

  • Exclude pods where DeletionTimestamp is set

2. Improve Label Selectors

Introduce clearer labels:

  • toolhive-tool-type: proxy
  • toolhive-tool-type: mcp-server

This allows:

  • More accurate filtering
  • Potential separation of proxy vs backend status

3. Optional Enhancement

Consider reporting proxy replicas and backend replicas separately for better observability.

Metadata

Labels

bug, kubernetes, operator
