
Conversation


@loverustfs loverustfs commented Dec 20, 2025

Link

rustfs/rustfs#1193


Summary

This PR adds a configurable “Pod Deletion Policy When Node Is Down” option to the RustFS operator, inspired by Longhorn’s setting. It automates cleanup of terminating Pods stuck on an unreachable node, enabling controllers (especially StatefulSets) to recreate replacement Pods without manual kubectl delete --force.


Motivation / Problem

When a node becomes NotReady/Unknown (or disappears), Pods may get stuck in Terminating and never complete graceful shutdown. In StatefulSet-based workloads, this can prevent timely recovery because the replacement Pod may not be created until the old terminating Pod object is removed.

Operators/admins currently must manually force delete the stuck Pod to unblock recovery.


What this PR does

New Tenant configuration

Adds a new Tenant spec field:

  • spec.podDeletionPolicyWhenNodeIsDown

Supported values (a rough Rust sketch follows this list):

  • DoNothing (default)
  • Delete (normal delete)
  • ForceDelete (delete with gracePeriodSeconds=0)
  • Longhorn-like variants (force delete terminating pods on down nodes, filtered by controller type):
    • DeleteStatefulSetPod
    • DeleteDeploymentPod (Deployment pods are typically owned by ReplicaSet)
    • DeleteBothStatefulSetAndDeploymentPod
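
A rough sketch of how this field and its values could be modeled in the Tenant spec. The type and variant names below simply mirror the values listed above; the actual identifiers and derives in tenant.rs (e.g. JsonSchema for the CRD schema) may differ.

```rust
use serde::{Deserialize, Serialize};

/// Deletion policy applied to terminating Pods whose node is down.
/// Variant names follow the PR description; the real type in tenant.rs may differ.
#[derive(Clone, Debug, Default, PartialEq, Eq, Serialize, Deserialize)]
pub enum PodDeletionPolicyWhenNodeIsDown {
    /// Leave stuck terminating Pods alone (default).
    #[default]
    DoNothing,
    /// Normal delete of the stuck Pod.
    Delete,
    /// Delete with gracePeriodSeconds=0.
    ForceDelete,
    /// Force delete only Pods owned by a StatefulSet.
    DeleteStatefulSetPod,
    /// Force delete only Pods owned by a Deployment (via its ReplicaSet).
    DeleteDeploymentPod,
    /// Force delete Pods owned by either controller type.
    DeleteBothStatefulSetAndDeploymentPod,
}
```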

Reconcile behavior

During reconciliation, if the policy is enabled (not DoNothing), the operator takes the following steps (a rough sketch follows the list):

  1. Lists tenant-owned Pods via label selector rustfs.tenant=<tenant-name>
  2. Only targets Pods already in Terminating (metadata.deletionTimestamp != None)
  3. Resolves the hosting Node (pod.spec.nodeName) and treats it as “down” when:
    • Node Ready != True (False / Unknown), or
    • Node object is not found (404)
  4. Applies the configured deletion policy (normal delete or force delete)
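
A minimal sketch of the node-down check and force delete described above, using kube-rs. The function names here are illustrative only and are not necessarily those used in k8s.rs or reconcile.rs.

```rust
use k8s_openapi::api::core::v1::{Node, Pod};
use kube::{
    api::{Api, DeleteParams},
    Client,
};

/// Treat a node as "down" when its Ready condition is not "True"
/// (i.e. False or Unknown) or the Node object no longer exists (404).
async fn node_is_down(client: &Client, node_name: &str) -> Result<bool, kube::Error> {
    let nodes: Api<Node> = Api::all(client.clone());
    match nodes.get(node_name).await {
        Ok(node) => {
            let ready = node
                .status
                .and_then(|s| s.conditions)
                .unwrap_or_default()
                .into_iter()
                .find(|c| c.type_ == "Ready")
                .map(|c| c.status == "True")
                .unwrap_or(false);
            Ok(!ready)
        }
        // Node object not found: treat the node as down.
        Err(kube::Error::Api(e)) if e.code == 404 => Ok(true),
        Err(e) => Err(e),
    }
}

/// Force delete a stuck terminating Pod (gracePeriodSeconds=0).
async fn force_delete_pod(pods: &Api<Pod>, name: &str) -> Result<(), kube::Error> {
    let dp = DeleteParams {
        grace_period_seconds: Some(0),
        ..DeleteParams::default()
    };
    pods.delete(name, &dp).await.map(|_| ())
}
```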

RBAC changes

The operator ClusterRole is extended with read access to Nodes:

  • nodes: get, list, watch

This is required to check Node readiness.
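
For reference, the added rule would look roughly like this in clusterrole.yaml (exact placement and indentation depend on the existing manifest):

```yaml
# Read-only access to Nodes so the operator can check the Ready condition.
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
```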


Example YAML

apiVersion: rustfs.com/v1alpha1
kind: Tenant
metadata:
  name: my-tenant
spec:
  podDeletionPolicyWhenNodeIsDown: DeleteStatefulSetPod
  pools:
    - name: pool-0
      servers: 4
      # ...

Safety / Caveats

  • Force deleting Pods can have data consistency implications depending on the storage backend and workload semantics.
  • Default remains DoNothing.
  • The operator intentionally acts only on Pods already in Terminating to keep behavior conservative and avoid deleting healthy running Pods.

Tests

Unit tests added/updated to validate:

  • Node down detection based on Node Ready condition (True vs False/Unknown)
  • Controller-kind filtering for Longhorn-like variants (StatefulSet vs ReplicaSet ownership)

All tests pass:

  • cargo test
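
As a rough illustration of the controller-kind filtering those tests exercise (the helper and test names below are made up, not necessarily those in the PR), the check boils down to inspecting the Pod's controller ownerReference:

```rust
use k8s_openapi::api::core::v1::Pod;
use k8s_openapi::apimachinery::pkg::apis::meta::v1::{ObjectMeta, OwnerReference};

/// Kind of the controller that owns the Pod ("StatefulSet", "ReplicaSet", ...), if any.
fn controller_kind(pod: &Pod) -> Option<String> {
    pod.metadata
        .owner_references
        .as_ref()?
        .iter()
        .find(|o| o.controller.unwrap_or(false))
        .map(|o| o.kind.clone())
}

#[test]
fn statefulset_owned_pod_is_detected() {
    let pod = Pod {
        metadata: ObjectMeta {
            owner_references: Some(vec![OwnerReference {
                kind: "StatefulSet".into(),
                controller: Some(true),
                ..OwnerReference::default()
            }]),
            ..ObjectMeta::default()
        },
        ..Pod::default()
    };
    assert_eq!(controller_kind(&pod).as_deref(), Some("StatefulSet"));
}
```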

Implementation notes (files changed)

  • reconcile.rs
  • k8s.rs
  • tenant.rs
  • tenant.yaml
  • clusterrole.yaml

Checklist

  • Adds CRD field + schema update
  • Adds reconcile logic and events
  • Updates RBAC for Node reads
  • Adds unit tests
  • cargo test passes

- Reformat imports to follow standard order
- Make cleanup_stuck_terminating_pods_on_down_nodes call single-line
- Reformat assert! statements in tests to multi-line for better readability
@loverustfs loverustfs enabled auto-merge December 20, 2025 03:46
@loverustfs
Collaborator Author

Hey @shahab96 ,

Thanks a lot.

I submitted this bug report (https://github.com/rustfs/rustfs/issues/1193) to Gemini, and here are its suggested fixes.
I'm unsure whether they're feasible. If they aren't, you can close this issue.

@shahab96
Collaborator

> Hey @shahab96 ,
>
> Thanks a lot.
>
> I submitted this bug report (https://github.com/rustfs/rustfs/issues/1193) to Gemini, and here are its suggested fixes.
> I'm unsure whether they're feasible. If they aren't, you can close this issue.

Totally feasible and useful changes here! Just some minor comments I had.

The one for the if statement can be totally ignored honestly. I think all we'd need here is to emit a log when the node is detected to be down. That's all actually.

Thank you for helping us out with the operator!

@loverustfs
Collaborator Author

> > Hey @shahab96 ,
> > Thanks a lot.
> > I submitted this bug report (https://github.com/rustfs/rustfs/issues/1193) to Gemini, and here are its suggested fixes.
> > I'm unsure whether they're feasible. If they aren't, you can close this issue.
>
> Totally feasible and useful changes here! Just some minor comments I had.
>
> The one for the if statement can be totally ignored honestly. I think all we'd need here is to emit a log when the node is detected to be down. That's all actually.
>
> Thank you for helping us out with the operator!

No problem.

Thank you for helping to make the RustFS operator better!

@loverustfs
Collaborator Author

Hey @shahab96 ,

Update submitted.

- use tracing::{debug, error};
+ use tracing::{debug, error, warn};

+        let pod_name = pod.name_any();
+        warn!(
+            "Node {} is detected down. Pod {} is terminating on it.",
+            node_name, pod_name
+        );

@bestgopher bestgopher disabled auto-merge December 20, 2025 13:17
@bestgopher bestgopher enabled auto-merge December 20, 2025 13:17
@bestgopher bestgopher disabled auto-merge December 20, 2025 13:18
@bestgopher bestgopher enabled auto-merge December 20, 2025 13:18
@bestgopher bestgopher added this pull request to the merge queue Dec 20, 2025
Merged via the queue into main with commit b00c9e1 Dec 20, 2025
2 checks passed