fix(vm): handle FailedMapVolume in pod volume error detection #2433

Open
eofff wants to merge 3 commits into main from fix/vm/provide-more-pod-errors

@eofff eofff commented Jun 2, 2026

Contributor

Description

Added handling for one more pod volume warning reason (FailedMapVolume) and unified volume-error reason checks in the VM controller.

Changes:

  • added ReasonFailedMapVolume constant in volumeevent_watcher.go;
  • introduced IsVolumeErrorReason(reason string) helper to keep reason filtering in one place;
  • replaced duplicated checks in watcher predicates/handlers and in VM lifecycle error extraction with the new helper.
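A minimal sketch of what such a helper could look like. Only `ReasonFailedMapVolume` and the `IsVolumeErrorReason(reason string)` signature come from this description; the other two constant names and the `bool` return type are assumptions made for illustration, and the actual code in `volumeevent_watcher.go` may differ.

```go
package main

import "fmt"

// Kubernetes event reasons for volume failures.
// ReasonFailedMapVolume is named in this PR; the other two constant
// names are assumed here for illustration.
const (
	ReasonFailedAttachVolume = "FailedAttachVolume"
	ReasonFailedMount        = "FailedMount"
	ReasonFailedMapVolume    = "FailedMapVolume"
)

// IsVolumeErrorReason reports whether reason is one of the
// volume-related warning reasons listed above.
func IsVolumeErrorReason(reason string) bool {
	switch reason {
	case ReasonFailedAttachVolume, ReasonFailedMount, ReasonFailedMapVolume:
		return true
	default:
		return false
	}
}

func main() {
	// Check two volume reasons and one unrelated reason.
	for _, r := range []string{"FailedMapVolume", "FailedMount", "BackOff"} {
		fmt.Printf("%s: %t\n", r, IsVolumeErrorReason(r))
	}
}
```

Keeping the reason list in a single `switch` means a future reason only has to be added in one place, which is the stated goal of the helper.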

Why do we need it, and what problem does it solve?

Previously, VM pod volume errors were recognized only for FailedAttachVolume and FailedMount.
In some failure scenarios, Kubernetes emits FailedMapVolume instead, so the controller could miss relevant warning events and return a less informative VM error status.
After this change, all three volume-related warning reasons are processed consistently, so users get clearer and more complete diagnostics when VM startup fails because of volume issues.

What is the expected result?

  1. Create or reproduce a VM startup issue where pod events include FailedMapVolume (or the existing FailedAttachVolume / FailedMount).
  2. Verify that the event is picked up by the volume event watcher.
  3. Verify that the VM status/lifecycle returns a VMPodVolumeError with the correct Reason and Message.
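The check in step 3 can be pictured with the following sketch. It is not the controller's code: `podEvent` and `podVolumeError` are hypothetical stand-ins for `corev1.Event` and `VMPodVolumeError`, and `firstVolumeError` is an illustrative name. It only shows the idea of turning the first matching warning event into an error that carries the event's Reason and Message.

```go
package main

import "fmt"

// podEvent is a hypothetical, trimmed-down stand-in for corev1.Event.
type podEvent struct {
	Type    string
	Reason  string
	Message string
}

// podVolumeError is a hypothetical stand-in for VMPodVolumeError.
type podVolumeError struct {
	Reason  string
	Message string
}

func (e *podVolumeError) Error() string {
	return fmt.Sprintf("%s: %s", e.Reason, e.Message)
}

// isVolumeErrorReason mirrors the helper described in this PR.
func isVolumeErrorReason(reason string) bool {
	switch reason {
	case "FailedAttachVolume", "FailedMount", "FailedMapVolume":
		return true
	}
	return false
}

// firstVolumeError returns an error built from the first Warning event
// whose reason is volume-related, or nil if there is none.
func firstVolumeError(events []podEvent) *podVolumeError {
	for _, ev := range events {
		if ev.Type == "Warning" && isVolumeErrorReason(ev.Reason) {
			return &podVolumeError{Reason: ev.Reason, Message: ev.Message}
		}
	}
	return nil
}

func main() {
	events := []podEvent{
		{Type: "Normal", Reason: "Scheduled", Message: "placeholder message"},
		{Type: "Warning", Reason: "FailedMapVolume", Message: "placeholder message"},
	}
	if err := firstVolumeError(events); err != nil {
		fmt.Println(err.Reason, err.Message)
	}
}
```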

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: vm
type: fix
summary: "VM pod volume error handling now includes FailedMapVolume and surfaces more complete pod volume diagnostics."

@eofff eofff added this to the v1.9.0 milestone Jun 3, 2026
@eofff eofff changed the title from "provide more pod errors" to "fix(vm): handle FailedMapVolume in pod volume error detection" Jun 3, 2026
@universal-itengineer universal-itengineer modified the milestones: v1.9.0, v1.10.0 Jun 10, 2026
Valeriy Khorunzhin added 3 commits June 15, 2026 16:58
Signed-off-by: Valeriy Khorunzhin <valeriy.khorunzhin@flant.com>
t
Signed-off-by: Valeriy Khorunzhin <valeriy.khorunzhin@flant.com>
Signed-off-by: Valeriy Khorunzhin <valeriy.khorunzhin@flant.com>
@eofff eofff force-pushed the fix/vm/provide-more-pod-errors branch from 8c3dd4c to a2d9f11 Compare June 15, 2026 13:58
@eofff eofff marked this pull request as ready for review June 15, 2026 14:08
@eofff eofff requested a review from LopatinDmitr June 15, 2026 14:09
Comment on lines +61 to +73
func getVirtualMachineNameFromPodLabels(pod *corev1.Pod) (string, bool) {
	if pod == nil {
		return "", false
	}

	if vmName, hasLabel := pod.GetLabels()[virtv1.VirtualMachineNameLabel]; hasLabel {
		return vmName, true
	}

	return "", false
}

func (w *VolumeEventWatcher) resolveVMNameByHotplugStatus(ctx context.Context, pod *corev1.Pod) (string, bool) {

Contributor

Maybe combine these into one function?

	client client.Client
}

func getVirtualMachineNameFromPodLabels(pod *corev1.Pod) (string, bool) {

Contributor

Why return a bool?

"github.com/deckhouse/virtualization/api/core/v1alpha2"
)

var _ = Describe("LifeCycleHandler hotplug pod errors", func() {

Contributor

What would I add at a minimum?

Without inflating the tests, I would add three cases:

1. UID mismatch

  should ignore hotplug pod when UID does not match KVVMI status

2. Not ContainerCreating

  should ignore volume event when pod is not ContainerCreating

3. Non-volume event

  should ignore non-volume warning events

These three tests would greatly increase confidence.
