CLVM enhancements and fixes#12617

Draft
Pearl1594 wants to merge 84 commits into main from clvm-enhancements

Conversation

@Pearl1594 Pearl1594 commented Feb 9, 2026

Contributor

Description

This PR enhances the existing CLVM implementation, which was based on the deprecated CLVM technology that in turn relied on corosync/pacemaker. With RHEL 7 having reached EOL, CLVM appears to be broken. CLVM supports RAW volumes on LVM, whereas CLVM_NG supports QCOW2 on LVM.

Further details: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Modernized+CLVM%3A+Enhancements+and+CLVM_NG+support

NOTE: Testing showed that incremental snapshots for CLVM_NG do not work as expected, so they have been removed from scope for now. CLVM and CLVM_NG therefore support only full snapshots.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@codecov

codecov Bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.68%. Comparing base (d3e1976) to head (df61d6f).
⚠️ Report is 15 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (d3e1976) and HEAD (df61d6f). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (d3e1976) HEAD (df61d6f)
unittests 1 0
Additional details and impacted files
@@              Coverage Diff              @@
##               main   #12617       +/-   ##
=============================================
- Coverage     17.90%    3.68%   -14.23%     
=============================================
  Files          5938      454     -5484     
  Lines        532864    38798   -494066     
  Branches      65192     7151    -58041     
=============================================
- Hits          95392     1428    -93964     
+ Misses       426793    37183   -389610     
+ Partials      10679      187    -10492     
Flag Coverage Δ
uitests 3.67% <ø> (+<0.01%) ⬆️
unittests ?

@Pearl1594

Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16801

@harikrishna-patnala harikrishna-patnala added this to the 4.23.0 milestone Feb 17, 2026
UserVmVO vm = userVmDao.findById(vmId);
String cantHandleLog = String.format("Default VM snapshot cannot handle VM snapshot for [%s]", vm);

if (isRunningVMVolumeOnCLVMStorage(vm, cantHandleLog)) {

Member

@Pearl1594
What's the image format on CLVM? RAW or QCOW2?

@codecov

codecov Bot commented Feb 17, 2026

Codecov Report

❌ Patch coverage is 44.25828% with 1364 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.89%. Comparing base (6bc83a3) to head (90843cb).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...oud/hypervisor/kvm/storage/ClvmStorageAdaptor.java | 36.26% | 436 Missing and 14 partials ⚠️ |
| ...ud/hypervisor/kvm/storage/KVMStorageProcessor.java | 44.00% | 135 Missing and 5 partials ⚠️ |
| ...stack/engine/orchestration/VolumeOrchestrator.java | 21.90% | 80 Missing and 2 partials ⚠️ |
| ...n/java/com/cloud/storage/VolumeApiServiceImpl.java | 30.52% | 62 Missing and 4 partials ⚠️ |
| ...ervisor/kvm/resource/LibvirtComputingResource.java | 55.55% | 56 Missing and 8 partials ⚠️ |
| ...resource/wrapper/LibvirtMigrateCommandWrapper.java | 11.76% | 53 Missing and 7 partials ⚠️ |
| ...wrapper/LibvirtClvmLockTransferCommandWrapper.java | 37.63% | 52 Missing and 6 partials ⚠️ |
| ...torage/motion/StorageSystemDataMotionStrategy.java | 20.28% | 55 Missing ⚠️ |
| ...tack/storage/endpoint/DefaultEndPointSelector.java | 56.63% | 32 Missing and 17 partials ⚠️ |
| ...n/java/com/cloud/storage/clvm/ClvmPoolManager.java | 77.10% | 25 Missing and 24 partials ⚠️ |

... and 29 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12617      +/-   ##
============================================
+ Coverage     18.75%   18.89%   +0.13%     
- Complexity    17966    18220     +254     
============================================
  Files          6160     6170      +10     
  Lines        552578   554920    +2342     
  Branches      67348    67736     +388     
============================================
+ Hits         103660   104844    +1184     
- Misses       437512   438555    +1043     
- Partials      11406    11521     +115     
Flag Coverage Δ
uitests 3.53% <ø> (-0.01%) ⬇️
unittests 20.09% <44.25%> (+0.13%) ⬆️

@Pearl1594

Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16875

@Pearl1594

Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16877

@sonarqubecloud

sonarqubecloud Bot commented Jun 5, 2026

Quality Gate failed

Failed conditions
1 Security Hotspot
33.7% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube Cloud

@Pearl1594

Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@RosiKyu RosiKyu left a comment

Collaborator

LGTM

QA Report: Modernised CLVM Support

Date: June 1, 2026
Tester: Rositsa Kyuchukova


Summary

| Result | Count |
|---|---|
| Pass | 95 |
| Blocked | 8 |
| Total | 103 |

Note on Blocked tests: All blocked test cases relate to incremental snapshot functionality (CLVM_NG dirty bitmaps). This feature has been moved out of this PR and will be handled in a dedicated incremental snapshots PR.


Test Results

Storage Pool Management

| Test Case | Priority | Result |
|---|---|---|
| CLVM primary storage pool reaches Up state and shows correct type, IP and path | High | Pass |
| CLVM_NG primary storage pool reaches Up state | High | Pass |
| CLVM pool remains Up after agent restart | High | Pass |
| CLVM_NG pool remains Up after agent restart | Medium | Pass |
| CLVM_NG visible as selectable option in protocol list | High | Pass |
| VG Name field appears when CLVM or CLVM_NG is selected | Medium | Pass |
| VG Name field not shown for non-CLVM pool types | Medium | Pass |
| Pool creation rejected when VG does not exist on hosts | Medium | Pass |
| Pool creation rejected when VG name is empty | Medium | Pass |
| Duplicate CLVM/CLVM_NG pool creation with same VG name is rejected | High | Pass |
| CLVM pool capacity updates after volume creation | Medium | Pass |
| CLVM pool reported capacity matches underlying VG | High | Pass |
| CS does not overprovision CLVM/CLVM_NG volumes beyond VG capacity | Medium | Pass |

Volume Lifecycle

| Test Case | Priority | Result |
|---|---|---|
| Create CLVM data volume - LV provisioned with correct name and RAW format | High | Pass |
| CLVM volume size on LV matches the requested size | High | Pass |
| Create CLVM_NG data volume - LV provisioned and formatted as QCOW2 | High | Pass |
| CLVM_NG volume disk format reported as QCOW2 in CloudStack | High | Pass |
| CLVM_NG LV is larger than requested size to accommodate QCOW2 overhead | Medium | Pass |
| CLVM_NG LV size is rounded up to actual VG PE boundary | Medium | Pass |
| Data volume created while VM is running is provisioned on the VM's host | High | Pass |
| lvcreate completes without hanging when LV space has an existing DOS signature | High | Pass |
| Attach CLVM volume to running VM - lock transferred to VM host | High | Pass |
| Attach CLVM volume to stopped VM: lock acquired on correct host at VM start | High | Pass |
| Detach CLVM volume from running VM - lock released cleanly | High | Pass |
| Delete CLVM volume with zero-fill disabled - fast deletion with no zeroing | High | Pass |
| Delete CLVM volume with zero-fill enabled - blkdiscard issued before lvremove | High | Pass |
| Zero-fill falls back to dd when blkdiscard is not supported | Medium | Pass |
| CLVM_NG delete volume - zero-fill OFF | High | Pass |
| CLVM_NG delete volume - zero-fill ON | High | Pass |
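
The two CLVM_NG sizing cases above (LV larger than the requested size, and LV rounded up to the VG PE boundary) come down to simple arithmetic. The following is a minimal sketch of that arithmetic, not the PR's implementation: the class and method names are hypothetical, and the 1 MiB overhead and 4 MiB extent size used in `main` are assumed values chosen only for illustration.

```java
// Illustrative sketch only: names and the overhead figure are assumptions,
// not taken from the CloudStack code base.
final class ClvmNgSizingSketch {
    private ClvmNgSizingSketch() {
    }

    /** Rounds sizeBytes up to the next multiple of extentBytes (the VG physical extent size). */
    static long roundUpToExtent(long sizeBytes, long extentBytes) {
        if (sizeBytes < 0 || extentBytes <= 0) {
            throw new IllegalArgumentException("size must be >= 0 and extent size must be > 0");
        }
        long extents = Math.addExact(sizeBytes, extentBytes - 1) / extentBytes;
        return Math.multiplyExact(extents, extentBytes);
    }

    /** LV size = requested virtual size + assumed QCOW2 overhead, rounded up to a PE boundary. */
    static long lvSizeFor(long virtualSizeBytes, long assumedOverheadBytes, long extentBytes) {
        return roundUpToExtent(Math.addExact(virtualSizeBytes, assumedOverheadBytes), extentBytes);
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // 1 GiB requested + 1 MiB assumed overhead with 4 MiB extents needs 257 extents.
        System.out.println(lvSizeFor(1024 * mib, mib, 4 * mib) / mib + " MiB"); // prints "1028 MiB"
    }
}
```

With these assumed numbers the LV ends up 4 MiB larger than the requested 1 GiB, which is the kind of difference the "LV larger than requested size" checks would observe.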

Volume Resize

| Test Case | Priority | Result |
|---|---|---|
| CLVM volume resize while VM stopped | High | Pass |
| CLVM volume resize while VM running | High | Pass |
| Root volume resize on CLVM_NG | High | Pass |
| CLVM_NG volume resize while VM stopped | High | Pass |
| CLVM_NG volume resize while VM running | High | Pass |
| Shrink rejected on CLVM_NG volume | High | Pass |
| CLVM_NG resize - LV rounded to PE boundary | High | Pass |
| CLVM_NG resize - LV larger than requested size (overhead verified) | High | Pass |

Zero-Fill / Secure Erase

| Test Case | Priority | Result |
|---|---|---|
| Zero-fill config change has no effect until agent reconnect | Medium | Pass |
| Zero-fill enabled on one pool and disabled on another - pools behave independently | High | Pass |

VM Operations

| Test Case | Priority | Result |
|---|---|---|
| VM started on different host triggers lock transfer to new host | High | Pass |
| clvmLockHostId updated in volume_details after lock transfer on VM start | High | Pass |
| Expunge VM with zero-fill enabled - root volume zeroed and LV removed; data volumes detached only | High | Pass |
| Expunge VM with zero-fill disabled - root volume LV removed quickly with no zeroing; data volume detached only | High | Pass |
| Expunge VM with CLVM_NG volumes and zero-fill enabled - LVs zeroed and removed | High | Pass |
| Expunge VM with CLVM_NG volumes and zero-fill disabled - LVs removed and metadata cleaned | High | Pass |

Live Migration

| Test Case | Priority | Result |
|---|---|---|
| Live migrate VM from CLVM to NFS - VM accessible after migration | High | Pass |
| Live migrate VM from NFS to CLVM - VM accessible after migration | High | Pass |
| Live migrate VM from CLVM to CLVM_NG - volume converted to QCOW2 | High | Pass |
| Live migrate VM from CLVM_NG to CLVM - volume converted to RAW | High | Pass |
| Live migrate VM from CLVM_NG to NFS - QCOW2 format preserved on destination | High | Pass |
| Live migrate VM from NFS to CLVM_NG - QCOW2-backed LV created correctly | High | Pass |
| Pre-migration command transitions CLVM volume lock from exclusive to shared on source | High | Pass |
| Post-migration command transitions CLVM volume lock to exclusive on destination | High | Pass |
| clvmLockHostId updated to destination host after successful migration | High | Pass |
| Migration failure - lock reverted to exclusive on source host | High | Pass |
| VM remains accessible on source host after migration failure | High | Pass |
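
The lock-related migration cases above describe a small state machine: exclusive on the source, shared during migration, then exclusive on the destination on success or back to exclusive on the source on failure. A rough sketch of those transitions follows. All names here are hypothetical and the class only models the bookkeeping; it does not issue any LVM commands and is not how the PR implements the transfer.

```java
// Illustrative state sketch: names are assumptions, not CloudStack classes.
final class MigrationLockSketch {
    enum LockMode { EXCLUSIVE, SHARED }

    private long lockHostId;                       // stands in for the clvmLockHostId volume detail
    private LockMode mode = LockMode.EXCLUSIVE;    // a running VM's volume starts exclusively locked

    MigrationLockSketch(long sourceHostId) {
        this.lockHostId = sourceHostId;
    }

    /** Pre-migration: exclusive -> shared on the source, so both hosts can open the LV. */
    void preMigration() {
        require(LockMode.EXCLUSIVE);
        mode = LockMode.SHARED;
    }

    /** Post-migration success: the destination takes the exclusive lock and becomes the lock host. */
    void postMigrationSuccess(long destinationHostId) {
        require(LockMode.SHARED);
        lockHostId = destinationHostId;
        mode = LockMode.EXCLUSIVE;
    }

    /** Migration failure: revert to exclusive on the source; the lock host is unchanged. */
    void postMigrationFailure() {
        require(LockMode.SHARED);
        mode = LockMode.EXCLUSIVE;
    }

    long lockHostId() {
        return lockHostId;
    }

    LockMode mode() {
        return mode;
    }

    private void require(LockMode expected) {
        if (mode != expected) {
            throw new IllegalStateException("expected lock mode " + expected + " but was " + mode);
        }
    }
}
```

Keeping the lock host unchanged on failure matches the last two rows of the table: the VM keeps running on the source host, which still holds the exclusive lock.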

Lock Management and Fan-out

| Test Case | Priority | Result |
|---|---|---|
| Volume operation succeeds when clvmLockHostId points to wrong host - fan-out fallback corrects DB | High | Pass |
| Volume operation succeeds when clvmLockHostId host is down - fan-out finds actual lock holder | High | Pass |
| CLVM lightweight migration volume attach | High | Pass |
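
The fan-out fallback exercised by the first two cases can be pictured as: trust the recorded clvmLockHostId if that host is up and really holds the lock, otherwise ask the remaining hosts. The sketch below is an assumption about the general shape only. The `holdsLock` probe and the list of up hosts are hypothetical inputs, and writing the corrected host back to the database is left to the caller.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.LongPredicate;

// Illustrative sketch: the probe and host list are hypothetical stand-ins.
final class LockFanOutSketch {
    private LockFanOutSketch() {
    }

    /**
     * Returns the host that actually holds the lock, preferring the recorded host.
     * An empty result means no probed host reported holding the lock.
     */
    static Optional<Long> findLockHolder(Long recordedHostId, List<Long> upHostIds, LongPredicate holdsLock) {
        // Fast path: the recorded host is up and confirms it holds the lock.
        if (recordedHostId != null && upHostIds.contains(recordedHostId) && holdsLock.test(recordedHostId)) {
            return Optional.of(recordedHostId);
        }
        // Fan-out: probe every other up host until one reports the lock.
        for (Long hostId : upHostIds) {
            if (!hostId.equals(recordedHostId) && holdsLock.test(hostId)) {
                return Optional.of(hostId);
            }
        }
        return Optional.empty();
    }
}
```

A caller would compare the result with the recorded value and update the stored clvmLockHostId when they differ, which corresponds to the "fan-out fallback corrects DB" case.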

Snapshots

| Test Case | Priority | Result |
|---|---|---|
| CLVM snapshot uploaded to secondary storage after creation | High | Pass |
| Snapshot LV removed from primary storage after successful backup | High | Pass |
| Snapshot of attached CLVM volume on running VM dispatched to lock host | High | Pass |
| Snapshot of detached CLVM volume dispatched to correct host | High | Pass |
| Snapshot of unattached volume created from snapshot completes without host not found error | High | Pass |
| No orphaned LVs remain on VG after snapshot deletion | High | Blocked |
| CLVM_NG snapshot is full and initialises dirty bitmap | High | Pass |
| Second CLVM_NG snapshot is incremental and smaller than full | High | Pass |
| Incremental CLVM_NG snapshot can be extracted - converted to full snapshot for download | High | Pass |
| Disabling kvm.incremental.snapshot causes next CLVM_NG snapshot to be full | Medium | Pass |
| Disabling kvm.incremental.snapshot causes CLVM_NG to fall back to full snapshots | Medium | Blocked |
| Re-enabling kvm.incremental.snapshot - first snapshot is full then incremental resumes | Medium | Blocked |
| CLVM_NG: Dirty bitmap removed from LV metadata after live migration of VM to another host | High | Blocked |
| CLVM_NG: First snapshot after migration is full; subsequent snapshots resume as incremental | High | Blocked |
| Incremental snapshot of stopped CLVM_NG VM | High | Blocked |
| VM snapshot on CLVM volume rejected | High | Blocked |
| VM snapshot on CLVM_NG volume rejected with expected error message | High | Blocked |
| CLVM_NG snapshot artifact cleaned after backup | High | Pass |
| No orphaned artifacts after CLVM_NG snapshot deletion | High | Pass |

Snapshot - Revert and Create From

| Test Case | Priority | Result |
|---|---|---|
| Revert CLVM volume to snapshot - data reflects snapshot point in time | High | Pass |
| VM resumes cleanly after CLVM volume revert | High | Pass |
| Volume created from CLVM snapshot contains correct data | High | Pass |
| Volume created from CLVM snapshot has clvmLockHostId initialised | High | Pass |
| Volume created from CLVM snapshot has clvmLockHostId set in volume_details | High | Pass |
| Template created from CLVM snapshot is deployable | High | Pass |
| Revert CLVM_NG volume to snapshot | High | Pass |
| VM resumes cleanly after CLVM_NG root volume revert | High | Pass |
| Volume from CLVM_NG snapshot - correct data | High | Pass |
| Volume from CLVM_NG snapshot - clvmLockHostId set | High | Pass |
| Template from CLVM_NG snapshot is deployable | High | Pass |

CLVM_NG Templates and Backing Files

| Test Case | Priority | Result |
|---|---|---|
| VM deployed from CLVM_NG template uses template LV as QCOW2 backing file | High | Pass |
| Template LV on CLVM_NG pool accessible on multiple hosts simultaneously - exclusive activation blocked by sanlock | High | Pass |
| RAW template on CLVM_NG creates QCOW2 disk (regression) | High | Pass |

API

| Test Case | Priority | Result |
|---|---|---|
| createStoragePool API with CLVM url scheme creates pool in Up state | High | Pass |
| createStoragePool API with CLVM_NG url scheme creates pool in Up state | High | Pass |
| listStoragePools API returns correct type field for CLVM and CLVM_NG pools | Medium | Pass |
| createVolume API on CLVM pool sets clvmLockHostId in volume_details | Medium | Pass |
| attachVolume and detachVolume API on CLVM volume triggers lock transfer | High | Pass |
| createSnapshot API on CLVM volume dispatches to lock host | High | Pass |
| Management server log confirms lightweight migration path taken - no data copy | High | Pass |

PR Quality Checks

| Test Case | Priority | Result |
|---|---|---|
| Code coverage criteria | Medium | Pass |
| CloudStack Style | Medium | Pass |
| Documentation | Medium | Pass |
| PR Mergeable | Medium | Pass |

@blueorangutan

[SF] Trillian test result (tid-16254)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 49433 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12617-t16254-kvm-ol8.zip
Smoke tests completed. 151 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@github-actions

github-actions Bot commented Jun 8, 2026

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@Pearl1594

Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18195

@Pearl1594

Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Comment on lines +649 to +671
// For CLVM volumes, route to the host holding the exclusive lock
if (volume.getHypervisorType() == Hypervisor.HypervisorType.KVM) {
    DataStore store = volume.getDataStore();
    if (store.getRole() == DataStoreRole.Primary) {
        StoragePoolVO pool = _storagePoolDao.findById(store.getId());
        if (pool != null && ClvmPoolManager.isClvmPoolType(pool.getPoolType())) {
            Long lockHostId = getClvmLockHostId(volume);
            if (lockHostId != null) {
                logger.info("Routing CLVM volume {} deletion to lock holder host {}",
                        volume.getUuid(), lockHostId);
                EndPoint ep = getEndPointFromHostId(lockHostId);
                if (ep != null) {
                    return ep;
                }
                logger.warn("Could not get endpoint for CLVM lock host {}, falling back to default selection",
                        lockHostId);
            } else {
                logger.debug("No CLVM lock host tracked for volume {}, using default endpoint selection",
                        volume.getUuid());
            }
        }
    }
}

Member

Similar logic is repeated many times throughout this file. We could extract it into a single method instead.

Contributor Author

done
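
For readers following the thread, the consolidation suggested in the review comment could take roughly the following shape. This is a standalone sketch and not the refactoring that was actually committed: the class, method and parameter names are hypothetical, and the CloudStack types from the quoted snippet are replaced by plain booleans and a lookup function so the example compiles on its own.

```java
import java.util.Optional;
import java.util.function.LongFunction;

// Illustrative sketch: a single helper that every call site could share.
final class ClvmEndpointRoutingSketch {
    private ClvmEndpointRoutingSketch() {
    }

    /**
     * Returns the endpoint of the host tracked as lock holder, or empty when the
     * caller should fall back to default endpoint selection.
     *
     * @param kvm            stands in for the KVM hypervisor check
     * @param primaryStore   stands in for the DataStoreRole.Primary check
     * @param clvmPool       stands in for the CLVM pool type check
     * @param lockHostId     tracked lock host, may be null when none is recorded
     * @param endpointLookup hypothetical lookup that may return null for an unknown host
     */
    static <E> Optional<E> lockHostEndpoint(boolean kvm, boolean primaryStore, boolean clvmPool,
                                            Long lockHostId, LongFunction<E> endpointLookup) {
        if (!kvm || !primaryStore || !clvmPool || lockHostId == null) {
            return Optional.empty();
        }
        // A null lookup result also maps to empty, i.e. "use default selection".
        return Optional.ofNullable(endpointLookup.apply(lockHostId));
    }
}
```

Each call site would then reduce to one call followed by the existing default selection when the result is empty, with the operation-specific log message passed in or logged by the caller.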

@Pearl1594

Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18198

@Pearl1594

Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@sonarqubecloud

sonarqubecloud Bot commented Jun 8, 2026

Quality Gate failed

Failed conditions
1 Security Hotspot
34.0% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube Cloud

9 participants