Skip to content

CLVM enhancements and fixes#12617

Draft
Pearl1594 wants to merge 88 commits into
mainfrom
clvm-enhancements
Draft

CLVM enhancements and fixes#12617
Pearl1594 wants to merge 88 commits into
mainfrom
clvm-enhancements

Conversation

@Pearl1594

@Pearl1594 Pearl1594 commented Feb 9, 2026

Copy link
Copy Markdown
Contributor

Description

This PR enhances the existing CLVM implementation which was based on the deprecated CLVM technology which was based on corosync/pacemaker. With RHEL 7 having reached EOL, CLVM seems to be broken. CLVM supports RAW volumes on LVM , where as CLVM_NG support QCOW2 on LVM.

Further details: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Modernized+CLVM%3A+Enhancements+and+CLVM_NG+support

NOTE: On testing - it was identified that incremental snapshots for clvm-ng do not work as expected. As of now it's been removed from scope. So, CLVM and CLVM_NG would only support full snapshots.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@codecov

codecov Bot commented Feb 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.68%. Comparing base (d3e1976) to head (df61d6f).
⚠️ Report is 15 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (d3e1976) and HEAD (df61d6f). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (d3e1976) HEAD (df61d6f)
unittests 1 0
Additional details and impacted files
@@              Coverage Diff              @@
##               main   #12617       +/-   ##
=============================================
- Coverage     17.90%    3.68%   -14.23%     
=============================================
  Files          5938      454     -5484     
  Lines        532864    38798   -494066     
  Branches      65192     7151    -58041     
=============================================
- Hits          95392     1428    -93964     
+ Misses       426793    37183   -389610     
+ Partials      10679      187    -10492     
Flag Coverage Δ
uitests 3.67% <ø> (+<0.01%) ⬆️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16801

@harikrishna-patnala harikrishna-patnala added this to the 4.23.0 milestone Feb 17, 2026
@codecov

codecov Bot commented Feb 17, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 43.57632% with 1401 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.88%. Comparing base (6bc83a3) to head (3d2a3ca).

Files with missing lines Patch % Lines
...oud/hypervisor/kvm/storage/ClvmStorageAdaptor.java 36.26% 436 Missing and 14 partials ⚠️
...ud/hypervisor/kvm/storage/KVMStorageProcessor.java 44.00% 135 Missing and 5 partials ⚠️
...torage/motion/StorageSystemDataMotionStrategy.java 13.86% 87 Missing ⚠️
...stack/engine/orchestration/VolumeOrchestrator.java 21.69% 81 Missing and 2 partials ⚠️
...n/java/com/cloud/storage/VolumeApiServiceImpl.java 28.57% 61 Missing and 4 partials ⚠️
...ervisor/kvm/resource/LibvirtComputingResource.java 55.55% 56 Missing and 8 partials ⚠️
...resource/wrapper/LibvirtMigrateCommandWrapper.java 11.76% 53 Missing and 7 partials ⚠️
...wrapper/LibvirtClvmLockTransferCommandWrapper.java 37.63% 52 Missing and 6 partials ⚠️
...n/java/com/cloud/vm/VirtualMachineManagerImpl.java 16.39% 51 Missing ⚠️
...tack/storage/endpoint/DefaultEndPointSelector.java 56.63% 32 Missing and 17 partials ⚠️
... and 29 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12617      +/-   ##
============================================
+ Coverage     18.75%   18.88%   +0.13%     
- Complexity    17966    18214     +248     
============================================
  Files          6160     6170      +10     
  Lines        552578   554954    +2376     
  Branches      67348    67742     +394     
============================================
+ Hits         103660   104830    +1170     
- Misses       437512   438604    +1092     
- Partials      11406    11520     +114     
Flag Coverage Δ
uitests 3.53% <ø> (-0.01%) ⬇️
unittests 20.08% <43.57%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16875

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16877

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18195

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan test

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18198

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan test

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@JoaoJandre JoaoJandre left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was only able to review half the files. I will do the rest later.
Also, could you point to a documentation somewhere on how to setup a CLVM_NG storage to add it to ACS? I was not able to find it online

Comment thread engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java Outdated
Comment thread engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java Outdated
Comment thread engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java Outdated
Comment thread engine/orchestration/src/main/java/com/cloud/vm/VirtualMachineManagerImpl.java Outdated
Comment on lines +2268 to +2269
String details = preMigAnswer != null ? preMigAnswer.getDetails() : "null answer returned";
logger.warn("PreMigrationCommand failed for CLVM/CLVM_NG VM [{}] on source host [{}]: {}. Migration will continue but may fail if volumes are exclusively locked.", vmTO.getName(), srcHost.getId(), details);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we abort the migration in this case?
If we continue and it does not work, it could lead to problems in the VM, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - if it fails, it will behave like any VM migration failure. The source VM / volume goes back to original state.

@Pearl1594

Copy link
Copy Markdown
Contributor Author

I was only able to review half the files. I will do the rest later. Also, could you point to a documentation somewhere on how to setup a CLVM_NG storage to add it to ACS? I was not able to find it online

It can be found at: apache/cloudstack-documentation#637

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
Comment on lines +2749 to 2785
// Check if CLVM lock transfer is needed (even if moveVolumeNeeded is false)
// This handles the case where the volume is already on the correct storage pool
// but the VM is running on a different host, requiring only a lock transfer
boolean isClvmLockTransferNeeded = !moveVolumeNeeded &&
isClvmLockTransferRequired(newVolumeOnPrimaryStorage, existingVolumeOfVm, vm);

if (isClvmLockTransferNeeded) {
// CLVM lock transfer - no data copy, no pool change needed
newVolumeOnPrimaryStorage = executeLightweightLockMigration(
newVolumeOnPrimaryStorage, vm, existingVolumeOfVm,
"CLVM lock transfer", "same pool to different host");
} else if (moveVolumeNeeded) {
PrimaryDataStoreInfo primaryStore = (PrimaryDataStoreInfo)newVolumeOnPrimaryStorage.getDataStore();
if (primaryStore.isLocal()) {
throw new CloudRuntimeException(
"Failed to attach local data volume " + volumeToAttach.getName() + " to VM " + vm.getDisplayName() + " as migration of local data volume is not allowed");
}
StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId());

try {
HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId());
newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(),
volumeToAttachHyperType);
} catch (ConcurrentOperationException | StorageUnavailableException e) {
logger.debug("move volume failed", e);
throw new CloudRuntimeException("move volume failed", e);
boolean isClvmLightweightMigration = isClvmLightweightMigrationNeeded(
newVolumeOnPrimaryStorage, existingVolumeOfVm, vm);

if (isClvmLightweightMigration) {
newVolumeOnPrimaryStorage = executeLightweightLockMigration(
newVolumeOnPrimaryStorage, vm, existingVolumeOfVm,
"CLVM lightweight migration", "different pools, same VG");
} else {
StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId());

try {
HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId());
newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(),
volumeToAttachHyperType);
} catch (ConcurrentOperationException | StorageUnavailableException e) {
logger.debug("move volume failed", e);
throw new CloudRuntimeException("move volume failed", e);
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this volume move section be extracted to a separate method to reduce indentation?

@Pearl1594 Pearl1594 Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be ok if I addressed this in another PR - with other enhancements that are yet to come.

Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18208

@blueorangutan

Copy link
Copy Markdown

[SF] Trillian test result (tid-16265)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 94885 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12617-t16265-kvm-ol8.zip
Smoke tests completed. 136 look OK, 15 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_CRUD_operations_userdata Error 1522.99 test_register_userdata.py
test_deploy_vm_with_registered_userdata Error 5.58 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_override_policy_allow Error 5.55 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_override_policy_append Error 6.48 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_override_policy_deny Error 5.56 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_params Error 6.65 test_register_userdata.py
test_link_and_unlink_userdata_to_template Error 5.73 test_register_userdata.py
test_user_userdata_crud Error 9.72 test_register_userdata.py
ContextSuite context=TestResetVmOnReboot>:setup Error 0.00 test_reset_vm_on_reboot.py
ContextSuite context=TestRAMCPUResourceAccounting>:setup Error 0.00 test_resource_accounting.py
test_03_register_template Error 1.13 test_resource_names.py
ContextSuite context=TestRestoreVM>:setup Error 0.00 test_restore_vm.py
ContextSuite context=TestRouterDHCPHosts>:setup Error 0.00 test_router_dhcphosts.py
ContextSuite context=TestRouterDHCPOpts>:setup Error 0.00 test_router_dhcphosts.py
ContextSuite context=TestRouterDns>:setup Error 0.00 test_router_dns.py
ContextSuite context=TestRouterDnsService>:setup Error 0.00 test_router_dnsservice.py
ContextSuite context=TestRouterIpTablesPolicies>:setup Error 0.00 test_routers_iptables_default_policy.py
ContextSuite context=TestVPCIpTablesPolicies>:setup Error 0.00 test_routers_iptables_default_policy.py
ContextSuite context=TestIsolatedNetworks>:setup Error 0.00 test_routers_network_ops.py
ContextSuite context=TestRedundantIsolateNetworks>:setup Error 0.00 test_routers_network_ops.py
ContextSuite context=TestRouterServices>:setup Error 0.00 test_routers.py
ContextSuite context=TestCpuCapServiceOfferings>:setup Error 0.00 test_service_offerings.py
ContextSuite context=TestServiceOfferings>:setup Error 0.30 test_service_offerings.py
ContextSuite context=TestSetSourceNatIp>:setup Error 0.00 test_set_sourcenat.py
ContextSuite context=TestSnapshotRootDisk>:setup Error 0.00 test_snapshots.py
ContextSuite context=TestSnapshotStandaloneBackup>:setup Error 0.00 test_snapshots.py
ContextSuite context=TestSslOffloading>:setup Error 0.00 test_ssl_offloading.py

…ures, change the clvm vols to exclusive on source from shared, and on success, change dest vol to exclusive only for cross-pool migration
@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot
33.5% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube Cloud

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18209

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan test

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Comment on lines +2445 to +2449
Answer answer = agentManager.send(hostId,
new ClvmLockTransferCommand(operation, lvPath, volumeInfo.getUuid()));
if (answer == null || !answer.getResult()) {
String details = answer != null ? answer.getDetails() : "null answer";
logger.warn("CLVM lock command [{}] failed for LV [{}] on host [{}]: {}", operation, lvPath, hostId, details);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the consequence of improper locking? If we get a false answer here we just continue, what happens then?
If there is no consequence, why are we locking and unlocking?

public class LibvirtClvmLockTransferCommandWrapper
extends CommandWrapper<ClvmLockTransferCommand, Answer, LibvirtComputingResource> {

protected Logger logger = LogManager.getLogger(getClass());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CommandWrapper class already has a logger, which is inherited by this class. There is no need to define it again.

Comment on lines +133 to +137
(trimmed.charAt(0) == '-' || trimmed.charAt(0) == 'w' ||
trimmed.charAt(0) == 'r' || trimmed.charAt(0) == 's' ||
trimmed.charAt(0) == 'v' || trimmed.charAt(0) == 'm' ||
trimmed.charAt(0) == 'p' || trimmed.charAt(0) == 'c' ||
trimmed.charAt(0) == 'o')) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could put these in a list and check if the char at 0 is in the list. Or even better, create a Enum so we know what each character means.

Comment on lines +1099 to +1102
if (isClvmBlockDevice(diskInfo)) {
logger.debug("Found disk targeting CLVM/CLVM_NG destination pool");
return true;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (isClvmBlockDevice(diskInfo)) {
logger.debug("Found disk targeting CLVM/CLVM_NG destination pool");
return true;
}
if (isClvmBlockDevice(diskInfo)) {
logger.debug("Found disk targeting CLVM/CLVM_NG destination pool");
return true;
}

}
}

public static void modifyClvmVolumesStateForMigration(List<DiskDef> disks, LibvirtComputingResource resource,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this method static?

}
}


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these newlines

Comment on lines +1344 to +1345


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

}
}


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

if (isKvmAndFileBasedStorage && backupSnapToSecondary) {
StoragePoolType poolType = volume.getStoragePoolType();

if ((isKvmAndFileBasedStorage) && backupSnapToSecondary) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ((isKvmAndFileBasedStorage) && backupSnapToSecondary) {
if (isKvmAndFileBasedStorage && backupSnapToSecondary) {

payload.setLocationType(null);
}

// CLVM_NG excluded: incremental snapshots not supported in this release (handled in takeSnapshot).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment needed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants