CLVM enhancements and fixes#12617
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## main #12617 +/- ##
=============================================
- Coverage 17.90% 3.68% -14.23%
=============================================
Files 5938 454 -5484
Lines 532864 38798 -494066
Branches 65192 7151 -58041
=============================================
- Hits 95392 1428 -93964
+ Misses 426793 37183 -389610
+ Partials 10679 187 -10492
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
5fc1f12 to
9e03f4b
Compare
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16801 |
a08e7a5 to
df61d6f
Compare
df61d6f to
43e9384
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #12617 +/- ##
============================================
+ Coverage 18.75% 18.88% +0.13%
- Complexity 17966 18214 +248
============================================
Files 6160 6170 +10
Lines 552578 554954 +2376
Branches 67348 67742 +394
============================================
+ Hits 103660 104830 +1170
- Misses 437512 438604 +1092
- Partials 11406 11520 +114
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
3f900e8 to
c9dd7ed
Compare
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16875 |
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16877 |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18195 |
|
@blueorangutan test |
|
@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18198 |
|
@blueorangutan test |
|
@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
JoaoJandre
left a comment
There was a problem hiding this comment.
I was only able to review half the files. I will do the rest later.
Also, could you point to a documentation somewhere on how to setup a CLVM_NG storage to add it to ACS? I was not able to find it online
| String details = preMigAnswer != null ? preMigAnswer.getDetails() : "null answer returned"; | ||
| logger.warn("PreMigrationCommand failed for CLVM/CLVM_NG VM [{}] on source host [{}]: {}. Migration will continue but may fail if volumes are exclusively locked.", vmTO.getName(), srcHost.getId(), details); |
There was a problem hiding this comment.
Shouldn't we abort the migration in this case?
If we continue and it does not work, it could lead to problems in the VM, no?
There was a problem hiding this comment.
No - if it fails, it will behave like any VM migration failure. The source VM / volume goes back to original state.
It can be found at: apache/cloudstack-documentation#637 |
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
| // Check if CLVM lock transfer is needed (even if moveVolumeNeeded is false) | ||
| // This handles the case where the volume is already on the correct storage pool | ||
| // but the VM is running on a different host, requiring only a lock transfer | ||
| boolean isClvmLockTransferNeeded = !moveVolumeNeeded && | ||
| isClvmLockTransferRequired(newVolumeOnPrimaryStorage, existingVolumeOfVm, vm); | ||
|
|
||
| if (isClvmLockTransferNeeded) { | ||
| // CLVM lock transfer - no data copy, no pool change needed | ||
| newVolumeOnPrimaryStorage = executeLightweightLockMigration( | ||
| newVolumeOnPrimaryStorage, vm, existingVolumeOfVm, | ||
| "CLVM lock transfer", "same pool to different host"); | ||
| } else if (moveVolumeNeeded) { | ||
| PrimaryDataStoreInfo primaryStore = (PrimaryDataStoreInfo)newVolumeOnPrimaryStorage.getDataStore(); | ||
| if (primaryStore.isLocal()) { | ||
| throw new CloudRuntimeException( | ||
| "Failed to attach local data volume " + volumeToAttach.getName() + " to VM " + vm.getDisplayName() + " as migration of local data volume is not allowed"); | ||
| } | ||
| StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId()); | ||
|
|
||
| try { | ||
| HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId()); | ||
| newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(), | ||
| volumeToAttachHyperType); | ||
| } catch (ConcurrentOperationException | StorageUnavailableException e) { | ||
| logger.debug("move volume failed", e); | ||
| throw new CloudRuntimeException("move volume failed", e); | ||
| boolean isClvmLightweightMigration = isClvmLightweightMigrationNeeded( | ||
| newVolumeOnPrimaryStorage, existingVolumeOfVm, vm); | ||
|
|
||
| if (isClvmLightweightMigration) { | ||
| newVolumeOnPrimaryStorage = executeLightweightLockMigration( | ||
| newVolumeOnPrimaryStorage, vm, existingVolumeOfVm, | ||
| "CLVM lightweight migration", "different pools, same VG"); | ||
| } else { | ||
| StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId()); | ||
|
|
||
| try { | ||
| HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId()); | ||
| newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(), | ||
| volumeToAttachHyperType); | ||
| } catch (ConcurrentOperationException | StorageUnavailableException e) { | ||
| logger.debug("move volume failed", e); | ||
| throw new CloudRuntimeException("move volume failed", e); | ||
| } | ||
| } |
There was a problem hiding this comment.
Can this volume move section be extracted to a separate method to reduce indentation?
There was a problem hiding this comment.
Would it be ok if I addressed this in another PR - with other enhancements that are yet to come.
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18208 |
|
[SF] Trillian test result (tid-16265)
|
…ures, change the clvm vols to exclusive on source from shared, and on success, change dest vol to exclusive only for cross-pool migration
|
@blueorangutan package |
|
@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress. |
|
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18209 |
|
@blueorangutan test |
|
@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
| Answer answer = agentManager.send(hostId, | ||
| new ClvmLockTransferCommand(operation, lvPath, volumeInfo.getUuid())); | ||
| if (answer == null || !answer.getResult()) { | ||
| String details = answer != null ? answer.getDetails() : "null answer"; | ||
| logger.warn("CLVM lock command [{}] failed for LV [{}] on host [{}]: {}", operation, lvPath, hostId, details); |
There was a problem hiding this comment.
What is the consequence of improper locking? If we get a false answer here we just continue, what happens then?
If there is no consequence, why are we locking and unlocking?
| public class LibvirtClvmLockTransferCommandWrapper | ||
| extends CommandWrapper<ClvmLockTransferCommand, Answer, LibvirtComputingResource> { | ||
|
|
||
| protected Logger logger = LogManager.getLogger(getClass()); |
There was a problem hiding this comment.
The CommandWrapper class already has a logger, which is inherited by this class. There is no need to define it again.
| (trimmed.charAt(0) == '-' || trimmed.charAt(0) == 'w' || | ||
| trimmed.charAt(0) == 'r' || trimmed.charAt(0) == 's' || | ||
| trimmed.charAt(0) == 'v' || trimmed.charAt(0) == 'm' || | ||
| trimmed.charAt(0) == 'p' || trimmed.charAt(0) == 'c' || | ||
| trimmed.charAt(0) == 'o')) { |
There was a problem hiding this comment.
Could put these in a list and check if the char at 0 is in the list. Or even better, create a Enum so we know what each character means.
| if (isClvmBlockDevice(diskInfo)) { | ||
| logger.debug("Found disk targeting CLVM/CLVM_NG destination pool"); | ||
| return true; | ||
| } |
There was a problem hiding this comment.
| if (isClvmBlockDevice(diskInfo)) { | |
| logger.debug("Found disk targeting CLVM/CLVM_NG destination pool"); | |
| return true; | |
| } | |
| if (isClvmBlockDevice(diskInfo)) { | |
| logger.debug("Found disk targeting CLVM/CLVM_NG destination pool"); | |
| return true; | |
| } |
| } | ||
| } | ||
|
|
||
| public static void modifyClvmVolumesStateForMigration(List<DiskDef> disks, LibvirtComputingResource resource, |
There was a problem hiding this comment.
Why is this method static?
| } | ||
| } | ||
|
|
||
|
|
There was a problem hiding this comment.
Please remove these newlines
|
|
||
|
|
| } | ||
| } | ||
|
|
||
|
|
| if (isKvmAndFileBasedStorage && backupSnapToSecondary) { | ||
| StoragePoolType poolType = volume.getStoragePoolType(); | ||
|
|
||
| if ((isKvmAndFileBasedStorage) && backupSnapToSecondary) { |
There was a problem hiding this comment.
| if ((isKvmAndFileBasedStorage) && backupSnapToSecondary) { | |
| if (isKvmAndFileBasedStorage && backupSnapToSecondary) { |
| payload.setLocationType(null); | ||
| } | ||
|
|
||
| // CLVM_NG excluded: incremental snapshots not supported in this release (handled in takeSnapshot). |
There was a problem hiding this comment.
Is this comment needed?


Description
This PR enhances the existing CLVM implementation which was based on the deprecated CLVM technology which was based on corosync/pacemaker. With RHEL 7 having reached EOL, CLVM seems to be broken. CLVM supports RAW volumes on LVM , where as CLVM_NG support QCOW2 on LVM.
Further details: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Modernized+CLVM%3A+Enhancements+and+CLVM_NG+support
NOTE: On testing - it was identified that incremental snapshots for clvm-ng do not work as expected. As of now it's been removed from scope. So, CLVM and CLVM_NG would only support full snapshots.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?