loop: add regression test for partscan double-scan race by daandemeyer · Pull Request #240 · linux-blktests/blktests

daandemeyer · 2026-03-31T10:45:02Z

Add a stress test that detects spurious partition removal events when setting up a loop device with partscan enabled.

The kernel bug was that disk_force_media_change() set GD_NEED_PART_SCAN, causing udev's device open to trigger a partition scan racing with the explicit scan from loop_reread_partitions(). The second scan would drop and re-add all partitions, making partition devices briefly disappear.

The test monitors kernel uevents while repeatedly setting up and tearing down a loop device with partscan. Each cycle should produce exactly one add and one remove uevent for the partition device. Extra events indicate the double-scan race was triggered.

Link: https://lore.kernel.org/linux-block/20260330081819.652890-1-daan@amutable.com/T/#u

Add a stress test that detects spurious partition removal events when setting up a loop device with partscan enabled. The kernel bug was that disk_force_media_change() set GD_NEED_PART_SCAN, causing udev's device open to trigger a partition scan racing with the explicit scan from loop_reread_partitions(). The second scan would drop and re-add all partitions, making partition devices briefly disappear. The test monitors kernel uevents while repeatedly setting up and tearing down a loop device with partscan. Each cycle should produce exactly one add and one remove uevent for the partition device. Extra events indicate the double-scan race was triggered. Link: https://lore.kernel.org/linux-block/20260330081819.652890-1-daan@amutable.com/T/#u Signed-off-by: Daan De Meyer <daan@amutable.com>

When LOOP_CONFIGURE is called with LO_FLAGS_PARTSCAN, the following sequence occurs: 1. disk_force_media_change() sets GD_NEED_PART_SCAN 2. Uevent suppression is lifted and a KOBJ_CHANGE uevent is sent 3. loop_global_unlock() releases the lock 4. loop_reread_partitions() calls bdev_disk_changed() to scan There is a race between steps 2 and 4: when udev receives the uevent and opens the device before loop_reread_partitions() runs, blkdev_get_whole() in bdev.c sees GD_NEED_PART_SCAN set and calls bdev_disk_changed() for a first scan. Then loop_reread_partitions() does a second scan. The open_mutex serializes these two scans, but does not prevent both from running. The second scan in bdev_disk_changed() drops all partition devices from the first scan (via blk_drop_partitions()) before re-adding them, causing partition block devices to briefly disappear. This breaks any systemd unit with BindsTo= on the partition device: systemd observes the device going dead, fails the dependent units, and does not retry them when the device reappears. Fix this by removing the GD_NEED_PART_SCAN set from disk_force_media_change() entirely. None of the current callers need the lazy on-open partition scan triggered by this flag: - floppy: sets GENHD_FL_NO_PART, so disk_has_partscan() is always false and GD_NEED_PART_SCAN has no effect. - loop (loop_configure, loop_change_fd): when LO_FLAGS_PARTSCAN is set, loop_reread_partitions() performs an explicit scan. When not set, GD_SUPPRESS_PART_SCAN prevents the lazy scan path. - loop (__loop_clr_fd): calls bdev_disk_changed() explicitly if LO_FLAGS_PARTSCAN is set. - nbd (nbd_clear_sock_ioctl): capacity is set to zero immediately after; nbd manages GD_NEED_PART_SCAN explicitly elsewhere. With GD_NEED_PART_SCAN no longer set by disk_force_media_change(), udev opening the loop device after the uevent no longer triggers a redundant scan in blkdev_get_whole(), and only the single explicit scan from loop_reread_partitions() runs. A regression test for this bug has been submitted to blktests: linux-blktests/blktests#240. Fixes: 9f65c48 ("loop: raise media_change event") Signed-off-by: Daan De Meyer <daan@amutable.com>

When LOOP_CONFIGURE is called with LO_FLAGS_PARTSCAN, the following sequence occurs: 1. disk_force_media_change() sets GD_NEED_PART_SCAN 2. Uevent suppression is lifted and a KOBJ_CHANGE uevent is sent 3. loop_global_unlock() releases the lock 4. loop_reread_partitions() calls bdev_disk_changed() to scan There is a race between steps 2 and 4: when udev receives the uevent and opens the device before loop_reread_partitions() runs, blkdev_get_whole() in bdev.c sees GD_NEED_PART_SCAN set and calls bdev_disk_changed() for a first scan. Then loop_reread_partitions() does a second scan. The open_mutex serializes these two scans, but does not prevent both from running. The second scan in bdev_disk_changed() drops all partition devices from the first scan (via blk_drop_partitions()) before re-adding them, causing partition block devices to briefly disappear. This breaks any systemd unit with BindsTo= on the partition device: systemd observes the device going dead, fails the dependent units, and does not retry them when the device reappears. Fix this by removing the GD_NEED_PART_SCAN set from disk_force_media_change() entirely. None of the current callers need the lazy on-open partition scan triggered by this flag: - floppy: sets GENHD_FL_NO_PART, so disk_has_partscan() is always false and GD_NEED_PART_SCAN has no effect. - loop (loop_configure, loop_change_fd): when LO_FLAGS_PARTSCAN is set, loop_reread_partitions() performs an explicit scan. When not set, GD_SUPPRESS_PART_SCAN prevents the lazy scan path. - loop (__loop_clr_fd): calls bdev_disk_changed() explicitly if LO_FLAGS_PARTSCAN is set. - nbd (nbd_clear_sock_ioctl): capacity is set to zero immediately after; nbd manages GD_NEED_PART_SCAN explicitly elsewhere. With GD_NEED_PART_SCAN no longer set by disk_force_media_change(), udev opening the loop device after the uevent no longer triggers a redundant scan in blkdev_get_whole(), and only the single explicit scan from loop_reread_partitions() runs. A regression test for this bug has been submitted to blktests: linux-blktests/blktests#240. Fixes: 9f65c48 ("loop: raise media_change event") Signed-off-by: Daan De Meyer <daan@amutable.com> Acked-by: Christian Brauner <brauner@kernel.org>

When LOOP_CONFIGURE is called with LO_FLAGS_PARTSCAN, the following sequence occurs: 1. disk_force_media_change() sets GD_NEED_PART_SCAN 2. Uevent suppression is lifted and a KOBJ_CHANGE uevent is sent 3. loop_global_unlock() releases the lock 4. loop_reread_partitions() calls bdev_disk_changed() to scan There is a race between steps 2 and 4: when udev receives the uevent and opens the device before loop_reread_partitions() runs, blkdev_get_whole() in bdev.c sees GD_NEED_PART_SCAN set and calls bdev_disk_changed() for a first scan. Then loop_reread_partitions() does a second scan. The open_mutex serializes these two scans, but does not prevent both from running. The second scan in bdev_disk_changed() drops all partition devices from the first scan (via blk_drop_partitions()) before re-adding them, causing partition block devices to briefly disappear. This breaks any systemd unit with BindsTo= on the partition device: systemd observes the device going dead, fails the dependent units, and does not retry them when the device reappears. Fix this by removing the GD_NEED_PART_SCAN set from disk_force_media_change() entirely. None of the current callers need the lazy on-open partition scan triggered by this flag: - floppy: sets GENHD_FL_NO_PART, so disk_has_partscan() is always false and GD_NEED_PART_SCAN has no effect. - loop (loop_configure, loop_change_fd): when LO_FLAGS_PARTSCAN is set, loop_reread_partitions() performs an explicit scan. When not set, GD_SUPPRESS_PART_SCAN prevents the lazy scan path. - loop (__loop_clr_fd): calls bdev_disk_changed() explicitly if LO_FLAGS_PARTSCAN is set. - nbd (nbd_clear_sock_ioctl): capacity is set to zero immediately after; nbd manages GD_NEED_PART_SCAN explicitly elsewhere. With GD_NEED_PART_SCAN no longer set by disk_force_media_change(), udev opening the loop device after the uevent no longer triggers a redundant scan in blkdev_get_whole(), and only the single explicit scan from loop_reread_partitions() runs. A regression test for this bug has been submitted to blktests: linux-blktests/blktests#240. Fixes: 9f65c48 ("loop: raise media_change event") Signed-off-by: Daan De Meyer <daan@amutable.com> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://patch.msgid.link/20260331105130.1077599-1-daan@amutable.com Signed-off-by: Jens Axboe <axboe@kernel.dk>

When LOOP_CONFIGURE is called with LO_FLAGS_PARTSCAN, the following sequence occurs: 1. disk_force_media_change() sets GD_NEED_PART_SCAN 2. Uevent suppression is lifted and a KOBJ_CHANGE uevent is sent 3. loop_global_unlock() releases the lock 4. loop_reread_partitions() calls bdev_disk_changed() to scan There is a race between steps 2 and 4: when udev receives the uevent and opens the device before loop_reread_partitions() runs, blkdev_get_whole() in bdev.c sees GD_NEED_PART_SCAN set and calls bdev_disk_changed() for a first scan. Then loop_reread_partitions() does a second scan. The open_mutex serializes these two scans, but does not prevent both from running. The second scan in bdev_disk_changed() drops all partition devices from the first scan (via blk_drop_partitions()) before re-adding them, causing partition block devices to briefly disappear. This breaks any systemd unit with BindsTo= on the partition device: systemd observes the device going dead, fails the dependent units, and does not retry them when the device reappears. Fix this by removing the GD_NEED_PART_SCAN set from disk_force_media_change() entirely. None of the current callers need the lazy on-open partition scan triggered by this flag: - floppy: sets GENHD_FL_NO_PART, so disk_has_partscan() is always false and GD_NEED_PART_SCAN has no effect. - loop (loop_configure, loop_change_fd): when LO_FLAGS_PARTSCAN is set, loop_reread_partitions() performs an explicit scan. When not set, GD_SUPPRESS_PART_SCAN prevents the lazy scan path. - loop (__loop_clr_fd): calls bdev_disk_changed() explicitly if LO_FLAGS_PARTSCAN is set. - nbd (nbd_clear_sock_ioctl): capacity is set to zero immediately after; nbd manages GD_NEED_PART_SCAN explicitly elsewhere. With GD_NEED_PART_SCAN no longer set by disk_force_media_change(), udev opening the loop device after the uevent no longer triggers a redundant scan in blkdev_get_whole(), and only the single explicit scan from loop_reread_partitions() runs. A regression test for this bug has been submitted to blktests: linux-blktests/blktests#240. Fixes: 9f65c48 ("loop: raise media_change event") Signed-off-by: Daan De Meyer <daan@amutable.com> Acked-by: Christian Brauner <brauner@kernel.org>

kawasaki · 2026-04-08T04:47:44Z

Hello @daandemeyer

Today, I ran the added test case on my test environment, and found the test case fail even when I apply the fix patch to the kernel.

$ sudo ./check loop/012
loop/012 (check for spurious partition removal when partscan is enabled) [failed]
    runtime  5.682s  ...  6.243s
    --- tests/loop/012.out      2026-04-08 10:11:48.266000000 +0900
    +++ /home/shin/Blktests/blktests/results/nodev/loop/012.out.bad     2026-04-08 13:45:01.585000000 +0900
    @@ -1,2 +1,4 @@
     Running loop/012
    +Fail: 111 iterations but 149 add events (expected 111)
    +Fail: 111 iterations but 149 remove events (expected 111)
     Test complete
[shin@testnode2 blktests ]$

I observed the failure with both v2 and v3 kernel patches. I used v7.0-rc7 as the baseline kernel. Is this failure expected?

kawasaki · 2026-04-08T04:57:35Z

One more nit comment. Some commands in the new test case use short options: "truncate -s", "losetup -f" or so. For readability and stability, longer options are recommended, like "truncate --size=", "losetup --find". I'm fine with the current short options, but if you have chance to respin, I suggest to use longer options. I attach a diff file to show how the new test case will look like with the longer options.

review.txt

When LOOP_CONFIGURE is called with LO_FLAGS_PARTSCAN, the following sequence occurs: 1. disk_force_media_change() sets GD_NEED_PART_SCAN 2. Uevent suppression is lifted and a KOBJ_CHANGE uevent is sent 3. loop_global_unlock() releases the lock 4. loop_reread_partitions() calls bdev_disk_changed() to scan There is a race between steps 2 and 4: when udev receives the uevent and opens the device before loop_reread_partitions() runs, blkdev_get_whole() in bdev.c sees GD_NEED_PART_SCAN set and calls bdev_disk_changed() for a first scan. Then loop_reread_partitions() does a second scan. The open_mutex serializes these two scans, but does not prevent both from running. The second scan in bdev_disk_changed() drops all partition devices from the first scan (via blk_drop_partitions()) before re-adding them, causing partition block devices to briefly disappear. This breaks any systemd unit with BindsTo= on the partition device: systemd observes the device going dead, fails the dependent units, and does not retry them when the device reappears. Fix this by removing the GD_NEED_PART_SCAN set from disk_force_media_change() entirely. None of the current callers need the lazy on-open partition scan triggered by this flag: - floppy: sets GENHD_FL_NO_PART, so disk_has_partscan() is always false and GD_NEED_PART_SCAN has no effect. - loop (loop_configure, loop_change_fd): when LO_FLAGS_PARTSCAN is set, loop_reread_partitions() performs an explicit scan. When not set, GD_SUPPRESS_PART_SCAN prevents the lazy scan path. - loop (__loop_clr_fd): calls bdev_disk_changed() explicitly if LO_FLAGS_PARTSCAN is set. - nbd (nbd_clear_sock_ioctl): capacity is set to zero immediately after; nbd manages GD_NEED_PART_SCAN explicitly elsewhere. With GD_NEED_PART_SCAN no longer set by disk_force_media_change(), udev opening the loop device after the uevent no longer triggers a redundant scan in blkdev_get_whole(), and only the single explicit scan from loop_reread_partitions() runs. A regression test for this bug has been submitted to blktests: linux-blktests/blktests#240. Fixes: 9f65c48 ("loop: raise media_change event") Signed-off-by: Daan De Meyer <daan@amutable.com> Acked-by: Christian Brauner <brauner@kernel.org>

daandemeyer · 2026-04-13T07:17:05Z

@kawasaki I ran it a bunch of times and it doesn't fail on my test setup :/

kawasaki · 2026-04-13T08:20:57Z

@daandemeyer Thank you for trying out. FYI, here I attach the kernel config of the kernel that I used.
_config_v7.0-rc7_pr240_failure.gz

And I used Fedora 43 on QEMU VM as the userland of my test environment. Which distro do you use? Do you run your test on any VM or bare metal machine?

daandemeyer · 2026-04-13T08:23:19Z

My test environment is https://github.com/DaanDeMeyer/mkosi-kernel, an Arch Linux image running in qemu (booted with systemd).

(I verified the test failed without my changes as well)

kawasaki · 2026-04-14T10:00:09Z

Thanks. I started trying mkosi and mkosi-kernel, but still failing to building the Arch image. Will try to allocate more time for it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

loop: add regression test for partscan double-scan race#240

loop: add regression test for partscan double-scan race#240
daandemeyer wants to merge 1 commit intolinux-blktests:masterfrom
daandemeyer:loop-race

daandemeyer commented Mar 31, 2026

Uh oh!

kawasaki commented Apr 8, 2026

Uh oh!

kawasaki commented Apr 8, 2026

Uh oh!

daandemeyer commented Apr 13, 2026

Uh oh!

kawasaki commented Apr 13, 2026

Uh oh!

daandemeyer commented Apr 13, 2026 •

edited

Loading

Uh oh!

kawasaki commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

daandemeyer commented Mar 31, 2026

Uh oh!

kawasaki commented Apr 8, 2026

Uh oh!

kawasaki commented Apr 8, 2026

Uh oh!

daandemeyer commented Apr 13, 2026

Uh oh!

kawasaki commented Apr 13, 2026

Uh oh!

daandemeyer commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kawasaki commented Apr 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

daandemeyer commented Apr 13, 2026 •

edited

Loading