ublk: fix infinite loop in ublk server teardown #725
blktests-ci[bot] wants to merge 2 commits into linus-master_base
If a ublk server starts recovering devices but dies before issuing fetch commands for all IOs, cancellation of the fetch commands that were successfully issued may never complete. This is because the per-IO canceled flag can remain set even after the fetch for that IO has been submitted: the per-IO canceled flags for all IOs in a queue are reset together only once all IOs for that queue have been fetched. So if a nonempty proper subset of the IOs for a queue has been fetched when the ublk server dies, the IOs in that subset will never successfully be canceled, as their canceled flags remain set, and this prevents ublk_cancel_cmd from actually calling io_uring_cmd_done on the commands, despite the fact that they are outstanding.

Fix this by resetting the per-IO canceled flag immediately when each IO is fetched instead of waiting for all IOs for the queue to be fetched (which may never happen).

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Fixes: 728cbac ("ublk: move device reset into ublk_ch_release()")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: zhang, the-essence-of-life <zhangweize9@gmail.com>
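The flag handling described in this commit message can be modeled in userspace. The following is a minimal sketch, not kernel code: `model_io`, `model_queue`, and every `model_*` function are hypothetical names invented for illustration, and the sketch assumes that all canceled flags are still set from the previous server's exit when recovery begins. Clearing `fetched` stands in for ublk_cancel_cmd calling io_uring_cmd_done.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_DEPTH 4

/* Hypothetical userspace model; this is not the kernel's per-IO struct. */
struct model_io {
    bool fetched;   /* a fetch command is outstanding for this IO */
    bool canceled;  /* cancellation was already handled for this IO */
};

struct model_queue {
    struct model_io ios[QUEUE_DEPTH];
    int nr_fetched;
};

/* Assumed starting state for recovery: the previous server exited,
 * so every IO in the queue still has its canceled flag set. */
static void model_prev_server_exit(struct model_queue *q)
{
    q->nr_fetched = 0;
    for (int i = 0; i < QUEUE_DEPTH; i++) {
        q->ios[i].fetched = false;
        q->ios[i].canceled = true;
    }
}

/* Submit a fetch for one IO. reset_each_fetch selects the fixed behavior;
 * otherwise flags are reset only once the whole queue has been fetched. */
static void model_fetch(struct model_queue *q, int tag, bool reset_each_fetch)
{
    assert(tag >= 0 && tag < QUEUE_DEPTH);
    q->ios[tag].fetched = true;
    q->nr_fetched++;
    if (reset_each_fetch) {
        q->ios[tag].canceled = false;
    } else if (q->nr_fetched == QUEUE_DEPTH) {
        for (int i = 0; i < QUEUE_DEPTH; i++)
            q->ios[i].canceled = false;
    }
}

/* Cancel all outstanding fetches. Returns how many stay outstanding
 * because their stale canceled flag made cancellation skip them. */
static int model_cancel_all(struct model_queue *q)
{
    int stuck = 0;

    for (int i = 0; i < QUEUE_DEPTH; i++) {
        struct model_io *io = &q->ios[i];

        if (!io->fetched)
            continue;
        if (io->canceled) {
            stuck++;            /* looks already canceled, so it is skipped */
            continue;
        }
        io->canceled = true;
        io->fetched = false;    /* stands in for io_uring_cmd_done() */
    }
    return stuck;
}

/* Recovering server fetches tags 0..nr_to_fetch-1 and then dies. */
int model_stuck_after_partial_fetch(int nr_to_fetch, bool reset_each_fetch)
{
    struct model_queue q;

    model_prev_server_exit(&q);
    for (int tag = 0; tag < nr_to_fetch; tag++)
        model_fetch(&q, tag, reset_each_fetch);
    return model_cancel_all(&q);
}
```

In this model, `model_stuck_after_partial_fetch(2, false)` leaves both fetched commands outstanding, while `model_stuck_after_partial_fetch(2, true)` leaves none. Fetching the full queue leaves none stuck under either policy, which matches the commit message's point that only a nonempty proper subset triggers the problem.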
Before the fix, teardown of a ublk server that was attempting to recover a device, but died after submitting a nonempty proper subset of the fetch commands to any queue, would loop forever. Add a test to verify that, after the fix, teardown completes. This is done by:

- Adding a new argument to the fault_inject target that causes it to die after fetching a nonempty proper subset of the IOs to a queue
- Using that argument in a new test while trying to recover an already-created device
- Attempting to delete the ublk device at the end of the test; this hangs forever if teardown from the fault-injected ublk server never completed.

It was manually verified that the test passes with the fix and hangs without it.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Upstream branch: 9a9c8ce
Pull request for series with
subject: ublk: fix infinite loop in ublk server teardown
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1077214