block: fix resource leak in blk_register_queue() error path#716

Open
blktests-ci[bot] wants to merge 59 commits into for-next_base from series/1076130=>for-next

Conversation


blktests-ci bot commented Apr 2, 2026

Pull request for series with
subject: block: fix resource leak in blk_register_queue() error path
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1076130

axboe and others added 30 commits March 17, 2026 13:54
* io_uring-7.0:
  io_uring/poll: fix multishot recv missing EOF on wakeup race
* for-7.1/block: (32 commits)
  ublk: report BLK_SPLIT_INTERVAL_CAPABLE
  blk-integrity: support arbitrary buffer alignment
  blk-cgroup: wait for blkcg cleanup before initializing new disk
  block: clear BIO_QOS flags in blk_steal_bios()
  block: move bio queue-transition flag fixups into blk_steal_bios()
  block: remove bdev_nonrot()
  block: Correct comments on bio_alloc_clone() and bio_init_clone()
  Documentation: ABI: stable: document the zoned_qd1_writes attribute
  block: default to QD=1 writes for blk-mq rotational zoned devices
  block: allow submitting all zone writes from a single context
  block: rename struct gendisk zone_wplugs_lock field
  block: remove disk_zone_is_full()
  block: rename and simplify disk_get_and_lock_zone_wplug()
  block: fix zone write plugs refcount handling in disk_zone_wplug_schedule_bio_work()
  block: fix zone write plug removal
  block: annotate struct request_queue with __counted_by_ptr
  sed-opal: add IOC_OPAL_GET_SUM_STATUS ioctl.
  sed-opal: increase column attribute type size to 64 bits.
  sed-opal: add IOC_OPAL_ENABLE_DISABLE_LR.
  sed-opal: add IOC_OPAL_LR_SET_START_LEN ioctl.
  ...
* for-7.1/io_uring: (23 commits)
  io_uring/bpf-ops: implement bpf ops registration
  io_uring/bpf-ops: add kfunc helpers
  io_uring/bpf-ops: implement loop_step with BPF struct_ops
  io_uring: introduce callback driven main loop
  nvme: remove nvme_dev_uring_cmd() IO_URING_F_IOPOLL check
  io_uring/uring_cmd: allow non-iopoll cmds with IORING_SETUP_IOPOLL
  io_uring: count CQEs in io_iopoll_check()
  io_uring: remove iopoll_queue from struct io_issue_def
  io_uring: add REQ_F_IOPOLL
  io_uring: mark known and harmless racy ctx->int_flags uses
  io_uring: switch struct io_ring_ctx internal bitfields to flags
  io_uring/zctx: separate notification user_data
  io_uring/net: allow vectorised regbuf send zc
  io_uring/timeout: immediate timeout arg
  io_uring/timeout: migrate reqs from ts64 to ktime
  io_uring/timeout: add helper for parsing user time
  io_uring/timeout: check unused sqe fields
  io_uring/zcrx: move zcrx uapi into separate header
  io_uring/zcrx: declare some constants for query
  io_uring/zctx: unify zerocopy issue variants
  ...
* for-7.1/io_uring:
  io_uring: avoid req->ctx reload in io_req_put_rsrc_nodes()
  io_uring/rw: use cached file rather than req->file
  io_uring/net: use 'ctx' consistently
  io_uring/poll: cache req->apoll_events
  io_uring/kbuf: use 'ctx' consistently
* for-7.1/block:
  block: remove bvec_free
  block: split bio_alloc_bioset more clearly into a fast and slowpath
  block: mark bvec_{alloc,free} static
* for-7.1/block:
  blk-mq: make blk_mq_hw_ctx_sysfs_entry instances const
  blk-crypto: make blk_crypto_attr instances const
  block: ia-ranges: make blk_ia_range_sysfs_entry instances const
  block: make queue_sysfs_entry instances const
* for-7.1/block:
  block: reject zero length in bio_add_page()
* for-7.1/block:
  scsi: bsg: add io_uring passthrough handler
  bsg: add io_uring command support to generic layer
  bsg: add bsg_uring_cmd uapi structure
* io_uring-7.0:
  io_uring/kbuf: propagate BUF_MORE through early buffer commit path
  io_uring/kbuf: fix missing BUF_MORE for incremental buffers at EOF
Add support for kernel-managed buffer rings (kmbuf rings), which allow
the kernel to allocate and manage the backing buffers for a buffer
ring, rather than requiring the application to provide and manage them.

Internally, the IOBL_KERNEL_MANAGED flag marks buffer lists as
kernel-managed for appropriate handling in the I/O path.

At the uapi level, kernel-managed buffer rings are created through the
pbuf interface with the IOU_PBUF_RING_KERNEL_MANAGED flag set. The
io_uring_buf_reg struct is modified to allow taking in a buf_size
instead of a ring_addr. To create a kernel-managed buffer ring, the
caller must set the IOU_PBUF_RING_MMAP flag as well to indicate that the
kernel will allocate the memory for the ring. When the caller mmaps
the ring, they will get back a virtual mapping to the buffer memory.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260306003224.3620942-2-joannelkoong@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow kernel-managed buffers to be selected. This requires modifying the
io_br_sel struct to separate the fields for address and val, since a
kernel address cannot be distinguished from a negative val when error
checking.

Auto-commit any selected kernel-managed buffer.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260306003224.3620942-3-joannelkoong@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add kernel APIs to pin and unpin buffer rings, preventing userspace from
unregistering a buffer ring while it is pinned by the kernel.

This provides a mechanism for kernel subsystems to safely access buffer
ring contents while ensuring the buffer ring remains valid. A pinned
buffer ring cannot be unregistered until explicitly unpinned. On the
userspace side, trying to unregister a pinned buffer will return -EBUSY.

This is a preparatory change for upcoming fuse usage of kernel-managed
buffer rings. It is necessary for fuse to pin the buffer ring because
fuse may need to select a buffer in atomic contexts, which it can only
do by using the underlying buffer list pointer.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260306003224.3620942-4-joannelkoong@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Return the id of the selected buffer in io_buffer_select(). This is
needed for kernel-managed buffer rings to later recycle the selected
buffer.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260306003224.3620942-5-joannelkoong@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add an interface for buffers to be recycled back into a kernel-managed
buffer ring.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260306003224.3620942-6-joannelkoong@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring_is_kmbuf_ring() returns true if there is a kernel-managed
buffer ring at the specified buffer group.

This is a preparatory patch for upcoming fuse kernel-managed buffer
support, which needs to ensure the buffer ring registered by the server
is a kernel-managed buffer ring.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260306003224.3620942-7-joannelkoong@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Export io_ring_buffer_select() so that it may be used by callers who
pass in a pinned bufring without needing to grab the io_uring mutex.

This is a preparatory patch that will be needed by fuse io-uring, which
will need to select a buffer from a kernel-managed bufring while the
uring mutex may already be held by in-progress commits, and may need to
select a buffer in atomic contexts.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260306003224.3620942-8-joannelkoong@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When uring_cmd operations select a buffer, the completion queue entry
should indicate which buffer was selected.

Set IORING_CQE_F_BUFFER on the completed entry and encode the buffer
index if a buffer was selected.

This change is needed in order to relay to userspace which selected
buffer contains the data.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://patch.msgid.link/20260306003224.3620942-9-joannelkoong@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* for-7.1/io_uring:
  io_uring/cmd: set selected buffer index in __io_uring_cmd_done()
  io_uring/kbuf: export io_ring_buffer_select()
  io_uring/kbuf: add io_uring_is_kmbuf_ring()
  io_uring/kbuf: add recycling for kernel managed buffer rings
  io_uring/kbuf: return buffer id in buffer selection
  io_uring/kbuf: add buffer ring pinning/unpinning
  io_uring/kbuf: support kernel-managed buffer rings in buffer selection
  io_uring/kbuf: add support for kernel-managed buffer rings

Signed-off-by: Jens Axboe <axboe@kernel.dk>
* for-7.1/block:
  block: partitions: Replace pp_buf with struct seq_buf
In our production environment, we have received multiple crash reports
regarding libceph, which have caught our attention:

```
[6888366.280350] Call Trace:
[6888366.280452]  blk_update_request+0x14e/0x370
[6888366.280561]  blk_mq_end_request+0x1a/0x130
[6888366.280671]  rbd_img_handle_request+0x1a0/0x1b0 [rbd]
[6888366.280792]  rbd_obj_handle_request+0x32/0x40 [rbd]
[6888366.280903]  __complete_request+0x22/0x70 [libceph]
[6888366.281032]  osd_dispatch+0x15e/0xb40 [libceph]
[6888366.281164]  ? inet_recvmsg+0x5b/0xd0
[6888366.281272]  ? ceph_tcp_recvmsg+0x6f/0xa0 [libceph]
[6888366.281405]  ceph_con_process_message+0x79/0x140 [libceph]
[6888366.281534]  ceph_con_v1_try_read+0x5d7/0xf30 [libceph]
[6888366.281661]  ceph_con_workfn+0x329/0x680 [libceph]
```

After analyzing the coredump file, we found that the address of
dc->sb_bio has been freed. We know that cached_dev is only freed when it
is stopped.

Since sb_bio is embedded in struct cached_dev rather than allocated
anew for each write, stopping the device while a superblock write is
in flight means the freed address will be accessed at endio.

This patch waits in cached_dev_free() for sb_write to complete.

It should be noted that we analyzed the cause of the problem ourselves,
then described all the details to Qwen and adopted the modifications it
produced.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Fixes: cafe563 ("bcache: A block layer cache")
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Coly Li <colyli@fnnas.com>
Link: https://patch.msgid.link/20260322134102.480107-1-colyli@fnnas.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block-7.0:
  bcache: fix cached_dev.sb_bio use-after-free and crash
* for-7.1/block:
  md: remove unused mddev argument from export_rdev
  md/raid5: move handle_stripe() comment to correct location
  md/raid5: remove stale md_raid5_kick_device() declaration
  md/raid1: fix the comparing region of interval tree
  md/raid5: skip 2-failure compute when other disk is R5_LOCKED
  md/md-llbitmap: raise barrier before state machine transition
  md/md-llbitmap: skip reading rdevs that are not in_sync
  md/raid5: set chunk_sectors to enable full stripe I/O splitting
  md/raid10: fix deadlock with check operation and nowait requests
  md: suppress spurious superblock update error message for dm-raid
* for-7.1/block:
  ublk: move cold paths out of __ublk_batch_dispatch() for icache efficiency
* for-7.1/block:
  block: fix bio_alloc_bioset slowpath GFP handling
There are reports where io_uring instance removal takes too long and an
ifq reallocation by another zcrx instance fails. Split zcrx destruction
into two steps, similar to how it worked before: first close the queue
early while keeping the zcrx instance alive, then drop the main zcrx
reference once all inflight requests have completed. For extra
protection, mark terminated zcrx instances in the xarray and warn if we
double-put them.

Cc: stable@vger.kernel.org # 6.19+
Link: axboe/liburing#1550
Reported-by: Youngmin Choi <youngminchoi94@gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/0ce21f0565ab4358668922a28a8a36922dfebf76.1774261953.git.asml.silence@gmail.com
[axboe: NULL ifq before break inside scoped guard]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When accounting fails, io_import_umem() sets the page array, etc., and
returns an error, expecting the error handling code to take care of the
rest. To make the next patch simpler, only return fully initialised
areas from the function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/3a602b7fb347dbd4da6797ac49b52ea5dedb856d.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
zcrx originally established DMA mappings at a late stage, when the area
was being bound to a page pool. Dma-buf couldn't work this way, so for
it the mappings are initialised during area creation.

It's messy having the two paths do this at different spots; just move
everything to area creation time.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/334092a2cbdd4aabd7c025050aa99f05ace89bb5.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for the following patches, add a function that is
responsible for looking up a netdev, creating an area, DMA mapping it
and opening a queue.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/88cb6f746ecb496a9030756125419df273d0b003.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow creating a zcrx instance without attaching it to a net device.
All data will be copied through the fallback path. The user is also
expected to use ZCRX_CTRL_FLUSH_RQ to handle overflows as it normally
should even with a netdev, but it becomes even more relevant as there
will likely be no one to automatically pick up buffers.

Apart from that, it follows the zcrx uapi for the I/O path, and is
useful for testing, experimentation, and potentially for the copy
receive path in the future if improved.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/674f8ad679c5a0bc79d538352b3042cf0999596e.1774261953.git.asml.silence@gmail.com
[axboe: fix spelling error in uapi header and commit message]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rename "region" to "rq_region" to highlight that it's a refill queue
region.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/ac815790d2477a15826aecaa3d94f2a94ef507e6.1774261953.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
axboe and others added 17 commits March 25, 2026 07:11
* for-7.1/block:
  drbd: use genl pre_doit/post_doit
* io_uring-7.0:
  io_uring/fdinfo: fix SQE_MIXED SQE displaying
* for-7.1/block:
  drbd: Balance RCU calls in drbd_adm_dump_devices()
* io_uring-7.0:
  io_uring/fdinfo: fix OOB read in SQE_MIXED wrap check
* for-7.1/block: (41 commits)
  nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown
  nvme: add WQ_PERCPU to alloc_workqueue users
  nvmet-fc: add WQ_PERCPU to alloc_workqueue users
  nvmet: replace use of system_wq with system_percpu_wq
  nvme-auth: Don't propose NVME_AUTH_DHGROUP_NULL with SC_C
  nvme: Add the DHCHAP maximum HD IDs
  nvme-pci: add NVME_QUIRK_DISABLE_WRITE_ZEROES for Kingston OM3SGP4
  nvme: respect NVME_QUIRK_DISABLE_WRITE_ZEROES when wzsl is set
  nvmet: report NPDGL and NPDAL
  nvmet: use NVME_NS_FEAT_OPTPERF_SHIFT
  nvme: set discard_granularity from NPDG/NPDA
  nvme: add from0based() helper
  nvme: always issue I/O Command Set specific Identify Namespace
  nvme: update nvme_id_ns OPTPERF constants
  nvme: fold nvme_config_discard() into nvme_update_disk_info()
  nvme: add preferred I/O size fields to struct nvme_id_ns_nvm
  nvme: Allow reauth from sysfs
  nvme: Expose the tls_configured sysfs for secure concat connections
  nvmet-tcp: Don't free SQ on authentication success
  nvmet-tcp: Don't error if TLS is enabed on a reset
  ...
* io_uring-7.0:
  io_uring/rsrc: reject zero-length fixed buffer import
  io_uring/net: fix slab-out-of-bounds read in io_bundle_nbufs()
* for-7.1/block:
  block: fix zones_cond memory leak on zone revalidation error paths
  loop: fix partition scan race between udev and loop_reread_partitions()
  sed-opal: Add STACK_RESET command
Commit 9618908 addressed one case of ctx->rings being potentially
accessed while a resize is happening on the ring, but there are still
a few others that need handling. Add a helper for retrieving the rings
associated with an io_uring context, with some sanity checking to catch
bad uses: ->rings_rcu is always valid as long as it's used within the
RCU read lock, and any use of ->rings_rcu or ->rings inside either
->uring_lock or ->completion_lock is sane as well.

Do the minimum fix for the current kernel, but set it up such that this
basic infra can be extended for later kernels to make this harder to
mess up in the future.

Thanks to Junxi Qian for finding and debugging this issue.

Cc: stable@vger.kernel.org
Fixes: 79cfe9e ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reviewed-by: Junxi Qian <qjx1298677004@gmail.com>
Tested-by: Junxi Qian <qjx1298677004@gmail.com>
Link: https://lore.kernel.org/io-uring/20260330172348.89416-1-qjx1298677004@gmail.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring-7.0:
  io_uring: protect remaining lockless ctx->rings accesses with RCU
Replace kfree(node) with io_cache_free() in io_buffer_register_bvec()
to match all other error paths that free nodes allocated via
io_rsrc_node_alloc(). The node is allocated through io_cache_alloc()
internally, so it should be returned to the cache via io_cache_free()
for proper object reuse.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Link: https://patch.msgid.link/20260331104509.7055-1-liu.yun@linux.dev
[axboe: remove fixes tag, it's not a fix, it's a cleanup]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* for-7.1/io_uring:
  io_uring/rsrc: use io_cache_free() to free node
* for-7.1/block:
  zloop: add max_open_zones option
If io_parse_restrictions() fails, it ends up clearing all restrictions
currently set. The intent is only to clear whatever it had already
applied, but it wipes everything, including settings that had already
been applied in a copy-on-write fashion. Ensure that those are
retained.

Link: https://lore.kernel.org/io-uring/CAK8a0jzF-zaO5ZmdOrmfuxrhXuKg5m5+RDuO7tNvtj=kUYbW7Q@mail.gmail.com/
Reported-by: antonius <bluedragonsec2023@gmail.com>
Fixes: ed82f35 ("io_uring: allow registration of per-task restrictions")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring-7.0:
  io_uring/bpf_filters: retain COW'ed settings on parse failures
* for-7.1/block:
  blk-cgroup: fix disk reference leak in blkcg_maybe_throttle_current()
* for-7.1/block:
  blk-iocost: fix busy_level reset when no IOs complete

blktests-ci bot commented Apr 2, 2026

Upstream branch: 132ba7a
series: https://patchwork.kernel.org/project/linux-block/list/?series=1076130
version: 1

Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/linux-block/list/?series=1076130
error message:

Cmd('git') failed due to: exit code(128)
  cmdline: git am --3way
  stdout: 'Applying: block: fix resource leak in blk_register_queue() error path
Patch failed at 0001 block: fix resource leak in blk_register_queue() error path'
  stderr: 'error: sha1 information is lacking or useless (block/blk-sysfs.c).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"'

conflict:

