block: fix resource leak in blk_register_queue() error path#716
blktests-ci[bot] wants to merge 59 commits into for-next_base from
Conversation
* io_uring-7.0: io_uring/poll: fix multishot recv missing EOF on wakeup race
* for-7.1/block: (32 commits)
  ublk: report BLK_SPLIT_INTERVAL_CAPABLE
  blk-integrity: support arbitrary buffer alignment
  blk-cgroup: wait for blkcg cleanup before initializing new disk
  block: clear BIO_QOS flags in blk_steal_bios()
  block: move bio queue-transition flag fixups into blk_steal_bios()
  block: remove bdev_nonrot()
  block: Correct comments on bio_alloc_clone() and bio_init_clone()
  Documentation: ABI: stable: document the zoned_qd1_writes attribute
  block: default to QD=1 writes for blk-mq rotational zoned devices
  block: allow submitting all zone writes from a single context
  block: rename struct gendisk zone_wplugs_lock field
  block: remove disk_zone_is_full()
  block: rename and simplify disk_get_and_lock_zone_wplug()
  block: fix zone write plugs refcount handling in disk_zone_wplug_schedule_bio_work()
  block: fix zone write plug removal
  block: annotate struct request_queue with __counted_by_ptr
  sed-opal: add IOC_OPAL_GET_SUM_STATUS ioctl.
  sed-opal: increase column attribute type size to 64 bits.
  sed-opal: add IOC_OPAL_ENABLE_DISABLE_LR.
  sed-opal: add IOC_OPAL_LR_SET_START_LEN ioctl.
  ...
* for-7.1/io_uring: (23 commits)
  io_uring/bpf-ops: implement bpf ops registration
  io_uring/bpf-ops: add kfunc helpers
  io_uring/bpf-ops: implement loop_step with BPF struct_ops
  io_uring: introduce callback driven main loop
  nvme: remove nvme_dev_uring_cmd() IO_URING_F_IOPOLL check
  io_uring/uring_cmd: allow non-iopoll cmds with IORING_SETUP_IOPOLL
  io_uring: count CQEs in io_iopoll_check()
  io_uring: remove iopoll_queue from struct io_issue_def
  io_uring: add REQ_F_IOPOLL
  io_uring: mark known and harmless racy ctx->int_flags uses
  io_uring: switch struct io_ring_ctx internal bitfields to flags
  io_uring/zctx: separate notification user_data
  io_uring/net: allow vectorised regbuf send zc
  io_uring/timeout: immediate timeout arg
  io_uring/timeout: migrate reqs from ts64 to ktime
  io_uring/timeout: add helper for parsing user time
  io_uring/timeout: check unused sqe fields
  io_uring/zcrx: move zcrx uapi into separate header
  io_uring/zcrx: declare some constants for query
  io_uring/zctx: unify zerocopy issue variants
  ...
* for-7.1/io_uring:
  io_uring: avoid req->ctx reload in io_req_put_rsrc_nodes()
  io_uring/rw: use cached file rather than req->file
  io_uring/net: use 'ctx' consistently
  io_uring/poll: cache req->apoll_events
  io_uring/kbuf: use 'ctx' consistently
* for-7.1/block:
block: remove bvec_free
block: split bio_alloc_bioset more clearly into a fast and slowpath
block: mark bvec_{alloc,free} static
* for-7.1/block:
  blk-mq: make blk_mq_hw_ctx_sysfs_entry instances const
  blk-crypto: make blk_crypto_attr instances const
  block: ia-ranges: make blk_ia_range_sysfs_entry instances const
  block: make queue_sysfs_entry instances const
* for-7.1/block: block: reject zero length in bio_add_page()
* for-7.1/block:
  scsi: bsg: add io_uring passthrough handler
  bsg: add io_uring command support to generic layer
  bsg: add bsg_uring_cmd uapi structure
* io_uring-7.0:
  io_uring/kbuf: propagate BUF_MORE through early buffer commit path
  io_uring/kbuf: fix missing BUF_MORE for incremental buffers at EOF
Add support for kernel-managed buffer rings (kmbuf rings), which allow the kernel to allocate and manage the backing buffers for a buffer ring, rather than requiring the application to provide and manage them. Internally, the IOBL_KERNEL_MANAGED flag marks buffer lists as kernel-managed for appropriate handling in the I/O path. At the uapi level, kernel-managed buffer rings are created through the pbuf interface with the IOU_PBUF_RING_KERNEL_MANAGED flag set. The io_uring_buf_reg struct is modified to allow taking in a buf_size instead of a ring_addr. To create a kernel-managed buffer ring, the caller must set the IOU_PBUF_RING_MMAP flag as well to indicate that the kernel will allocate the memory for the ring. When the caller mmaps the ring, they will get back a virtual mapping to the buffer memory. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260306003224.3620942-2-joannelkoong@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
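The uapi change described above can be sketched in C. The exact field layout below is an assumption for illustration (the real struct io_uring_buf_reg lives in include/uapi/linux/io_uring.h, and the kernel-managed flag value is hypothetical); the point is that ring_addr and buf_size can share storage, since a kernel-managed ring has no caller-supplied address, so the registration struct size stays unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the modified registration struct: the ring_addr
 * slot carries buf_size instead when the ring is kernel-managed. */
struct buf_reg_sketch {
	union {
		uint64_t ring_addr;	/* app-managed: address of the ring */
		uint64_t buf_size;	/* kernel-managed: size of each buffer */
	};
	uint32_t ring_entries;
	uint16_t bgid;
	uint16_t flags;
	uint64_t resv[3];
};

/* Flag values are illustrative assumptions, not the real uapi values. */
#define SKETCH_PBUF_RING_MMAP           (1U << 0)
#define SKETCH_PBUF_RING_KERNEL_MANAGED (1U << 3)
```

Per the commit message, a kernel-managed ring must also set the MMAP flag, since the kernel allocates the memory and the caller mmaps it back.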
Allow kernel-managed buffers to be selected. This requires modifying the io_br_sel struct to separate the fields for address and val, since a kernel address cannot be distinguished from a negative val when error checking. Auto-commit any selected kernel-managed buffer. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260306003224.3620942-3-joannelkoong@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
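The addr/val split exists because the old encoding overloaded one field: error checking treats negative values as errors, but on 64-bit kernels a pointer in the kernel address range also looks negative when reinterpreted as a signed integer. A minimal illustration (the address constant is just a representative x86-64 kernel-range value, not anything from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* A representative x86-64 direct-map kernel address. */
#define FAKE_KERNEL_ADDR 0xffff888012345000ULL

/* Old-style check: "negative means error". This misfires when the same
 * field can also hold a kernel pointer, which is why the patch splits
 * the struct into separate address and val fields. */
static int looks_like_error(int64_t val)
{
	return val < 0;
}
```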
Add kernel APIs to pin and unpin buffer rings, preventing userspace from unregistering a buffer ring while it is pinned by the kernel. This provides a mechanism for kernel subsystems to safely access buffer ring contents while ensuring the buffer ring remains valid. A pinned buffer ring cannot be unregistered until explicitly unpinned. On the userspace side, trying to unregister a pinned buffer will return -EBUSY. This is a preparatory change for upcoming fuse usage of kernel-managed buffer rings. It is necessary for fuse to pin the buffer ring because fuse may need to select a buffer in atomic contexts, which it can only do by using the underlying buffer list pointer. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260306003224.3620942-4-joannelkoong@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Return the id of the selected buffer in io_buffer_select(). This is needed for kernel-managed buffer rings to later recycle the selected buffer. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260306003224.3620942-5-joannelkoong@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add an interface for buffers to be recycled back into a kernel-managed buffer ring. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260306003224.3620942-6-joannelkoong@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_uring_is_kmbuf_ring() returns true if there is a kernel-managed buffer ring at the specified buffer group. This is a preparatory patch for upcoming fuse kernel-managed buffer support, which needs to ensure the buffer ring registered by the server is a kernel-managed buffer ring. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260306003224.3620942-7-joannelkoong@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Export io_ring_buffer_select() so that it may be used by callers who pass in a pinned bufring without needing to grab the io_uring mutex. This is a preparatory patch that will be needed by fuse io-uring, which will need to select a buffer from a kernel-managed bufring while the uring mutex may already be held by in-progress commits, and may need to select a buffer in atomic contexts. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260306003224.3620942-8-joannelkoong@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
When uring_cmd operations select a buffer, the completion queue entry should indicate which buffer was selected. Set IORING_CQE_F_BUFFER on the completed entry and encode the buffer index if a buffer was selected. This change is needed in order to relay to userspace which selected buffer contains the data. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://patch.msgid.link/20260306003224.3620942-9-joannelkoong@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
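Concretely, the CQE convention referenced above packs the selected buffer id into the upper bits of cqe->flags. The two constants below do exist in include/uapi/linux/io_uring.h; the helper functions are illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* From include/uapi/linux/io_uring.h */
#define IORING_CQE_F_BUFFER	(1U << 0)
#define IORING_CQE_BUFFER_SHIFT	16

/* Kernel side: encode a selected buffer id into the CQE flags. */
static uint32_t cqe_encode_buffer(uint16_t bid)
{
	return IORING_CQE_F_BUFFER | ((uint32_t)bid << IORING_CQE_BUFFER_SHIFT);
}

/* Userspace side: recover the buffer id, if one was set. */
static int cqe_buffer_id(uint32_t flags, uint16_t *bid)
{
	if (!(flags & IORING_CQE_F_BUFFER))
		return -1;	/* no buffer was selected */
	*bid = (uint16_t)(flags >> IORING_CQE_BUFFER_SHIFT);
	return 0;
}
```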
* for-7.1/io_uring:
  io_uring/cmd: set selected buffer index in __io_uring_cmd_done()
  io_uring/kbuf: export io_ring_buffer_select()
  io_uring/kbuf: add io_uring_is_kmbuf_ring()
  io_uring/kbuf: add recycling for kernel managed buffer rings
  io_uring/kbuf: return buffer id in buffer selection
  io_uring/kbuf: add buffer ring pinning/unpinning
  io_uring/kbuf: support kernel-managed buffer rings in buffer selection
  io_uring/kbuf: add support for kernel-managed buffer rings
Signed-off-by: Jens Axboe <axboe@kernel.dk>
* for-7.1/block: block: partitions: Replace pp_buf with struct seq_buf
In our production environment, we have received multiple crash reports regarding libceph, which have caught our attention:

```
[6888366.280350] Call Trace:
[6888366.280452]  blk_update_request+0x14e/0x370
[6888366.280561]  blk_mq_end_request+0x1a/0x130
[6888366.280671]  rbd_img_handle_request+0x1a0/0x1b0 [rbd]
[6888366.280792]  rbd_obj_handle_request+0x32/0x40 [rbd]
[6888366.280903]  __complete_request+0x22/0x70 [libceph]
[6888366.281032]  osd_dispatch+0x15e/0xb40 [libceph]
[6888366.281164]  ? inet_recvmsg+0x5b/0xd0
[6888366.281272]  ? ceph_tcp_recvmsg+0x6f/0xa0 [libceph]
[6888366.281405]  ceph_con_process_message+0x79/0x140 [libceph]
[6888366.281534]  ceph_con_v1_try_read+0x5d7/0xf30 [libceph]
[6888366.281661]  ceph_con_workfn+0x329/0x680 [libceph]
```

After analyzing the coredump file, we found that the address of dc->sb_bio had already been freed. We know that cached_dev is only freed when it is stopped. Since sb_bio is part of struct cached_dev rather than being allocated for each write, if the device is stopped while the superblock is being written, the freed address will be accessed in the endio path. This patch waits for sb_write to complete in cached_dev_free. It should be noted that after we analyzed the cause of the problem, we provided the details to QWEN and adopted the modifications it produced.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
Fixes: cafe563 ("bcache: A block layer cache")
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Coly Li <colyli@fnnas.com>
Link: https://patch.msgid.link/20260322134102.480107-1-colyli@fnnas.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
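The shape of the fix — never free an object while an I/O that embeds part of it is still in flight — can be sketched in plain C. This is a simplified userspace model of the pattern, not the bcache code: an in-flight counter is taken before the superblock write is submitted, dropped in the completion handler, and the teardown path refuses to free until it reaches zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct dev_sketch {
	atomic_int sb_write_inflight;	/* in-flight superblock writes */
	bool freed;
};

static void sb_write_submit(struct dev_sketch *dc)
{
	atomic_fetch_add(&dc->sb_write_inflight, 1);
	/* ...submit the bio that is embedded in dc... */
}

/* Completion handler: runs after the write finishes, possibly long
 * after the submitter returned. */
static void sb_write_endio(struct dev_sketch *dc)
{
	atomic_fetch_sub(&dc->sb_write_inflight, 1);
}

/* Teardown: freeing dc while a write is in flight would leave endio
 * touching freed memory, which is exactly the reported crash. The real
 * fix waits; this model just reports whether freeing is safe yet. */
static bool dev_free_attempt(struct dev_sketch *dc)
{
	if (atomic_load(&dc->sb_write_inflight) != 0)
		return false;
	dc->freed = true;
	return true;
}
```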
* block-7.0: bcache: fix cached_dev.sb_bio use-after-free and crash
* for-7.1/block:
  md: remove unused mddev argument from export_rdev
  md/raid5: move handle_stripe() comment to correct location
  md/raid5: remove stale md_raid5_kick_device() declaration
  md/raid1: fix the comparing region of interval tree
  md/raid5: skip 2-failure compute when other disk is R5_LOCKED
  md/md-llbitmap: raise barrier before state machine transition
  md/md-llbitmap: skip reading rdevs that are not in_sync
  md/raid5: set chunk_sectors to enable full stripe I/O splitting
  md/raid10: fix deadlock with check operation and nowait requests
  md: suppress spurious superblock update error message for dm-raid
* for-7.1/block: ublk: move cold paths out of __ublk_batch_dispatch() for icache efficiency
* for-7.1/block: block: fix bio_alloc_bioset slowpath GFP handling
There are reports where io_uring instance removal takes too long and an ifq reallocation by another zcrx instance fails. Split zcrx destruction into two steps, similarly to how it was before: first close the queue early but keep zcrx alive, and then, when all inflight requests have completed, drop the main zcrx reference. For extra protection, mark terminated zcrx instances in the xarray and warn if we double put them. Cc: stable@vger.kernel.org # 6.19+ Link: axboe/liburing#1550 Reported-by: Youngmin Choi <youngminchoi94@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/0ce21f0565ab4358668922a28a8a36922dfebf76.1774261953.git.asml.silence@gmail.com [axboe: NULL ifq before break inside scoped guard] Signed-off-by: Jens Axboe <axboe@kernel.dk>
When accounting fails, io_import_umem() sets the page array, etc. and returns an error, expecting that the error handling code will take care of the rest. To make the next patch simpler, only return a fully initialised area from the function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/3a602b7fb347dbd4da6797ac49b52ea5dedb856d.1774261953.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
zcrx was originally establishing dma mappings at a late stage, when it was being bound to a page pool. Dma-buf couldn't work this way, so it's initialised during area creation. It's messy having them done at different spots, so move everything to area creation time. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/334092a2cbdd4aabd7c025050aa99f05ace89bb5.1774261953.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for following patches, add a function that is responsible for looking up a netdev, creating an area, DMA mapping it and opening a queue. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/88cb6f746ecb496a9030756125419df273d0b003.1774261953.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allow creating a zcrx instance without attaching it to a net device. All data will be copied through the fallback path. The user is also expected to use ZCRX_CTRL_FLUSH_RQ to handle overflows as it normally should even with a netdev, but it becomes even more relevant as there will likely be no one to automatically pick up buffers. Apart from that, it follows the zcrx uapi for the I/O path, and is useful for testing, experimentation, and potentially for the copy receive path in the future if improved. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/674f8ad679c5a0bc79d538352b3042cf0999596e.1774261953.git.asml.silence@gmail.com [axboe: fix spelling error in uapi header and commit message] Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rename "region" to "rq_region" to highlight that it's a refill queue region. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/ac815790d2477a15826aecaa3d94f2a94ef507e6.1774261953.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* for-7.1/block: drbd: use genl pre_doit/post_doit
* io_uring-7.0: io_uring/fdinfo: fix SQE_MIXED SQE displaying
* for-7.1/block: drbd: Balance RCU calls in drbd_adm_dump_devices()
* io_uring-7.0: io_uring/fdinfo: fix OOB read in SQE_MIXED wrap check
* for-7.1/block: (41 commits)
  nvme-loop: do not cancel I/O and admin tagset during ctrl reset/shutdown
  nvme: add WQ_PERCPU to alloc_workqueue users
  nvmet-fc: add WQ_PERCPU to alloc_workqueue users
  nvmet: replace use of system_wq with system_percpu_wq
  nvme-auth: Don't propose NVME_AUTH_DHGROUP_NULL with SC_C
  nvme: Add the DHCHAP maximum HD IDs
  nvme-pci: add NVME_QUIRK_DISABLE_WRITE_ZEROES for Kingston OM3SGP4
  nvme: respect NVME_QUIRK_DISABLE_WRITE_ZEROES when wzsl is set
  nvmet: report NPDGL and NPDAL
  nvmet: use NVME_NS_FEAT_OPTPERF_SHIFT
  nvme: set discard_granularity from NPDG/NPDA
  nvme: add from0based() helper
  nvme: always issue I/O Command Set specific Identify Namespace
  nvme: update nvme_id_ns OPTPERF constants
  nvme: fold nvme_config_discard() into nvme_update_disk_info()
  nvme: add preferred I/O size fields to struct nvme_id_ns_nvm
  nvme: Allow reauth from sysfs
  nvme: Expose the tls_configured sysfs for secure concat connections
  nvmet-tcp: Don't free SQ on authentication success
  nvmet-tcp: Don't error if TLS is enabled on a reset
  ...
* io_uring-7.0:
  io_uring/rsrc: reject zero-length fixed buffer import
  io_uring/net: fix slab-out-of-bounds read in io_bundle_nbufs()
* for-7.1/block:
  block: fix zones_cond memory leak on zone revalidation error paths
  loop: fix partition scan race between udev and loop_reread_partitions()
  sed-opal: Add STACK_RESET command
Commit 9618908 addressed one case of ctx->rings being potentially accessed while a resize is happening on the ring, but there are still a few others that need handling. Add a helper for retrieving the rings associated with an io_uring context, and add some sanity checking to that to catch bad uses. ->rings_rcu is always valid, as long as it's used within RCU read lock. Any use of ->rings_rcu or ->rings inside either ->uring_lock or ->completion_lock is sane as well. Do the minimum fix for the current kernel, but set it up such that this basic infra can be extended for later kernels to make this harder to mess up in the future. Thanks to Junxi Qian for finding and debugging this issue. Cc: stable@vger.kernel.org Fixes: 79cfe9e ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS") Reviewed-by: Junxi Qian <qjx1298677004@gmail.com> Tested-by: Junxi Qian <qjx1298677004@gmail.com> Link: https://lore.kernel.org/io-uring/20260330172348.89416-1-qjx1298677004@gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring-7.0: io_uring: protect remaining lockless ctx->rings accesses with RCU
Replace kfree(node) with io_cache_free() in io_buffer_register_bvec() to match all other error paths that free nodes allocated via io_rsrc_node_alloc(). The node is allocated through io_cache_alloc() internally, so it should be returned to the cache via io_cache_free() for proper object reuse. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Link: https://patch.msgid.link/20260331104509.7055-1-liu.yun@linux.dev [axboe: remove fixes tag, it's not a fix, it's a cleanup] Signed-off-by: Jens Axboe <axboe@kernel.dk>
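The rule the patch enforces — objects obtained from a cache must go back to that cache, not to the general allocator — can be illustrated with a minimal free-list cache. This is a toy model, not io_uring's io_cache implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy object cache: a singly-linked free list in front of malloc/free. */
struct node {
	struct node *next;
	char payload[56];
};

static struct node *cache_head;

static struct node *cache_alloc(void)
{
	struct node *n = cache_head;

	if (n) {
		cache_head = n->next;	/* reuse a cached object */
		return n;
	}
	return malloc(sizeof(*n));	/* cache empty: fall back */
}

/* The analogue of io_cache_free(): return the object for reuse. Calling
 * free() here instead would work, but defeats the cache and breaks the
 * symmetry with cache_alloc() — which is what the patch cleans up. */
static void cache_free(struct node *n)
{
	n->next = cache_head;
	cache_head = n;
}
```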
* for-7.1/io_uring: io_uring/rsrc: use io_cache_free() to free node
* for-7.1/block: zloop: add max_open_zones option
If io_parse_restrictions() fails, it clears all restrictions currently set. The intent is to undo only what it already applied, but it ends up clearing everything, including settings that may already have been applied in a copy-on-write fashion. Ensure that those are retained. Link: https://lore.kernel.org/io-uring/CAK8a0jzF-zaO5ZmdOrmfuxrhXuKg5m5+RDuO7tNvtj=kUYbW7Q@mail.gmail.com/ Reported-by: antonius <bluedragonsec2023@gmail.com> Fixes: ed82f35 ("io_uring: allow registration of per-task restrictions") Signed-off-by: Jens Axboe <axboe@kernel.dk>
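The bug pattern here is generic: an error path that resets state to empty instead of restoring what existed on entry. A hedged sketch of the correct shape (all names are illustrative, not io_uring's):

```c
#include <assert.h>
#include <stdint.h>

struct restrictions {
	uint32_t allowed_ops;	/* bitmask of permitted opcodes */
};

/* Parse new restrictions on top of the existing ones. On failure,
 * restore the snapshot taken on entry — the buggy version effectively
 * zeroed the whole struct, wiping settings applied earlier in
 * copy-on-write fashion. */
static int apply_restrictions(struct restrictions *r, const uint32_t *ops,
			      int nr_ops)
{
	struct restrictions snapshot = *r;	/* COW'ed settings to retain */

	for (int i = 0; i < nr_ops; i++) {
		if (ops[i] >= 32) {
			*r = snapshot;	/* undo only our own changes */
			return -1;	/* e.g. -EINVAL */
		}
		r->allowed_ops |= 1U << ops[i];
	}
	return 0;
}
```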
* io_uring-7.0: io_uring/bpf_filters: retain COW'ed settings on parse failures
* for-7.1/block: blk-cgroup: fix disk reference leak in blkcg_maybe_throttle_current()
* for-7.1/block: blk-iocost: fix busy_level reset when no IOs complete
Upstream branch: 132ba7a
Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/linux-block/list/?series=1076130
conflict:
Pull request for series with
subject: block: fix resource leak in blk_register_queue() error path
version: 1
url: https://patchwork.kernel.org/project/linux-block/list/?series=1076130