blk: honor isolcpus configuration #703

Open
blktests-ci[bot] wants to merge 13 commits into linus-master_base from series/1074832=>linus-master

Conversation

blktests-ci bot commented Mar 30, 2026

Pull request for series with
subject: blk: honor isolcpus configuration
version: 9
url: https://patchwork.kernel.org/project/linux-block/list/?series=1074832

blktests-ci bot commented Mar 30, 2026

Upstream branch: 7aaa804
series: https://patchwork.kernel.org/project/linux-block/list/?series=1074832
version: 9

blktests-ci bot commented Mar 31, 2026

Upstream branch: d0c3bcd
series: https://patchwork.kernel.org/project/linux-block/list/?series=1074832
version: 9

blktests-ci bot force-pushed the series/1074832=>linus-master branch from c92cfc1 to 3d12627 on March 31, 2026 12:56
blktests-ci bot force-pushed the linus-master_base branch from 255b4bf to 3236861 on April 1, 2026 02:20
blktests-ci bot commented Apr 1, 2026

Upstream branch: 9147566
series: https://patchwork.kernel.org/project/linux-block/list/?series=1074832
version: 9

blktests-ci bot force-pushed the series/1074832=>linus-master branch from 3d12627 to f86956b on April 1, 2026 02:22
blktests-ci bot commented Apr 1, 2026

Upstream branch: 9147566
series: https://patchwork.kernel.org/project/linux-block/list/?series=1076079
version: 10

blktests-ci bot added the V10 label and removed the V9 and V9-ci-fail labels on Apr 1, 2026
blktests-ci bot force-pushed the series/1074832=>linus-master branch from f86956b to 15d5ec4 on April 1, 2026 22:34
blktests-ci bot force-pushed the linus-master_base branch 2 times, most recently from ecbdbb4 to 480b162 on April 2, 2026 07:52
blktests-ci bot commented Apr 2, 2026

Upstream branch: 9147566
series: https://patchwork.kernel.org/project/linux-block/list/?series=1076079
version: 10

blktests-ci bot force-pushed the series/1074832=>linus-master branch from 15d5ec4 to 956910e on April 2, 2026 07:52
blktests-ci bot commented Apr 3, 2026

Upstream branch: 9147566
series: https://patchwork.kernel.org/project/linux-block/list/?series=1076079
version: 10

blktests-ci bot force-pushed the series/1074832=>linus-master branch from 956910e to 3ae9bdf on April 3, 2026 01:57
blktests-ci bot force-pushed the linus-master_base branch 2 times, most recently from a96fba7 to 0cd6ac2 on April 3, 2026 07:51
blktests-ci bot commented Apr 3, 2026

Upstream branch: 9147566
series: https://patchwork.kernel.org/project/linux-block/list/?series=1076079
version: 10

blktests-ci bot force-pushed the series/1074832=>linus-master branch from 3ae9bdf to df16bca on April 3, 2026 07:52
blktests-ci bot force-pushed the linus-master_base branch from 0cd6ac2 to 910d344 on April 3, 2026 11:53
blktests-ci bot commented Apr 3, 2026

Upstream branch: d8a9a4b
series: https://patchwork.kernel.org/project/linux-block/list/?series=1076079
version: 10

blktests-ci bot force-pushed the series/1074832=>linus-master branch from df16bca to 98d6d83 on April 3, 2026 11:55
blktests-ci bot force-pushed the linus-master_base branch from 910d344 to ed862bc on April 4, 2026 05:34
blktests-ci bot commented Apr 4, 2026

Upstream branch: 7ca6d1c
series: https://patchwork.kernel.org/project/linux-block/list/?series=1076079
version: 10

blktests-ci bot force-pushed the series/1074832=>linus-master branch from 98d6d83 to 885f728 on April 4, 2026 05:36
blktests-ci bot force-pushed the linus-master_base branch from ed862bc to 2d0c3d5 on April 4, 2026 16:55
blktests-ci bot commented Apr 4, 2026

Upstream branch: 3aae938
series: https://patchwork.kernel.org/project/linux-block/list/?series=1076079
version: 10

blktests-ci bot force-pushed the series/1074832=>linus-master branch from 885f728 to 7c42fb2 on April 4, 2026 16:57
blktests-ci bot force-pushed the linus-master_base branch from 2d0c3d5 to 931d9b0 on April 8, 2026 10:29
blktests-ci bot commented Apr 8, 2026

Upstream branch: 3036cd0
series: https://patchwork.kernel.org/project/linux-block/list/?series=1076079
version: 10

blktests-ci bot force-pushed the series/1074832=>linus-master branch from 7c42fb2 to 591229c on April 8, 2026 10:38
blktests-ci bot force-pushed the linus-master_base branch from 931d9b0 to 78a4682 on April 10, 2026 00:51
igaw and others added 13 commits April 10, 2026 09:59
The calculation of the upper limit for queues does not depend solely on
the number of online CPUs; for example, the isolcpus kernel
command-line option must also be considered.

To account for this, the block layer provides a helper function to
retrieve the maximum number of queues. Use it to set an appropriate
upper queue number limit.
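
For illustration, capping a driver's queue count might look like this minimal sketch (the helper name and the surrounding driver code are assumptions, not the actual patch):

  #include <linux/blk-mq.h>

  /*
   * Hypothetical probe fragment: instead of capping the queue count
   * at num_online_cpus(), ask the block layer, which also accounts
   * for restrictions such as isolcpus.
   */
  static unsigned int foo_nr_io_queues(unsigned int hw_max_queues)
  {
      return blk_mq_num_online_queues(hw_max_queues);
  }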

Fixes: 94970cf ("scsi: use block layer helpers to calculate num of queues")
Signed-off-by: Daniel Wagner <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
The support for the !SMP configuration has been removed from the core by
commit cac5cef ("sched/smp: Make SMP unconditional").

While one can technically still compile a uniprocessor kernel, the core
scheduler now mandates SMP unconditionally, rendering this particular
!SMP fallback handling redundant. Therefore, remove the #ifdef CONFIG_SMP
guards and the fallback logic.

Signed-off-by: Daniel Wagner <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
[atomlin: Updated commit message to clarify !SMP removal context]
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
group_mask_cpus_evenly() allows the caller to pass in a CPU mask that
should be evenly distributed. This new function is a more generic
version of the existing group_cpus_evenly(), which always distributes
all present CPUs into groups.
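
A hedged usage sketch (the signature is inferred from this description; HK_TYPE_IO_QUEUE is the housekeeping type introduced later in the series):

  #include <linux/group_cpus.h>
  #include <linux/sched/isolation.h>

  /*
   * Distribute only the housekeeping CPUs into nr_queues groups,
   * instead of all present CPUs as group_cpus_evenly() does.
   */
  static struct cpumask *foo_spread_queues(unsigned int nr_queues,
                                           unsigned int *nr_masks)
  {
      return group_mask_cpus_evenly(nr_queues,
              housekeeping_cpumask(HK_TYPE_IO_QUEUE), nr_masks);
  }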

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Pass a cpumask to irq_create_affinity_masks as an additional constraint
to consider when creating the affinity masks. This allows the caller to
exclude specific CPUs, e.g., isolated CPUs (see the 'isolcpus' kernel
command-line parameter).
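
One plausible shape for the extended descriptor (the existing fields are those of struct irq_affinity in include/linux/interrupt.h; the new field's name is an assumption based on this description):

  struct irq_affinity {
      unsigned int  pre_vectors;
      unsigned int  post_vectors;
      unsigned int  nr_sets;
      unsigned int  set_size[IRQ_AFFINITY_MAX_SETS];
      void          (*calc_sets)(struct irq_affinity *, unsigned int nvecs);
      void          *priv;
      /* new: optional constraint, e.g. the housekeeping CPU mask */
      const struct cpumask *mask;
  };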

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Introduce blk_mq_{online|possible}_queue_affinity, which returns the
queue-to-CPU mapping constraints defined by the block layer. This allows
other subsystems (e.g., IRQ affinity setup) to respect block layer
requirements.

It is necessary to provide versions for both the online and possible CPU
masks because some drivers want to spread their I/O queues only across
online CPUs, while others prefer to use all possible CPUs. The mask
used must match the number of queues requested
(see blk_num_{online|possible}_queues).
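
As a sketch, the new interface pairs with the counting helpers like so (prototypes are assumptions based on the names above):

  /* pair online with online, possible with possible */
  const struct cpumask *blk_mq_online_queue_affinity(void);
  const struct cpumask *blk_mq_possible_queue_affinity(void);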

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Ensure that IRQ affinity setup also respects the queue-to-CPU mapping
constraints provided by the block layer. This allows the NVMe driver
to avoid assigning interrupts to CPUs that the block layer has excluded
(e.g., isolated CPUs).
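
A minimal sketch of an interrupt-allocation path wired up this way (the .mask field and the affinity helper come from the preceding patches in this series; the driver code itself is hypothetical):

  #include <linux/blk-mq.h>
  #include <linux/interrupt.h>
  #include <linux/pci.h>

  static int foo_alloc_irqs(struct pci_dev *pdev, unsigned int hw_max)
  {
      struct irq_affinity affd = {
          .pre_vectors = 1,    /* e.g. the admin queue */
          /* block layer constraint, e.g. excluding isolated CPUs */
          .mask = blk_mq_possible_queue_affinity(),
      };
      unsigned int nr = blk_mq_num_possible_queues(hw_max);

      /* the IRQ core spreads vectors only within the given mask */
      return pci_alloc_irq_vectors_affinity(pdev, 1, nr + 1,
              PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
  }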

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Ensure that IRQ affinity setup also respects the queue-to-CPU mapping
constraints provided by the block layer. This allows the SCSI drivers
to avoid assigning interrupts to CPUs that the block layer has excluded
(e.g., isolated CPUs).

Only convert drivers that already use pci_alloc_irq_vectors_affinity()
with the PCI_IRQ_AFFINITY flag set, since these drivers already let the
IRQ core code set the affinity. Also, don't update qla2xxx because the
nvme-fabrics code is not ready yet.

Signed-off-by: Daniel Wagner <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Ensure that IRQ affinity setup also respects the queue-to-CPU mapping
constraints provided by the block layer. This allows the virtio drivers
to avoid assigning interrupts to CPUs that the block layer has excluded
(e.g., isolated CPUs).

Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Multiqueue drivers spread I/O queues across all CPUs for optimal
performance. However, these drivers are not aware of CPU isolation
requirements and will distribute queues without considering the isolcpus
configuration.

Introduce a new isolcpus mask that allows users to define which CPUs
should have I/O queues assigned. This is similar to managed_irq, but
intended for drivers that do not use the managed IRQ infrastructure.
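
A sketch of how such a driver might consult the new mask (housekeeping_cpu() is an existing kernel API; HK_TYPE_IO_QUEUE is the type this patch introduces):

  #include <linux/sched/isolation.h>

  /* skip isolated CPUs when placing I/O queues by hand */
  static bool foo_cpu_usable_for_queues(unsigned int cpu)
  {
      return housekeeping_cpu(cpu, HK_TYPE_IO_QUEUE);
  }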

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Extend the capabilities of the generic CPU to hardware queue (hctx)
mapping code so that it maps housekeeping CPUs and isolated CPUs to the
hardware queues evenly.

A hctx is only operational when there is at least one online
housekeeping CPU assigned (aka active_hctx). Thus, check the final
mapping to ensure there is no hctx that has only offline housekeeping
CPUs and online isolated CPUs.

Example mapping result:

  16 online CPUs

  isolcpus=io_queue,2-3,6-7,12-13

Queue mapping:
        hctx0: default 0 2
        hctx1: default 1 3
        hctx2: default 4 6
        hctx3: default 5 7
        hctx4: default 8 12
        hctx5: default 9 13
        hctx6: default 10
        hctx7: default 11
        hctx8: default 14
        hctx9: default 15

IRQ mapping:
        irq 42 affinity 0 effective 0  nvme0q0
        irq 43 affinity 0 effective 0  nvme0q1
        irq 44 affinity 1 effective 1  nvme0q2
        irq 45 affinity 4 effective 4  nvme0q3
        irq 46 affinity 5 effective 5  nvme0q4
        irq 47 affinity 8 effective 8  nvme0q5
        irq 48 affinity 9 effective 9  nvme0q6
        irq 49 affinity 10 effective 10  nvme0q7
        irq 50 affinity 11 effective 11  nvme0q8
        irq 51 affinity 14 effective 14  nvme0q9
        irq 52 affinity 15 effective 15  nvme0q10

A corner case is when the number of online CPUs and present CPUs
differ and the driver asks for fewer queues than online CPUs, e.g.

  8 online CPUs, 16 possible CPUs

  isolcpus=io_queue,2-3,6-7,12-13
  virtio_blk.num_request_queues=2

Queue mapping:
        hctx0: default 0 1 2 3 4 5 6 7 8 12 13
        hctx1: default 9 10 11 14 15

IRQ mapping:
        irq 27 affinity 0 effective 0 virtio0-config
        irq 28 affinity 0-1,4-5,8 effective 5 virtio0-req.0
        irq 29 affinity 9-11,14-15 effective 0 virtio0-req.1

Noteworthy is that for the normal/default configuration (!isolcpus) the
mapping will change for systems with non-hyperthreading CPUs. The main
assignment loop relies entirely on group_mask_cpus_evenly() to do the
right thing. The old code would distribute the CPUs linearly over the
hardware contexts:

queue mapping for /dev/nvme0n1
        hctx0: default 0 8
        hctx1: default 1 9
        hctx2: default 2 10
        hctx3: default 3 11
        hctx4: default 4 12
        hctx5: default 5 13
        hctx6: default 6 14
        hctx7: default 7 15

The new code assigns each hardware context the map generated by the
group_mask_cpus_evenly() function:

queue mapping for /dev/nvme0n1
        hctx0: default 0 1
        hctx1: default 2 3
        hctx2: default 4 5
        hctx3: default 6 7
        hctx4: default 8 9
        hctx5: default 10 11
        hctx6: default 12 13
        hctx7: default 14 15

In the case of hyperthreading CPUs, the resulting map stays the same.
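
In outline, the new assignment step might look like this (a simplified sketch; the real blk_mq_map_queues() also performs the offline-housekeeping validation described above and distributes the isolated CPUs over the active hctxs):

  #include <linux/blk-mq.h>
  #include <linux/group_cpus.h>
  #include <linux/sched/isolation.h>
  #include <linux/slab.h>

  static void foo_map_queues(struct blk_mq_queue_map *qmap)
  {
      unsigned int nr_masks, queue, cpu;
      struct cpumask *masks;

      /* let the grouping code split the housekeeping CPUs evenly */
      masks = group_mask_cpus_evenly(qmap->nr_queues,
              housekeeping_cpumask(HK_TYPE_IO_QUEUE), &nr_masks);
      if (!masks)
          return;

      /* each hctx simply takes one generated group */
      for (queue = 0; queue < nr_masks; queue++)
          for_each_cpu(cpu, &masks[queue])
              qmap->mq_map[cpu] = qmap->queue_offset + queue;

      kfree(masks);
  }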

Signed-off-by: Daniel Wagner <[email protected]>
[atomlin: Fixed absolute vs. relative hardware queue index mix-up in
 blk_mq_map_queues and validation checks; fixed typographical errors.]
Signed-off-by: Aaron Tomlin <[email protected]>
When isolcpus=io_queue is enabled and the last housekeeping CPU for a
given hctx goes offline, no CPU would be left to handle I/O. To prevent
I/O stalls, disallow offlining housekeeping CPUs that are still serving
isolated CPUs.
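
A sketch of the check such a guard could perform (the function name and placement are illustrative; the actual patch hooks into the blk-mq CPU-hotplug path and vetoes the offline operation, e.g. by returning -EBUSY):

  #include <linux/blk-mq.h>
  #include <linux/cpumask.h>
  #include <linux/sched/isolation.h>

  /* true if offlining @cpu would strand online isolated CPUs on @hctx */
  static bool foo_offline_would_stall(struct blk_mq_hw_ctx *hctx,
                                      unsigned int cpu)
  {
      const struct cpumask *hk = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
      unsigned int c;

      /* another online housekeeping CPU can take over: allow it */
      for_each_cpu_and(c, hctx->cpumask, hk)
          if (c != cpu && cpu_online(c))
              return false;

      /* an online isolated CPU is still mapped here: refuse */
      for_each_cpu_andnot(c, hctx->cpumask, hk)
          if (cpu_online(c))
              return true;

      return false;
  }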

Reviewed-by: Aaron Tomlin <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
At present, the managed interrupt spreading algorithm distributes vectors
across all available CPUs within a given node or system. On systems
employing CPU isolation (e.g., "isolcpus=io_queue"), this behaviour
defeats the primary purpose of isolation by routing hardware interrupts
(such as NVMe completion queues) directly to isolated cores.

Update irq_create_affinity_masks() to respect the housekeeping CPU mask.
Introduce irq_spread_hk_filter() to intersect the natively calculated
affinity mask with the HK_TYPE_IO_QUEUE mask, thereby keeping managed
interrupts off isolated CPUs.

To ensure strict isolation whilst guaranteeing a valid routing destination:

    1.  Fallback mechanism: Should the initial spreading logic assign a
        vector exclusively to isolated CPUs (resulting in an empty
        intersection), the filter safely falls back to the system's
        online housekeeping CPUs.

    2.  Hotplug safety: The fallback utilises data_race(cpu_online_mask)
        instead of allocating a local cpumask snapshot. This circumvents
        CONFIG_CPUMASK_OFFSTACK stack bloat hazards on high-core-count
        systems. Furthermore, it prevents deadlocks with concurrent CPU
        hotplug operations (e.g., during storage driver error recovery)
        by eliminating the need to hold the CPU hotplug read lock.

    3.  Fast-path optimisation: The filtering logic is conditionally
        executed only if housekeeping is enabled, thereby ensuring zero
        overhead for standard configurations.
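
Putting the three points together, the filter might look roughly like this (reconstructed from the description above; not the actual patch):

  #include <linux/compiler.h>
  #include <linux/cpumask.h>
  #include <linux/sched/isolation.h>

  static void irq_spread_hk_filter(struct cpumask *mask)
  {
      const struct cpumask *hk;

      /* (3) fast path: no io_queue isolation configured */
      if (!housekeeping_enabled(HK_TYPE_IO_QUEUE))
          return;

      hk = housekeeping_cpumask(HK_TYPE_IO_QUEUE);
      cpumask_and(mask, mask, hk);

      /* (1) spreading hit isolated CPUs only: fall back */
      if (cpumask_empty(mask))
          /* (2) data_race(): no snapshot, no hotplug lock */
          cpumask_and(mask, hk, data_race(cpu_online_mask));
  }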

Signed-off-by: Aaron Tomlin <[email protected]>
The io_queue flag informs multiqueue device drivers where to place
hardware queues. Document this new flag in the isolcpus
command-line argument description.
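
The entry might read roughly as follows (a sketch of the documented behaviour, not the actual patch text):

  isolcpus=   [KNL,SMP,ISOL] Isolate CPUs from the general scheduler.
              ...
              io_queue
                Restrict the hardware queues of multiqueue device
                drivers, and with the rest of this series their
                interrupts, to the housekeeping CPUs.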

Reviewed-by: Aaron Tomlin <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
blktests-ci bot commented Apr 10, 2026

Upstream branch: 9a9c8ce
series: https://patchwork.kernel.org/project/linux-block/list/?series=1076079
version: 10

blktests-ci bot force-pushed the series/1074832=>linus-master branch from 591229c to 95bbb66 on April 10, 2026 00:59