blk: honor isolcpus configuration #703
blktests-ci[bot] wants to merge 13 commits into linus-master_base
Conversation
The calculation of the upper limit for queues does not depend solely on the number of online CPUs; for example, the isolcpus kernel command-line option must also be considered. To account for this, the block layer provides a helper function to retrieve the maximum number of queues. Use it to set an appropriate upper queue number limit.
Fixes: 94970cf ("scsi: use block layer helpers to calculate num of queues")
Signed-off-by: Daniel Wagner <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
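The clamping described above can be sketched as a minimal model. The function name and the explicit CPU counts here are illustrative stand-ins (the series refers to blk_num_{online|possible}_queues in a later patch), not the actual kernel API:

```c
#include <assert.h>

/* Toy model of the queue-limit clamping: cap a driver's requested
 * queue count at the number of online CPUs that may actually serve
 * I/O, i.e. online CPUs minus isolated ones.  The CPU counts are
 * illustrative parameters, not real kernel state. */
static unsigned int blk_num_online_queues_model(unsigned int requested,
						unsigned int online_cpus,
						unsigned int isolated_cpus)
{
	unsigned int max = online_cpus - isolated_cpus;

	return requested < max ? requested : max;
}
```

With 16 online CPUs of which 6 are isolated, a driver asking for 32 queues would be limited to 10.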
The support for the !SMP configuration has been removed from the core by commit cac5cef ("sched/smp: Make SMP unconditional"). While one can technically still compile a uniprocessor kernel, the core scheduler now mandates SMP unconditionally, rendering this particular !SMP fallback handling redundant. Therefore, remove the #ifdef CONFIG_SMP guards and the fallback logic.
Signed-off-by: Daniel Wagner <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
[atomlin: Updated commit message to clarify !SMP removal context]
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
group_mask_cpus_evenly() allows the caller to pass in a CPU mask that should be evenly distributed. This new function is a more generic version of the existing group_cpus_evenly(), which always distributes all present CPUs into groups.
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
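A minimal sketch of the even-grouping idea follows. The 64-bit words stand in for struct cpumask, and the real helper additionally keeps sibling CPUs and NUMA nodes together, which this toy version ignores:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of even grouping: distribute the CPUs set in @mask into
 * @numgrps groups whose sizes differ by at most one.  Groups are
 * filled in ascending CPU order; the in-kernel version also honors
 * CPU siblings and NUMA topology. */
static void group_mask_cpus_evenly_model(uint64_t mask, int numgrps,
					 uint64_t *groups)
{
	int total = __builtin_popcountll(mask);
	int base = total / numgrps, rem = total % numgrps;
	int cpu = 0;

	for (int g = 0; g < numgrps; g++) {
		int want = base + (g < rem ? 1 : 0);

		groups[g] = 0;
		while (want > 0 && cpu < 64) {
			if (mask & (1ULL << cpu)) {
				groups[g] |= 1ULL << cpu;
				want--;
			}
			cpu++;
		}
	}
}
```

Distributing CPUs 0-9 into three groups yields group sizes 4, 3, and 3, with the groups disjoint and covering the whole input mask.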
Pass a cpumask to irq_create_affinity_masks() as an additional constraint to consider when creating the affinity masks. This allows the caller to exclude specific CPUs, e.g., isolated CPUs (see the 'isolcpus' kernel command-line parameter).
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Introduce blk_mq_{online|possible}_queue_affinity, which returns the
queue-to-CPU mapping constraints defined by the block layer. This allows
other subsystems (e.g., IRQ affinity setup) to respect block layer
requirements.
It is necessary to provide versions for both the online and possible CPU
masks because some drivers want to spread their I/O queues only across
online CPUs, while others prefer to use all possible CPUs. And the mask
used needs to match with the number of queues requested
(see blk_num_{online|possible}_queues).
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
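The pairing requirement — the affinity mask and the queue count must be derived from the same CPU set — can be sketched as below. The struct, function name suffix, and the single 64-bit "cpumask" are illustrative, not the kernel API:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the constraint pairing: the mask exposed to the IRQ
 * layer and the upper queue limit must come from the same CPU set.
 * 64-bit words stand in for struct cpumask. */
struct queue_constraint {
	uint64_t mask;		/* CPUs queues may be mapped to */
	unsigned int nr_queues;	/* matching upper queue limit */
};

static struct queue_constraint
blk_mq_online_queue_affinity_model(uint64_t online_mask,
				   uint64_t isolated_mask)
{
	struct queue_constraint c;

	/* Only online, non-isolated CPUs may host I/O queues. */
	c.mask = online_mask & ~isolated_mask;
	c.nr_queues = (unsigned int)__builtin_popcountll(c.mask);
	return c;
}
```

With 16 online CPUs and CPUs 2-3, 6-7, and 12-13 isolated, the constraint is a 10-CPU mask paired with a limit of 10 queues.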
Ensure that IRQ affinity setup also respects the queue-to-CPU mapping constraints provided by the block layer. This allows the NVMe driver to avoid assigning interrupts to CPUs that the block layer has excluded (e.g., isolated CPUs).
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Ensure that IRQ affinity setup also respects the queue-to-CPU mapping constraints provided by the block layer. This allows the SCSI drivers to avoid assigning interrupts to CPUs that the block layer has excluded (e.g., isolated CPUs).
Only convert drivers that already use pci_alloc_irq_vectors_affinity() with the PCI_IRQ_AFFINITY flag set, because these drivers already let the IRQ core code set the affinity. Also don't update qla2xxx, because the nvme-fabrics code is not ready yet.
Signed-off-by: Daniel Wagner <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Ensure that IRQ affinity setup also respects the queue-to-CPU mapping constraints provided by the block layer. This allows the virtio drivers to avoid assigning interrupts to CPUs that the block layer has excluded (e.g., isolated CPUs).
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Multiqueue drivers spread I/O queues across all CPUs for optimal performance. However, these drivers are not aware of CPU isolation requirements and will distribute queues without considering the isolcpus configuration. Introduce a new isolcpus mask that allows users to define which CPUs should have I/O queues assigned. This is similar to managed_irq, but intended for drivers that do not use the managed IRQ infrastructure.
Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Extend the capabilities of the generic CPU to hardware queue (hctx)
mapping code so that it maps housekeeping CPUs and isolated CPUs to
the hardware queues evenly.
A hctx is only operational when at least one online housekeeping CPU
is assigned to it (aka active_hctx). Thus, verify in the final mapping
that there is no hctx which has only offline housekeeping CPUs and
online isolated CPUs.
Example mapping result:
16 online CPUs
isolcpus=io_queue,2-3,6-7,12-13
Queue mapping:
hctx0: default 0 2
hctx1: default 1 3
hctx2: default 4 6
hctx3: default 5 7
hctx4: default 8 12
hctx5: default 9 13
hctx6: default 10
hctx7: default 11
hctx8: default 14
hctx9: default 15
IRQ mapping:
irq 42 affinity 0 effective 0 nvme0q0
irq 43 affinity 0 effective 0 nvme0q1
irq 44 affinity 1 effective 1 nvme0q2
irq 45 affinity 4 effective 4 nvme0q3
irq 46 affinity 5 effective 5 nvme0q4
irq 47 affinity 8 effective 8 nvme0q5
irq 48 affinity 9 effective 9 nvme0q6
irq 49 affinity 10 effective 10 nvme0q7
irq 50 affinity 11 effective 11 nvme0q8
irq 51 affinity 14 effective 14 nvme0q9
irq 52 affinity 15 effective 15 nvme0q10
A corner case is when the number of online CPUs and present CPUs
differ and the driver asks for less queues than online CPUs, e.g.
8 online CPUs, 16 possible CPUs
isolcpus=io_queue,2-3,6-7,12-13
virtio_blk.num_request_queues=2
Queue mapping:
hctx0: default 0 1 2 3 4 5 6 7 8 12 13
hctx1: default 9 10 11 14 15
IRQ mapping:
irq 27 affinity 0 effective 0 virtio0-config
irq 28 affinity 0-1,4-5,8 effective 5 virtio0-req.0
irq 29 affinity 9-11,14-15 effective 0 virtio0-req.1
Noteworthy is that for the normal/default configuration (!isolcpus) the
mapping will change for systems with non-hyperthreading CPUs. The
main assignment loop relies entirely on group_mask_cpus_evenly to
do the right thing. The old code would distribute the CPUs linearly over
the hardware contexts:
queue mapping for /dev/nvme0n1
hctx0: default 0 8
hctx1: default 1 9
hctx2: default 2 10
hctx3: default 3 11
hctx4: default 4 12
hctx5: default 5 13
hctx6: default 6 14
hctx7: default 7 15
The new code assigns each hardware context the map generated by the
group_mask_cpus_evenly function:
queue mapping for /dev/nvme0n1
hctx0: default 0 1
hctx1: default 2 3
hctx2: default 4 5
hctx3: default 6 7
hctx4: default 8 9
hctx5: default 10 11
hctx6: default 12 13
hctx7: default 14 15
In case of hyperthreading CPUs, the resulting map stays the same.
Signed-off-by: Daniel Wagner <[email protected]>
[atomlin: Fixed absolute vs. relative hardware queue index mix-up in
blk_mq_map_queues and validation checks; fixed typographical errors.]
Signed-off-by: Aaron Tomlin <[email protected]>
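The validation this patch describes — rejecting a mapping where a hctx would be left with only offline housekeeping CPUs and online isolated CPUs — boils down to a mask intersection. A minimal sketch, with 64-bit words standing in for struct cpumask and a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Toy version of the active_hctx check: a hctx only stays operational
 * while at least one *online housekeeping* CPU is mapped to it.
 * Masks are 64-bit words standing in for struct cpumask. */
static int hctx_has_online_housekeeping_cpu(uint64_t hctx_cpus,
					    uint64_t housekeeping_mask,
					    uint64_t online_mask)
{
	return (hctx_cpus & housekeeping_mask & online_mask) != 0;
}
```

For the example above (isolcpus=io_queue,2-3,6-7,12-13 on 16 CPUs), hctx0 with CPUs {0, 2} is active while housekeeping CPU 0 is online, but becomes inactive once CPU 0 goes offline, since isolated CPU 2 alone cannot keep it operational.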
When isolcpus=io_queue is enabled and the last housekeeping CPU for a given hctx goes offline, no CPU would be left to handle I/O. To prevent I/O stalls, disallow offlining housekeeping CPUs that are still serving isolated CPUs.
Reviewed-by: Aaron Tomlin <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
At present, the managed interrupt spreading algorithm distributes vectors
across all available CPUs within a given node or system. On systems
employing CPU isolation (e.g., "isolcpus=io_queue"), this behaviour
defeats the primary purpose of isolation by routing hardware interrupts
(such as NVMe completion queues) directly to isolated cores.
Update irq_create_affinity_masks() to respect the housekeeping CPU mask.
Introduce irq_spread_hk_filter() to intersect the natively calculated
affinity mask with the HK_TYPE_IO_QUEUE mask, thereby keeping managed
interrupts off isolated CPUs.
To ensure strict isolation whilst guaranteeing a valid routing destination:
1. Fallback mechanism: Should the initial spreading logic assign a
vector exclusively to isolated CPUs (resulting in an empty
intersection), the filter safely falls back to the system's
online housekeeping CPUs.
2. Hotplug safety: The fallback utilises data_race(cpu_online_mask)
instead of allocating a local cpumask snapshot. This circumvents
CONFIG_CPUMASK_OFFSTACK stack bloat hazards on high-core-count
systems. Furthermore, it prevents deadlocks with concurrent CPU
hotplug operations (e.g., during storage driver error recovery)
by eliminating the need to hold the CPU hotplug read lock.
3. Fast-path optimisation: The filtering logic is conditionally
executed only if housekeeping is enabled, thereby ensuring zero
overhead for standard configurations.
Signed-off-by: Aaron Tomlin <[email protected]>
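The filter-plus-fallback behaviour described above can be modeled in a few lines. The 64-bit words stand in for struct cpumask, and the real code uses the HK_TYPE_IO_QUEUE housekeeping mask and data_race(cpu_online_mask) rather than plain parameters:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the housekeeping filter: intersect the natively computed
 * spreading result with the housekeeping mask, falling back to the
 * online housekeeping CPUs when the intersection would be empty, so
 * every vector keeps a valid routing destination. */
static uint64_t irq_spread_hk_filter_model(uint64_t vector_affinity,
					   uint64_t housekeeping_mask,
					   uint64_t online_mask)
{
	uint64_t masked = vector_affinity & housekeeping_mask;

	/* Fallback: never leave a vector without a routing target. */
	return masked ? masked : (online_mask & housekeeping_mask);
}
```

A vector that the spreading logic assigned exclusively to isolated CPUs falls back to the full online housekeeping set; a vector with any housekeeping overlap is simply trimmed to that overlap.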
The io_queue flag informs multiqueue device drivers where to place hardware queues. Document this new flag in the isolcpus command-line argument description.
Reviewed-by: Aaron Tomlin <[email protected]>
Reviewed-by: Hannes Reinecke <[email protected]>
Signed-off-by: Daniel Wagner <[email protected]>
Signed-off-by: Aaron Tomlin <[email protected]>
Pull request for series with
subject: blk: honor isolcpus configuration
version: 9
url: https://patchwork.kernel.org/project/linux-block/list/?series=1074832