Description
When compiled with either gcc 15.2.1 or clang 22.1.1 on kernel 6.19.7-1-cachyos, the example.io_uring executable hangs.
Steps to reproduce:
- cmake -S . -B build -DSTDEXEC_ENABLE_IO_URING=ON -GNinja
- cmake --build build
- build/examples/example.io_uring
TLDR
The initialization of __m in __umin is incorrect here: https://github.com/NVIDIA/stdexec/blob/main/include/stdexec/__detail/__utility.hpp#L129
Since std::size_t is unsigned and __m starts at 0, the check on line 132 will never be true, so __m is never updated and the function always returns 0.
This function appears to be used in only one spot, detailed below, which likely explains why the issue has gone unnoticed. The following diff fixes the issue:
❯ git diff include/stdexec/__detail/__utility.hpp
diff --git a/include/stdexec/__detail/__utility.hpp b/include/stdexec/__detail/__utility.hpp
index f1937319..66d58ff0 100644
--- a/include/stdexec/__detail/__utility.hpp
+++ b/include/stdexec/__detail/__utility.hpp
@@ -22,6 +22,7 @@
#include <cstdarg>
#include <cstdio>
#include <initializer_list>
+#include <limits>
#include <memory> // IWYU pragma: keep for std::start_lifetime_as
#include <new> // IWYU pragma: keep for std::launder
#include <utility> // IWYU pragma: keep for std::unreachable
@@ -126,7 +127,7 @@ namespace STDEXEC
inline constexpr auto __umin(std::initializer_list<std::size_t> __il) noexcept -> std::size_t
{
- std::size_t __m = 0;
+ std::size_t __m = std::numeric_limits<std::size_t>::max();
for (std::size_t __i: __il)
{
if (__i < __m)
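To make the failure mode concrete, here is a minimal standalone sketch of the two initializations. The names umin_buggy and umin_fixed are hypothetical stand-ins, not the library's identifiers, and the loop body is assumed to assign __i to __m and return __m, since the diff context above is cut off after the comparison.

```cpp
#include <cstddef>
#include <initializer_list>
#include <limits>

// Hypothetical stand-in for the current __umin: the accumulator starts at 0.
constexpr std::size_t umin_buggy(std::initializer_list<std::size_t> il) noexcept
{
  std::size_t m = 0;
  for (std::size_t i : il)
  {
    if (i < m)  // never true: no unsigned value is less than 0
    {
      m = i;
    }
  }
  return m;
}

// Same loop, but seeded with the largest representable value as in the diff.
constexpr std::size_t umin_fixed(std::initializer_list<std::size_t> il) noexcept
{
  std::size_t m = std::numeric_limits<std::size_t>::max();
  for (std::size_t i : il)
  {
    if (i < m)
    {
      m = i;
    }
  }
  return m;
}

// Compile-time checks: the buggy version ignores its input entirely.
static_assert(umin_buggy({1024, 7, 32}) == 0, "buggy version always yields 0");
static_assert(umin_fixed({1024, 7, 32}) == 7, "fixed version yields the minimum");
```

One consequence of the fixed seed is that an empty list yields std::numeric_limits<std::size_t>::max(), which is the identity for a minimum and is probably harmless when the result is used as a cap.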
Details
I understand io_uring support is experimental, but I decided to debug a bit. The hang happens on the first call to stdexec::sync_wait here: https://github.com/NVIDIA/stdexec/blob/main/examples/io_uring.cpp#L40. A simple strace shows the process hanging on a call to futex. It also shows that both contexts have their rings set up with io_uring_setup, but io_uring_enter is never called, indicating that neither context is executing any operations.
❯ strace build/examples/example.io_uring 2>&1 |grep io_uring
execve("build/examples/example.io_uring", ["build/examples/example.io_uring"], 0x7ffe251ad100 /* 76 vars */) = 0
io_uring_setup(1024, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=1024, cq_entries=2048, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|IORING_FEAT_SQPOLL_NONFIXED|IORING_FEAT_EXT_ARG|IORING_FEAT_NATIVE_WORKERS|IORING_FEAT_RSRC_TAGS|IORING_FEAT_CQE_SKIP|IORING_FEAT_LINKED_FILE|IORING_FEAT_REG_REG_RING|IORING_FEAT_RECVSEND_BUNDLE|IORING_FEAT_MIN_TIMEOUT|IORING_FEAT_RW_ATTR|IORING_FEAT_NO_IOWAIT, sq_off={head=0, tail=4, ring_mask=16, ring_entries=24, flags=36, dropped=32, array=32832, user_addr=0}, cq_off={head=8, tail=12, ring_mask=20, ring_entries=28, overflow=44, cqes=64, flags=40, user_addr=0}}) = 3
io_uring_setup(1024, {flags=0, sq_thread_cpu=0, sq_thread_idle=0, sq_entries=1024, cq_entries=2048, features=IORING_FEAT_SINGLE_MMAP|IORING_FEAT_NODROP|IORING_FEAT_SUBMIT_STABLE|IORING_FEAT_RW_CUR_POS|IORING_FEAT_CUR_PERSONALITY|IORING_FEAT_FAST_POLL|IORING_FEAT_POLL_32BITS|IORING_FEAT_SQPOLL_NONFIXED|IORING_FEAT_EXT_ARG|IORING_FEAT_NATIVE_WORKERS|IORING_FEAT_RSRC_TAGS|IORING_FEAT_CQE_SKIP|IORING_FEAT_LINKED_FILE|IORING_FEAT_REG_REG_RING|IORING_FEAT_RECVSEND_BUNDLE|IORING_FEAT_MIN_TIMEOUT|IORING_FEAT_RW_ATTR|IORING_FEAT_NO_IOWAIT, sq_off={head=0, tail=4, ring_mask=16, ring_entries=24, flags=36, dropped=32, array=32832, user_addr=0}, cq_off={head=8, tail=12, ring_mask=20, ring_entries=28, overflow=44, cqes=64, flags=40, user_addr=0}}) = 5
I added some local debug logging to run_until_stopped() and saw that it was breaking out of the main loop here: https://github.com/NVIDIA/stdexec/blob/main/include/exec/linux/io_uring_context.hpp#L576 before any operations were submitted to the kernel. This led me to run_some(), where __n_total_submitted is updated from the result of __submission_queue::submit. There, __max_submissions gets capped to 0 by __umin here: https://github.com/NVIDIA/stdexec/blob/main/include/exec/linux/io_uring_context.hpp#L243. So even though the initial wakeup operation, which keeps the main loop alive while it waits for work, is started here: https://github.com/NVIDIA/stdexec/blob/main/include/exec/linux/io_uring_context.hpp#L562, it is never submitted to the kernel. The run loop then breaks out before any of the timer-based operations created by exec::schedule_after inside main() can be submitted either.
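The chain of events can be modelled in a few lines. This is a deliberately simplified sketch, not the real io_uring_context code: submit_some, run_until_idle, the seed parameter, and the "stop when a pass submits nothing" exit condition are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <initializer_list>
#include <limits>

// Minimum over a list, with the accumulator seed passed in so both
// behaviours (seed 0 and seed max) can be compared.
constexpr std::size_t umin(std::initializer_list<std::size_t> il, std::size_t seed) noexcept
{
  std::size_t m = seed;
  for (std::size_t i : il)
  {
    if (i < m)
    {
      m = i;
    }
  }
  return m;
}

// Hypothetical model of one submit step: the batch size is capped by the
// number of pending operations, the free queue slots, and a caller limit.
constexpr std::size_t submit_some(std::size_t pending, std::size_t free_slots,
                                  std::size_t limit, std::size_t seed) noexcept
{
  return umin({pending, free_slots, limit}, seed);
}

// Hypothetical model of the run loop: keep submitting until a pass submits
// nothing, then return how many operations reached the "kernel" in total.
constexpr std::size_t run_until_idle(std::size_t pending, std::size_t seed) noexcept
{
  constexpr std::size_t no_limit = std::numeric_limits<std::size_t>::max();
  std::size_t total = 0;
  while (true)
  {
    std::size_t const n = submit_some(pending, 1024, no_limit, seed);
    if (n == 0)
    {
      break;  // nothing submitted this pass: the loop gives up
    }
    pending -= n;
    total += n;
  }
  return total;
}

// With a seed of 0 the cap is always 0, so even the wakeup operation is never submitted.
static_assert(run_until_idle(3, 0) == 0, "seed 0 submits nothing");
// With a seed of max all three pending operations are submitted.
static_assert(run_until_idle(3, std::numeric_limits<std::size_t>::max()) == 3,
              "seed max submits everything");
```

In this model, as in the strace output above, the zero cap means nothing ever reaches the point where io_uring_enter would be called.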
Note that the unit tests also appear to hang in a similar way, which the above fix addresses. There is also a namespace ambiguity in those tests that needs to be resolved for newer compilers:
❯ git diff test/exec/test_io_uring_context.cpp
diff --git a/test/exec/test_io_uring_context.cpp b/test/exec/test_io_uring_context.cpp
index 27e75267..dc931ea5 100644
--- a/test/exec/test_io_uring_context.cpp
+++ b/test/exec/test_io_uring_context.cpp
@@ -26,6 +26,7 @@
# include "exec/scope.hpp"
# include "exec/single_thread_context.hpp"
# include "exec/when_any.hpp"
+# include "exec/start_detached.hpp"
# include "catch2/catch.hpp"
@@ -119,7 +120,7 @@ namespace
io_uring_context context;
io_uring_scheduler scheduler = context.get_scheduler();
bool is_called = false;
- start_detached(schedule(scheduler)
+ exec::start_detached(schedule(scheduler)
| then(
[&]
{
@@ -206,7 +207,7 @@ namespace
io_uring_scheduler scheduler = context.get_scheduler();
{
bool is_called = false;
- start_detached(schedule(scheduler)
+ exec::start_detached(schedule(scheduler)
| then(
[&]
{