clone3 is not implemented — thread creation fails on modern-glibc binaries #1638

Description

@retrocpugeek

Describe the bug

clone3 (the modern variant of clone, syscall 435 on most arches, 5435 on MIPS n64) has no handler in Qiling. It appears in the number→name maps (qiling/os/linux/map_syscall.py) but there is no ql_syscall_clone3:

$ grep -rn "def ql_syscall_clone" qiling/
qiling/os/posix/syscall/sched.py:13:def ql_syscall_clone(...)   # only the legacy clone

Recent glibc (e.g. the toolchains shipping with Ubuntu 24.04 / glibc 2.39) issues clone3 from pthread_create, falling back to the legacy clone only if clone3 returns -ENOSYS. So on any guest built against a recent glibc, pthread_create hits the unimplemented clone3 and thread creation fails:

[!] 0x...: syscall ql_syscall_clone3 number = 0x153b(5435) not implemented
Create pthread error!

Crucially, the clone3 → clone fallback never kicks in, because Qiling does not return -ENOSYS for unimplemented syscalls: QlOsPosix just logs a warning and leaves the return register untouched (qiling/os/posix/posix.py):

self.ql.log.warning(f'... syscall {syscall_name} ... not implemented')

glibc only falls back on a literal -ENOSYS, so it sees a garbage return value, assumes clone3 "worked" (or failed for a real reason), and gives up instead of retrying with clone.
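The failure mode can be illustrated with a toy model. This is only a sketch of the control flow described above, not glibc's or Qiling's actual code: create_thread, the lambdas, and the stale value are hypothetical names made up for illustration, and ENOSYS is taken from the host's errno module (the guest ABI's numeric value may differ).

```python
import errno

# Host value, used for illustration only; the guest ABI's ENOSYS may differ.
ENOSYS = errno.ENOSYS


def create_thread(clone3, legacy_clone):
    """Toy model: retry with legacy clone only when clone3 returns -ENOSYS."""
    ret = clone3()
    if ret == -ENOSYS:
        # The kernel (or emulator) said "no such syscall": fall back.
        return "clone", legacy_clone()
    # Any other value is taken as clone3's real result, good or bad.
    return "clone3", ret


# An emulator that reports -ENOSYS lets the fallback run.
with_enosys = create_thread(lambda: -ENOSYS, lambda: 1234)

# An emulator that leaves the return register untouched hands back whatever
# was there before (an arbitrary stale value here), so no fallback happens.
STALE_REGISTER = 0x153B
with_stale = create_thread(lambda: STALE_REGISTER, lambda: 1234)

print(with_enosys)  # fallback path taken
print(with_stale)   # fallback path skipped
```

In this model, only the first call reaches the legacy path; the second treats the stale register contents as clone3's result.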

Repro

Any multithreaded binary built with a recent glibc reproduces it. The test binaries currently in the rootfs were built against an older glibc and only use the legacy clone, so the clone3 path is never exercised, which is why CI doesn't catch it. The symptom for a modern-glibc pthread binary is the warning above followed by Create pthread error! (or, if creation is allowed to proceed with a bogus return value, a crash during thread setup).

Expected behavior

clone3 should create a thread just like clone does, so pthread_create works on guests built with a modern glibc.

Proposed fix

Add ql_syscall_clone3 that unpacks struct clone_args and delegates to the existing ql_syscall_clone. The non-obvious translations:

  • child_stack = stack + stack_size: clone3 passes the stack base plus a separate size; legacy clone wants the highest stack address.
  • exit_signal is its own struct clone_args field in clone3; legacy clone packs it into the low CSIGNAL byte of flags.
  • x8664 pre-swap: ql_syscall_clone swaps newtls<->child_tidptr to undo x8664's raw-syscall register order; since clone3 hands over already-logical args, pre-swap on x8664 so it cancels out.
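The three translations above can be sketched in plain Python. This is a standalone illustration, not the handler on the branch: translate_clone_args, LegacyCloneArgs, and the big_endian and pre_swap parameters are hypothetical names, and the sketch works on raw bytes rather than on Qiling's memory API. The field order is that of the first eight u64 members of the kernel's struct clone_args (the 64-byte version 0 layout).

```python
import struct
from typing import NamedTuple

CSIGNAL = 0x000000FF       # low byte of legacy clone flags holds the exit signal
CLONE_ARGS_SIZE_VER0 = 64  # flags .. tls: eight u64 fields


class LegacyCloneArgs(NamedTuple):
    """Arguments in the order named in the issue (hypothetical container)."""
    flags: int
    child_stack: int
    parent_tidptr: int
    newtls: int
    child_tidptr: int


def translate_clone_args(raw: bytes, big_endian: bool = False,
                         pre_swap: bool = False) -> LegacyCloneArgs:
    """Map a raw struct clone_args onto legacy clone arguments."""
    if len(raw) < CLONE_ARGS_SIZE_VER0:
        raise ValueError("clone_args shorter than the version 0 layout")

    fmt = (">" if big_endian else "<") + "8Q"
    (flags, _pidfd, child_tid, parent_tid,
     exit_signal, stack, stack_size, tls) = struct.unpack(
        fmt, raw[:CLONE_ARGS_SIZE_VER0])

    # exit_signal is a separate field in clone3; legacy clone packs it
    # into the low CSIGNAL byte of flags.
    legacy_flags = flags | (exit_signal & CSIGNAL)

    # clone3 gives base + size; legacy clone wants the highest address.
    child_stack = stack + stack_size

    newtls, child_tidptr = tls, child_tid
    if pre_swap:
        # Assumption: the callee swaps these two again (as described for
        # x8664), so swapping here makes the two swaps cancel out.
        newtls, child_tidptr = child_tidptr, newtls

    return LegacyCloneArgs(legacy_flags, child_stack, parent_tid,
                           newtls, child_tidptr)


# Example: little-endian struct with made-up addresses.
raw = struct.pack("<8Q", 0x3D0F00, 0, 0x7000, 0x7008, 17, 0x10000, 0x2000, 0x9000)
generic = translate_clone_args(raw)
swapped = translate_clone_args(raw, pre_swap=True)
```

Keeping the translation a pure function of the struct bytes makes it easy to test without a clone3 binary, which is the same property the regression test relies on.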

Implemented on retrocpugeek:fix/clone3-syscall (off dev):

  • Fix: Implement the clone3 syscall
  • Regression test: test_elf_multithread.ELFTest.test_clone3_translates_to_clone — drives ql_syscall_clone3 directly and asserts the translation for both the generic path and the x8664 swap. Self-contained, runs on stock unicorn (no clone3 binary needed).

Verified end-to-end separately: a real MIPS64 BE glibc pthread binary now creates and joins threads correctly with this handler in place.

Happy to open a PR against dev.
