Skip to content

Add atomvm:posix_kill/2#2348

Open
pguyot wants to merge 1 commit into
atomvm:release-0.7from
pguyot:w25/posix-kill
Open

Add atomvm:posix_kill/2#2348
pguyot wants to merge 1 commit into
atomvm:release-0.7from
pguyot:w25/posix-kill

Conversation

@pguyot

@pguyot pguyot commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Also stop leaking socat processes in tests

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@petermm

petermm commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

correct commit this time around! pardon the brain-lapse

Amp - never fails on pedantic things, pick and choose:

PR Review — Add atomvm:posix_kill/2

Commit: 6802275Add atomvm:posix_kill/2 (also stop leaking socat processes in tests)
Author: Paul Guyot

Summary

Adds a new NIF atomvm:posix_kill/2 wrapping POSIX kill(2), wired through the
gperf NIF table with a HAVE_KILL feature guard (disabled on bare‑metal rp2/stm32).
Also fixes a socat process leak in the UART / serial‑dist tests by exec‑ing socat
(so the subprocess pid is socat, not the wrapping shell) and killing it with the
new NIF instead of spawning /bin/kill.

The integration (feature detection, gperf entry, platform guards) is correct, and
the socat exec + posix_kill fix is a real improvement. However, the NIF has
unchecked integer‑narrowing paths that can send the wrong signal to the wrong
target
, and the accepted pid range includes POSIX process‑group semantics that
are undocumented.

Recommendation

Request changes — Finding 1 (integer truncation) is a correctness/safety bug
in a kill(2) wrapper and should be fixed before merge. Findings 2–3 are
API/documentation decisions worth making now; 4–6 are nice‑to‑haves.


Findings (ranked)

1. High — oversized pid/signal can be truncated and sent to the wrong target/signal

nif_atomvm_posix_kill validates the pid with term_is_any_integer and extracts
it with term_maybe_unbox_int, and validates the signal with term_is_non_neg_int
then term_to_int:

VALIDATE_VALUE(argv[0], term_is_any_integer);
VALIDATE_VALUE(argv[1], term_is_non_neg_int);

if (UNLIKELY(kill((pid_t) term_maybe_unbox_int(argv[0]), term_to_int(argv[1])) != 0)) {

src/libAtomVM/posix_nifs.c#L961-L969

Problems:

  • On 64‑bit builds, an unboxed integer can exceed C int/typical pid_t. The cast
    truncates silently, so a huge pid/signal can collapse to a small value
    (0, 1, 15, …). For kill, a truncated pid of 0 targets the caller's own
    process group
    .
  • On 32‑bit builds, a boxed int64 value passes term_is_any_integer, but
    term_maybe_unbox_int reads only one boxed word — garbage pid.
  • The signal path has the same narrowing issue (term_to_int after a non‑neg check
    that doesn't bound to INT_MAX).

Fix: validate representability before casting, extract via term_to_int64,
and bound the signal to [0, INT_MAX].

--- a/src/libAtomVM/posix_nifs.c
+++ b/src/libAtomVM/posix_nifs.c
@@ (top includes)
 #include <errno.h>
+#if HAVE_KILL
+#include <limits.h>
+#endif
@@ #if HAVE_KILL
+static bool term_is_valid_pid(term t)
+{
+    if (!term_is_int64(t)) {
+        return false;
+    }
+    int64_t value = term_to_int64(t);
+    pid_t pid = (pid_t) value;
+    return ((int64_t) pid) == value;
+}
+
+static bool term_is_signal_number(term t)
+{
+    if (!term_is_int64(t)) {
+        return false;
+    }
+    int64_t value = term_to_int64(t);
+    return value >= 0 && value <= INT_MAX;
+}
+
 static term nif_atomvm_posix_kill(Context *ctx, int argc, term argv[])
 {
     UNUSED(argc);
-    VALIDATE_VALUE(argv[0], term_is_any_integer);
-    VALIDATE_VALUE(argv[1], term_is_non_neg_int);
+    VALIDATE_VALUE(argv[0], term_is_valid_pid);
+    VALIDATE_VALUE(argv[1], term_is_signal_number);
+
+    pid_t pid = (pid_t) term_to_int64(argv[0]);
+    int signal = (int) term_to_int64(argv[1]);
 
-    if (UNLIKELY(kill((pid_t) term_maybe_unbox_int(argv[0]), term_to_int(argv[1])) != 0)) {
+    if (UNLIKELY(kill(pid, signal) != 0)) {
         return errno_to_error_tuple_maybe_gc(ctx);
     }
     return OK_ATOM;
 }

term_is_int64 / term_to_int64 exist in
term.h#L743 and
term.h#L1376.

2. Medium — negative/zero pid enables process‑group semantics, undocumented

The spec is OsPid :: integer() and the NIF accepts all signs. POSIX kill gives
special meaning to non‑positive pids: 0 = caller's process group, -1 = all
permitted processes, < -1 = a specific process group. The docs describe it only
as "operating system process id, as returned by subprocess/4":

libs/eavmlib/src/atomvm.erl#L523-L534

Decide the intended surface:

  • Safe subprocess‑cleanup API (recommended): restrict to pid > 0 and change
    the spec to pos_integer(). In the helper from Finding 1:
    -    return ((int64_t) pid) == value;
    +    return value > 0 && ((int64_t) pid) == value;
    --spec posix_kill(OsPid :: integer(), Signal :: non_neg_integer()) ->
    +-spec posix_kill(OsPid :: pos_integer(), Signal :: non_neg_integer()) ->
  • Full kill(2) wrapper: keep negative pids but document the 0 / -1 / < -1
    semantics and the blast radius.

3. Medium/low — clarify what ok means

Docs say the function is used to "terminate a process". kill returning success
only means the request was accepted; the target may catch, block, or ignore the
signal, and signal 0 sends nothing (existence/permission probe only).

 %% @doc     Send a signal to a process using kill(2). Typically used to
-%%          terminate a process started with `subprocess/4'.
+%%          terminate a process started with `subprocess/4'. A return value of
+%%          `ok' means kill(2) accepted the request; it does not guarantee the
+%%          target has terminated, since the signal may be caught, blocked, or
+%%          ignored. Signal 0 performs a POSIX existence/permission check without
+%%          sending a signal.

4. Low — test coverage gap + platform skip

test_subprocess.erl
is good — the esrch case with pid 536870911 (2^29‑1, the 32‑bit small‑int max)
plus signal 0 nicely exercises boxed‑integer handling. Two improvements:

  • It only tests non‑integer badargs; add an oversized‑signal case to lock in
    Finding 1.
  • It only skips BEAM ("ATOM" guard). If eavmlib tests ever run on an AtomVM
    platform without subprocess/kill (emscripten, bare‑metal), this fails rather
    than skips. Gate on atomvm:platform() =:= generic_unix (exists at
    atomvm.erl#L132).
 test() ->
     case erlang:system_info(machine) of
-        "ATOM" ->
+        "ATOM" when atomvm:platform() =:= generic_unix ->
             ok = test_posix_kill(),
             ok = test_posix_kill_badarg(),
             ok = test_posix_kill_esrch(),
             ok;
         _ ->
             ok
     end.

 test_posix_kill_badarg() ->
-    ok =
-        try
-            atomvm:posix_kill(not_a_pid, 15),
-            fail
-        catch
-            error:badarg -> ok
-        end,
-    ok =
-        try
-            atomvm:posix_kill(1, not_a_signal),
-            fail
-        catch
-            error:badarg -> ok
-        end,
-    ok.
+    ok = expect_badarg(fun() -> atomvm:posix_kill(not_a_pid, 15) end),
+    ok = expect_badarg(fun() -> atomvm:posix_kill(1, not_a_signal) end),
+    ok = expect_badarg(fun() -> atomvm:posix_kill(1, 1 bsl 40) end),
+    ok.
+
+expect_badarg(Fun) ->
+    try Fun() of
+        _ -> fail
+    catch
+        error:badarg -> ok
+    end.

(If you adopt the positive‑pid API from Finding 2, also add badarg tests for 0
and -1.)

5. Low — errno atom mapping guard omits HAVE_KILL

posix_errno_to_term only atomizes errno values when file/time POSIX features are
present; otherwise it returns a raw integer:

#if HAVE_OPEN && HAVE_CLOSE || defined(HAVE_CLOCK_SETTIME) || defined(HAVE_SETTIMEOFDAY)
...
#else
    return term_from_int(err);
#endif

src/libAtomVM/posix_nifs.c#L76,
#L186-L189

On a hypothetical kill‑only platform, {error, esrch} would degrade to
{error, 3}. In practice any platform with kill also has open/close, so this
is theoretical — but cheap to make correct:

-#if HAVE_OPEN && HAVE_CLOSE || defined(HAVE_CLOCK_SETTIME) || defined(HAVE_SETTIMEOFDAY)
+#if HAVE_OPEN && HAVE_CLOSE || defined(HAVE_CLOCK_SETTIME) || defined(HAVE_SETTIMEOFDAY) || HAVE_KILL

6. Low — socat cleanup swallows the pre‑existing case

The socat fix is directionally correct: exec socat … makes the subprocess pid be
socat itself, so SIGTERM actually reaches it.

test_serial_dist_socat.erl#L223-L240,
test_uart.erl#L96-L121

Minor robustness: stop_socat asserts ok = atomvm:posix_kill(...). If socat
already exited, {error, esrch} will crash cleanup and can mask the real test
failure. Tolerating esrch in cleanup makes failures easier to diagnose:

 stop_socat({OsPid, Fd}) ->
-    ok = atomvm:posix_kill(OsPid, 15),
+    case atomvm:posix_kill(OsPid, 15) of
+        ok -> ok;
+        {error, esrch} -> ok
+    end,
     atomvm:posix_close(Fd).

What's correct (no action needed)

  • Feature detection & registration: HAVE_KILL probe in
    CMakeLists.txt#L261,
    IF_HAVE_KILL macro in
    nifs.c#L1036-L1040,
    gperf entry, and extern decl in the header.
  • Bare‑metal disabling: HAVE_KILL forced off on
    rp2 and
    stm32,
    matching the newlib stub rationale for the neighboring symbols.
  • CHANGELOG entry present and accurate.

@petermm

petermm commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

clean Amp (not a continuation of above):

PR Review — Add atomvm:posix_kill/2 (also stop leaking socat processes in tests)

Commit: 36de127c5de8104277bfd1651f61c7343760c5cb

Summary

Adds an atomvm:posix_kill(OsPid, Signal) NIF wrapping kill(2) and returning
ok | {error, posix_error()}, gated behind a new HAVE_KILL feature check
(disabled on rp2/stm32). Uses the new NIF to stop leaking socat subprocesses in
the UART and serial-dist tests, switching the shell wrappers to exec socat ...
so the returned OS pid is socat itself rather than the wrapping shell.

Overall: the core NIF is sound (validation, errno mapping, and feature gating
are correct). Two test-side issues should be addressed before merge.

Verdict

Change requested — one test reintroduces the wrapper-shell pid bug the commit
otherwise fixes, and the UART tests still leak socat on failure paths.


Findings

# Severity Area Issue
1 Major test_subprocess.erl Kills the wrapping shell, not sleep; test can hang/leak
2 Major test_uart.erl socat still leaks on assertion/error paths
3 Minor posix_nifs.c Local var signal shadows libc signal(3)

What is correct (no change needed)

  • pid_t round-trip check in term_is_valid_pid correctly rejects integers
    that would truncate to a different (possibly own) process.
  • Signal <= INT_MAX boundary in term_is_signal_number is the right cap
    before casting to int; oversized signals are rejected, not truncated
    (covered by the 1 bsl 40 test case).
  • Allowing negative OsPid matches documented kill(2) process-group semantics.
    The spec Signal :: non_neg_integer() / OsPid :: integer() is consistent
    with the docs — not a mismatch.
  • || HAVE_KILL added to the posix_errno_to_term guard is appropriate so
    {error, esrch | eperm | einval} can be returned on kill-only POSIX builds.
  • Tolerating {error, esrch} in stop_socat is safe: eperm/einval still
    crash the case expression, so real failures are not masked.

1. Major — test_subprocess kills the shell, not sleep

test_posix_kill/0 starts a subprocess via /bin/sh -c "sleep 30".
subprocess/4 returns the /bin/sh pid, and sleep inherits the stdout pipe's
write end. Killing only the shell does not reliably kill sleep, so
read_until_eof(Fd) can block until sleep 30 exits (or the 5s select timeout
fires). This contradicts the commit's own socat fix, which uses exec
specifically so the returned pid is the target process.

--- a/tests/libs/eavmlib/test_subprocess.erl
+++ b/tests/libs/eavmlib/test_subprocess.erl
     {ok, OsPid, Fd} = atomvm:subprocess(
-        "/bin/sh", ["sh", "-c", "sleep 30"], undefined, [stdout]
+        "/bin/sh", ["sh", "-c", "exec sleep 30"], undefined, [stdout]
     ),

2. Major — test_uart still leaks socat on failure paths

Each UART test calls start_socat() and only reaches stop_socat(SocatFd) at
the successful end of the function. If any assertion in between fails, the new
SIGTERM cleanup is skipped and socat keeps holding both ptys — exactly the
cascading pty-exhaustion the commit set out to prevent. The commit fixes
successful-run leaks but not failure-run leaks.

Suggested direction: guarantee cleanup with try ... after, e.g. a helper:

with_socat(Fun) ->
    {SocatFd, PtyA, PtyB} = start_socat(),
    try
        Fun(PtyA, PtyB)
    after
        stop_socat(SocatFd)
    end.

and wrap per-test socat usage (also closing any opened Fd via after):

test_posix_tcgetattr() ->
    with_socat(fun(PtyA, _PtyB) ->
        {ok, Fd} = atomvm:posix_open(PtyA, [o_rdwr, o_noctty]),
        try
            {ok, Tio} = atomvm:posix_tcgetattr(Fd),
            true = is_map(Tio),
            ok
        after
            atomvm:posix_close(Fd)
        end
    end).

Scope is moderate since several tests should be wrapped consistently.

3. Minor — int signal shadows libc signal(3)

In nif_atomvm_posix_kill, the local int signal shadows signal(3) from
<signal.h>. Legal C, but may trip -Wshadow and is easily avoided.

--- a/src/libAtomVM/posix_nifs.c
+++ b/src/libAtomVM/posix_nifs.c
     pid_t pid = (pid_t) term_to_int64(argv[0]);
-    int signal = (int) term_to_int64(argv[1]);
+    int signo = (int) term_to_int64(argv[1]);

-    if (UNLIKELY(kill(pid, signal) != 0)) {
+    if (UNLIKELY(kill(pid, signo) != 0)) {
         return errno_to_error_tuple_maybe_gc(ctx);
     }

@pguyot pguyot force-pushed the w25/posix-kill branch 3 times, most recently from 5b353b5 to 11ae01c Compare July 3, 2026 21:42
Also stop leaking socat processes in tests

Signed-off-by: Paul Guyot <pguyot@kallisys.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants