Add custom OOM killer for Linux containers #653

Open
JaewonHur wants to merge 1 commit into apple:main from JaewonHur:oom-kill

Conversation

@JaewonHur
Contributor

This PR implements a custom OOM killer that is spawned as a child process of vmexec.

The Linux kernel also OOM-kills a process when it hits the cgroup memory limit and the kernel cannot reclaim memory. In practice, however, the kernel often fails to kill the process and leaves the system hung due to memory thrashing. In particular, the process is not OOM-killed because the kernel still succeeds in reclaiming memory, so the condition for an OOM kill is never met; the reclaim just takes far longer and leads to the hang.

Thus, this PR adds a user-space OOM killer as a child process of vmexec. It monitors cgroup memory events and kills the process when the count of max events exceeds a specified limit. This approach can reliably kill the offending process because memory events can be checked within a small time window.
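
As a rough illustration of the check described above, the sketch below parses the flat `key value` text of a cgroup v2 `memory.events` file and compares the `max` counter against a limit. The names `parseMemoryEvents` and `sampleEvents` and the limit value are assumptions made for illustration; the PR itself reads the events through `cgroupManager.getMemoryEvents()`.

```swift
/// Parses the flat "key value" lines of a cgroup v2 `memory.events` file.
/// Hypothetical helper; lines that do not match the format are skipped.
func parseMemoryEvents(_ text: String) -> [String: UInt64] {
    var counters: [String: UInt64] = [:]
    for line in text.split(separator: "\n") {
        let fields = line.split(separator: " ")
        guard fields.count == 2, let value = UInt64(fields[1]) else { continue }
        counters[String(fields[0])] = value
    }
    return counters
}

// Sample file content (made-up numbers, for illustration only).
let sampleEvents = "low 0\nhigh 0\nmax 42\noom 0\noom_kill 0\n"
let counters = parseMemoryEvents(sampleEvents)

// Assumed limit; the PR compares `events.max` against `oomLimit`.
let oomLimit: UInt64 = 10
let shouldKill = (counters["max"] ?? 0) > oomLimit
print("kill cgroup:", shouldKill)
```

The counters are cumulative, so the comparison is against the total number of `max` events seen so far, not a rate.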

This PR still needs the following work:

  1. Plumb through the UI so that users are informed that the container was killed due to OOM.
  2. Refactor errorPipe to catch errors from the (long-running) OOM killer process (or find another way to catch those errors).

@dkovba self-requested a review April 6, 2026 20:03
usleep(1_000_000)
let events = try cgroupManager.getMemoryEvents()

if events.max > oomLimit {
Contributor

events.max represents the number of events. 1_000_000 seems too large a threshold. Would it be appropriate to use a threshold of zero?
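
To make the threshold question concrete, here is a minimal sketch that replays a series of cumulative `max` counts against two thresholds. The sample counts and the helper `firstTrigger` are hypothetical and only illustrate how `events.max > oomLimit` behaves for different limits.

```swift
/// Returns the index of the first poll at which `count > threshold` holds,
/// or nil if the condition is never met. Hypothetical helper.
func firstTrigger(samples: [UInt64], threshold: UInt64) -> Int? {
    samples.firstIndex { $0 > threshold }
}

// Made-up cumulative `max` event counts, one per poll.
let samples: [UInt64] = [0, 0, 3, 250, 9_000]

// With a limit of 1_000_000 the condition is never met for these samples.
let largeLimit = firstTrigger(samples: samples, threshold: 1_000_000)

// With a limit of zero the first recorded event triggers the kill.
let zeroLimit = firstTrigger(samples: samples, threshold: 0)

print(String(describing: largeLimit), String(describing: zeroLimit))
```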


while true {
usleep(1_000_000)
let events = try cgroupManager.getMemoryEvents()
Contributor

Can we move MemoryMonitor to the Cgroup library and use it instead of polling memory events at a fixed interval? CC @dcantah

let events = try cgroupManager.getMemoryEvents()

if events.max > oomLimit {
try cgroupManager.kill()
Contributor

Should we use App.writeError(error) and exit(code) when try fails?
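
One way to read this suggestion is sketched below. `writeError` is a hypothetical stand-in for `App.writeError(error)`, whose real signature is not shown in this thread, and `KillFailed` is a made-up error type; the sketch returns the exit code instead of calling `exit` so that it stays easy to test.

```swift
import Foundation

/// Made-up error used only for this illustration.
struct KillFailed: Error {}

/// Hypothetical stand-in for App.writeError(error): reports the error on stderr.
func writeError(_ error: Error) {
    FileHandle.standardError.write(Data("error: \(error)\n".utf8))
}

/// Runs one fallible step and maps failure to a non-zero exit code.
func exitCode(for step: () throws -> Void) -> Int32 {
    do {
        try step()
        return 0
    } catch {
        writeError(error)
        return 1
    }
}

let ok = exitCode { }
let failed = exitCode { throw KillFailed() }
print(ok, failed)

// A real child process would finish with exit(code) here.
```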

@dcantah
Member

dcantah commented Apr 6, 2026

I'm sort of confused about what this is trying to solve. If the idea is that some OOM kills (likely of child processes) happen but init keeps running, there is a cgroup toggle that makes every process in the cgroup get killed when an OOM condition occurs. That is, if the container's init process is well within its limits but some child process(es) keep getting OOM-killed, the kernel would kill the whole cgroup (and thus the whole container).
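
The toggle referred to here is presumably cgroup v2's `memory.oom.group` file: writing `1` to it makes the kernel OOM killer treat the cgroup as an indivisible unit. A minimal sketch of setting it follows, assuming the caller already knows the cgroup directory; `enableGroupOOMKill` is a hypothetical helper, not an API of this project.

```swift
import Foundation

/// Writes "1" to `memory.oom.group` inside the given cgroup directory.
/// Hypothetical helper; the path handling is an assumption for illustration.
func enableGroupOOMKill(cgroupPath: String) throws {
    let file = URL(fileURLWithPath: cgroupPath)
        .appendingPathComponent("memory.oom.group")
    // Non-atomic write: cgroup control files are written in place.
    try "1".write(to: file, atomically: false, encoding: .utf8)
}

// Demonstration against a temporary directory instead of a real cgroup.
let demoDir = FileManager.default.temporaryDirectory
    .appendingPathComponent(UUID().uuidString)
try FileManager.default.createDirectory(at: demoDir, withIntermediateDirectories: true)
try enableGroupOOMKill(cgroupPath: demoDir.path)

let written = try String(
    contentsOf: demoDir.appendingPathComponent("memory.oom.group"),
    encoding: .utf8
)
print("memory.oom.group =", written)
```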

@dkovba
Contributor

dkovba commented Apr 6, 2026

When we run out of memory, a container hangs. The goal is to make it crash with an OOM error.

@dcantah
Member

dcantah commented Apr 6, 2026

OK, regardless of what we decide, I don't think we should run a separate forked process to do this. We can try to expose an API on LinuxContainer to monitor memory events in a stream-like fashion. If we don't want to do that either, this whole scheme could be done today with the APIs we already expose: you could call LinuxContainer.statistics every {arbitrary} seconds and check the memoryEvents field.
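
The polling variant could look roughly like the sketch below. Since the exact shape of `LinuxContainer.statistics` and its `memoryEvents` field is not shown in this thread, the counter is read through an injected closure; `pollForOOM`, `fakeSamples`, and the wait closure are all assumptions for illustration.

```swift
/// Polls a memory-event counter until it exceeds `limit` or `maxPolls` is reached.
/// Returns true if the limit was exceeded. Hypothetical helper.
func pollForOOM(
    limit: UInt64,
    maxPolls: Int,
    readMaxEvents: () throws -> UInt64,
    wait: () -> Void
) rethrows -> Bool {
    for _ in 0..<maxPolls {
        if try readMaxEvents() > limit {
            return true
        }
        wait()
    }
    return false
}

// Fake counter values standing in for the memoryEvents field of a
// statistics call (made-up numbers).
let fakeSamples: [UInt64] = [0, 0, 5]
var polls = 0

let exceeded = pollForOOM(
    limit: 0,
    maxPolls: 10,
    readMaxEvents: {
        defer { polls += 1 }
        return fakeSamples[min(polls, fakeSamples.count - 1)]
    },
    wait: { /* real code would sleep for the chosen interval here */ }
)
print(exceeded, polls)
```

Injecting the reader and the wait step keeps the loop independent of any particular container API, so the same function could sit behind either a polling or a stream-based implementation.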

4 participants