Skip to content

MacOS flaky tests#1676

Open
sandboxcoder wants to merge 2 commits intostagingfrom
rno/macos-flaky-tests
Open

MacOS flaky tests#1676
sandboxcoder wants to merge 2 commits intostagingfrom
rno/macos-flaky-tests

Conversation

@sandboxcoder
Copy link
Copy Markdown
Contributor

@sandboxcoder sandboxcoder commented Apr 22, 2026

  • do not skip recording, streaming, etc tests when running on MacOS. I have enabled all tests to run to achieve parity with Windows.
  • log when tests are being skipped.
  • set CI env var for MacOS github runner for obs_handler.ts so it knows when its' running on CI github runner. I do not know why Windows CI machine does not need this step it somehow has CI env var defined somewhere. It would have been nice to reuse what Windows is doing?
  • fix warning we cant assign null: let secondContext: osn.IVideo = null;
  • do not invoke AddVideoContext on CI builds in enhanced broadcasting tests since these are disabled.
  • main.yml: upgrade to macos-15 for arm64 ARCH (same OS Intel ARCH is using). Previously it was using macos-14 which seemed more unstable.
  • main.yml: I added a task for both Win/MacOS to upload OBS logs which can be helpful for debugging. (MacOS) There is also an env var, SUPPRESS_STREAMLABS_OBS_LOGS, I added to main.yml to allow the obs64 child process to attach to stdout/stderr like it used to- if one likes to see the verbose backend spew which can be helpful for troubleshooting CI-only issues...
  • package.json: increase max # of retries. MacOS CI machine struggles with test_osn_dual_output most of all (test Start Dual Output with recording and scene items and Start Dual Output with advanced recording and audio scene items - record stop signal timeout case). For now, bumping retries to 3 helps but this will be revisited.

main.yml: set fail-fast to false

  • turn off fail-fast so the tests can continue running which will fix issue where if macos-15-intel fails, macos-arm64 is cancelled before it can finish running. This can be removed later but it was helpful to keep this set to false to troubleshoot flaky tests.

osn_handler: log time delta for long duration

  • add warning when signal received took longer then expectedDeadline

obs_handler: assign explicit log filename for tests

Fix orphaned worker thread:

The WorkerSignals::worker keeps polling Query. When a test fails before calling destroy(), the server-side recording gets freed when OBS shuts down, but the orphaned worker thread continues. When the next test file creates a fresh OBS connection, Controller::GetInstance().GetConnection() now returns the new session's connection. The orphaned worker sends Query(oldUID) to the new server, which has no knowledge of that uid → "Recording reference is not valid." Note: from my research, it is best to use good try/catch/finally pattern on the caller side to invoke destroy() to ensure the Worker state variables are properly reset unless we add something more extensive on backend to defend against orphaned worker threads.

Remove source message listener when test ends

  • test_osn_simple_recording: stops worker thread properly now that it invokes removeSourceMessageListener
  • test_osn_simple_recording: remove lamda to enable timeout (helps with local testing when I need to extend timeout plus this pattern is recommended)

Tests: simple_recording,test_osn_advanced_replayBuffer.ts,test_osn_advanced_recording,dual_output - wrap with try/catch/finally

  • update any test that starts a worker thread via create() to properly invoke destroy() within a finally block to guanrantee the worker thread is stopped even if the test fails. Prevents orphaned worker threads from invoking Query() every 33ms on the new IPC connection created by other tests.

guarantees cleanup even when tests throw so these tests will clean up proactively, and any orphaned workers that slip through will stop themselves rather than polluting subsequent test runs.

validate totalSleepMS does not have a negative value

When the IPC call takes longer than sleepIntervalMS on an overloaded CI runner, sleepIntervalMS - dur.count() produces a negative result that wraps to a huge size_t value, causing the worker to sleep for an effectively infinite duration. After that, no more signals are ever polled — any test waiting for a signal hits the 30-second timeout.

do not share stdout/stderr with parent process

On macOS, the server is spawned with posix_spawnp with NULL for file_actions, which means it inherits the parent's stdout/stderr. On Windows this doesn't happen because CreateProcessW uses DETACHED_PROCESS | CREATE_NO_WINDOW, which cuts the server off from the parent's console entirely. This merely helps declutter the test logs. OBS logs are now uploaded. Added SUPPRESS_STREAMLABS_OBS_LOGS environment variable to switch this back to legacy if desired. Locally, SUPPRESS_STREAMLABS_OBS_LOGS is false to maintain legacy behavior so you can still see backend spew which can be helpful for debugging imo.

obs_handler: add more retryable timeouts for flaky tests
obs_handler: Reduce defaultVideoContext from 1280x720@60fps to 1280x720@30fps to reduce CPU load on the slow x86_64 CI machine. I did not profile the exact benefit however....

(MacOS) controller.cpp: kill prev obs64 before starting

  • Should make running multiple tests in succession on this slow CI more reliable. Add a waitpid(pids[i], nullptr, 0) call that blocks until the SIGKILLed process is fully reaped by the kernel before the new posix_spawnp runs, eliminating the race. When I run these tests locally with Activity Monitor open, I will now only see one instance of obs64 running which is a big help to these 3 core virtual macs

osn-replay-buffer: invoke output handler to save

  • makes test more reliable resulting in much less replay buffer failures on this MacOS CI virtual mac
  • follows similar pattern used by OBS - ReplayBufferSave
  • The previous approach can silently do nothing if:
  • The hotkey type check (OBS_HOTKEY_REGISTERER_OUTPUT) doesn't match
  • if no routed callback is registered, the save is a no-op
  • The hotkey hasn't been registered yet (race condition on slow CI machines)

Things I tried:

  • test_osn_dual_output: I tried explicitly assigning recording2.mixer=2 for both tests 1 & 2 (the most problematic ones for MacOS CI machine). I could not tell if this helped or not. So removed for now
  • I tried upgrading the github runners to large (more cores). But with more cores I inexplicably saw more issues and instability. It also did not seem any faster but I did not profile this. Perhaps I missed something there?

Description

Motivation and Context

Enable all tests to run on MacOS

How Has This Been Tested?

Ran tons of builds.

Types of changes

  • Code cleanup (non-breaking change which makes code smaller or more readable)

Checklist:

  • The code has been tested.
  • All commit messages are properly formatted and commits squashed where appropriate.
  • I have included updates to all appropriate documentation.

@sandboxcoder sandboxcoder marked this pull request as draft April 22, 2026 18:57
@sandboxcoder sandboxcoder force-pushed the rno/macos-flaky-tests branch 2 times, most recently from 984c25c to 84e3fc7 Compare April 22, 2026 20:12
* enable recording, streaming, dual-output tests on MacOS
* set CI env var for MacOS github runner for obs_handler.ts
* fix warning we cant assign null: `let secondContext: osn.IVideo = null;`
* do not invoke `AddVideoContext` on CI builds in enhanced broadcasting tests since these are disabled.
* main.yml: upgrade to macos-15 for arm64 ARCH (same OS Intel ARCH is using).
* main.yml: upload OBS logs to troubleshoot unit tests.

main.yml: set fail-fast to false

* turn off fail-fast so the tests can continue running which will fix issue where if macos-15-intel fails, macos-arm64 is cancelled before it can finish running.

osn_handler: log time delta for long duration

* add warning when signal received took longer then expectedDeadline

obs_handler: assign explicit log filename for tests

Fix orphaned worker thread:

The WorkerSignals::worker keeps polling Query. When a test fails before
calling destroy(), the server-side recording gets freed when OBS shuts
down, but the orphaned worker thread continues. When the next test file
creates a fresh OBS connection, Controller::GetInstance().GetConnection()
now returns the new session's connection. The orphaned worker sends
  Query(oldUID) to the new server, which has no knowledge of that
uid → "Recording reference is not valid."

Remove source message listener when test ends

* stops worker thread properly
* test_osn_simple_recording: remove lamda to enable timeout

Tests: simple_recording,test_osn_advanced_replayBuffer,test_osn_advanced_recording,dual_output - wrap with try/finally

* update any test that starts a worker thread via create() to properly invoke
destroy() within a finally block to guanrantee the worker thread is stopped even if the test fails. Prevents orphaned worker threads from invoking Query() every 33ms on the new IPC connection created by other tests.

guarantees cleanup even when tests throw so they tests will clean up proactively, and any orphaned workers that slip through will stop themselves rather than polluting subsequent test runs.

validate totalSleepMS does not have a negative value

When the IPC call takes longer than sleepIntervalMS on an overloaded CI
runner), sleepIntervalMS - dur.count() produces a negative result that
wraps to a huge size_t value, causing the worker to sleep for an
effectively infinite duration. After that, no more signals are ever
polled — any test waiting for a signal hits the 30-second timeout.

do not share stdout/stderr with parent process

On macOS, the server is spawned with posix_spawnp with NULL for
file_actions, which means it inherits the parent's stdout/stderr.
On Windows this doesn't happen because CreateProcessW uses
DETACHED_PROCESS | CREATE_NO_WINDOW, which cuts the server off
from the parent's console entirely.

obs_handler: add more retryable timeouts for flaky tests

advanced-recording fix

* Removed extraneous set_video_mix since audio encoders don't have a video mix which produced the "encoder 'track1' is not a video encoder" warning
* Reduce defaultVideoContext from 1280x720@60fps to 1280x720@30fps to reduce CPU load on the slow x86_64 CI machine.

(MacOS) controller.cpp: kill prev obs64 before starting

* Should make running multiple tests in succession on this slow CI more
reliable. Add a waitpid(pids[i], nullptr, 0) call that blocks until the
SIGKILLed process is fully reaped by the kernel before the new
posix_spawnp runs, eliminating the race.

osn-replay-buffer: invoke output handler to save

* makes test more reliable
* follows similar pattern used by OBS::ReplayBufferSave
@sandboxcoder sandboxcoder force-pushed the rno/macos-flaky-tests branch from a8900f6 to e076193 Compare April 22, 2026 22:34
* remove timeout I added in commit 25eaa10. Use global timeout
@sandboxcoder sandboxcoder marked this pull request as ready for review April 23, 2026 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant