Fix the pipeline failure about java - spring - ci #48488

Draft
rujche wants to merge 6 commits into main from rujche/main/fix-unit-test-failure-in-Test_ubuntu2404_121_NotFromSource_TestsOnly

rujche (Member) commented Mar 20, 2026

Description

The Spring CI pipeline job "Test ubuntu2404_121_NotFromSource_TestsOnly" intermittently hangs until the 60-minute pipeline timeout kills it. The failure is not deterministic — retrying the pipeline typically resolves it.

Root Cause Analysis

Investigation traced the hang to a JVM crash (EXCEPTION_ACCESS_VIOLATION in Class.getDeclaredConstructors0) occurring inside a Surefire forked JVM during Spring Boot test context creation. The crash happens under high concurrent class-loading pressure when:

  • Maven runs multiple modules in parallel (-T 1C)
  • Each module forks a JVM for testing (forkCount=1)
  • JUnit 5 parallel test execution is enabled (parallelizeTests=concurrent)
  • Mockito/byte-buddy dynamically generates proxy bytecode alongside Spring's ApplicationContext proxy creation

When the forked JVM crashes, Surefire reports "Corrupted channel by directly writing to native stream in forked JVM" but has no timeout configured, so Maven waits indefinitely for the dead process. The pipeline then hangs until the Azure DevOps job timeout (60 minutes) kills the entire build.

Fix

Added <forkedProcessTimeoutInSeconds>600</forkedProcessTimeoutInSeconds> to the maven-surefire-plugin configuration in azure-client-sdk-parent. This gives each forked test JVM a 10-minute timeout. If a fork crashes or hangs, Surefire will detect the timeout, kill the process, and report a clear error instead of waiting indefinitely.
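
For reference, a minimal sketch of what the Surefire setting could look like in a POM. The parameter name forkedProcessTimeoutInSeconds and the value 600 come from this PR's review summary. The surrounding plugin block and its placement (for example, under pluginManagement or directly under plugins) are assumptions for illustration and may not match the actual diff in azure-client-sdk-parent.

```xml
<!-- Illustrative fragment only; the real POM likely carries additional Surefire settings. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- Kill a forked test JVM that has not finished within 600 seconds (10 minutes). -->
    <forkedProcessTimeoutInSeconds>600</forkedProcessTimeoutInSeconds>
  </configuration>
</plugin>
```

Surefire also exposes this parameter through the surefire.timeout user property, so the same limit can be tried locally from the command line (-Dsurefire.timeout=600) without editing the POM.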

Verification

  • The 600-second (10-minute) timeout is generous enough that no legitimate test run should hit it: the full Spring autoconfigure module (1073 tests) completes in ~32 seconds locally.
  • When a fork does crash or hang, the pipeline will now fail fast with a descriptive timeout error and free the CI agent, rather than consuming the full 60-minute job timeout.

All SDK Contribution checklist:

  • The pull request does not introduce [breaking changes]
  • CHANGELOG is updated for new features, bug fixes or other significant changes.
  • I have read the contribution guidelines.

General Guidelines and Best Practices

  • Title of the pull request is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

  • Pull request includes test coverage for the included changes.

Copilot AI review requested due to automatic review settings March 20, 2026 02:31
@rujche rujche self-assigned this Mar 20, 2026
@rujche rujche added the azure-spring All azure-spring related issues label Mar 20, 2026
@rujche rujche added this to the 2026-04 milestone Mar 20, 2026
@github-actions github-actions bot added the Azure.Core azure-core label Mar 20, 2026
rujche (Member, Author) commented Mar 20, 2026

/azp run java - spring - ci

@azure-pipelines

Azure Pipelines could not run because the pipeline triggers exclude this branch/path.

Copilot AI (Contributor) left a comment

Pull request overview

Adds a timeout to Maven Surefire forked JVMs in the Azure SDK Java client parent POM to prevent CI jobs from hanging indefinitely when a forked test process crashes (e.g., corrupted channel / dead fork).

Changes:

  • Configure maven-surefire-plugin with forkedProcessTimeoutInSeconds=600 in azure-client-sdk-parent to fail fast instead of waiting forever on a dead forked JVM.

@rujche rujche marked this pull request as draft March 20, 2026 02:52
@rujche rujche changed the title from "Add forkedProcessTimeoutInSeconds to surefire plugin to prevent CI pipeline hangs" to "Fix the pipeline failure about java - spring - ci" Mar 20, 2026