
Conversation

Contributor

@da-roth da-roth commented Jan 10, 2026

This PR integrates the Forge JIT backend for XAD, adding optional native code generation support. Forge is an optional dependency - everything builds and runs without it.

Changes

Build options added:

  • QLRISKS_ENABLE_FORGE: Enable Forge JIT backend
  • QLRISKS_ENABLE_FORGE_TESTS: Include Forge tests in test suite
  • QLRISKS_ENABLE_JIT_TESTS: Enable XAD JIT tests (interpreter backend, no Forge)
  • QLRISKS_BUILD_BENCHMARK / QLRISKS_BUILD_BENCHMARK_STANDALONE: Benchmark executables

Files added:

  • test-suite/jit_xad.cpp: JIT infrastructure tests (interpreter backend)
  • test-suite/forgebackend_xad.cpp: Forge backend tests
  • test-suite/swaption_jit_pipeline_xad.cpp: JIT pipeline tests for LMM Monte Carlo
  • test-suite/swaption_benchmark.cpp: Boost.Test benchmarks
  • test-suite/benchmark_main.cpp: Standalone benchmark executable
  • test-suite/PlatformInfo.hpp: Platform detection utilities
  • .github/workflows/ql-benchmarks.yaml: Benchmark workflow

Files modified:

  • CMakeLists.txt: Forge integration options
  • test-suite/CMakeLists.txt: Conditional test/benchmark targets
  • .github/workflows/ci.yaml: Added forge-linux and forge-windows jobs

Benchmarks

The benchmark workflow (ql-benchmarks.yaml) runs swaption pricing benchmarks comparing FD, XAD tape, JIT scalar, and JIT-AVX methods on Linux and Windows.

Also included is some initial work towards #33 - the workflow has type overhead jobs that compare double vs xad::AReal pricing performance (no derivatives) on the same hardware, providing a baseline for measuring XAD's type overhead.

Example benchmark run (Linux) Link

Contributor Author

da-roth commented Jan 27, 2026

Hi @auto-differentiation-dev, thanks for the thorough review!

I refactored the benchmark tests quite a bit - I hope I incorporated everything as requested. QuantLib is now built three times (native double, XAD with JIT=OFF, XAD with JIT=ON) and a combined report is created:
[combined benchmark report table]
Since the JIT=OFF run at 100k paths seemed to hit a bottleneck (my guess is memory, since all paths are recorded on one tape before a single adjoint call), I also added an XAD-Split variant to the JIT=OFF configuration. It does the same as the JIT=ON case (use XAD for the bootstrap to build the Jacobian, then run XAD per path), and its performance scales as expected.
Best, Daniel

@da-roth da-roth mentioned this pull request Feb 2, 2026
Contributor Author

da-roth commented Feb 2, 2026

Hi @auto-differentiation-dev,

this PR is now ready for another round of review. I moved the overhead workflow into a new PR branched off this PR's branch, and I'll follow up on it ( #37) once this one is merged.

Best, Daniel

@auto-differentiation-dev
Contributor

Hi @da-roth,

Sorry it took us some time to come back on this, and thank you for all the work - it is taking good shape.

For XAD, we think only the XAD-split mode should be reported. This is the practical way to run Monte Carlo with XAD using path-wise derivatives, and it is what we would encourage users to adopt (see https://auto-differentiation.github.io/faq/#quant-finance-applications).

Looking at the results, we noticed a few patterns in the reported timings that seem unexpected and would be good to clarify. Overall, all methods show the expected linear behaviour - a fixed overhead plus a component that grows linearly with the number of paths. That said, the relative behaviour between methods looks unusual.

In particular:

  • The fixed cost for FD is extremely large (around 9s). Conceptually, this should correspond to the curve bootstrapping cost multiplied by the number of sensitivities, but even then it looks excessively high. Can you clarify what is driving this overhead?
  • The per-path cost of FD compared to XAD differs only by about a factor of 4, even though FD requires roughly 45 re-runs. Intuitively, we would expect a much larger gap here - how do you explain this behaviour?
  • Forge with AVX2 appears around 6x faster than Forge without AVX2 in terms of per-path scaling, despite AVX2 only providing 4 double-precision vector lanes. This seems stronger than what hardware vectorisation alone would suggest.
  • The reported curve bootstrap time with XAD of 223ms does not seem consistent with the overall trends implied by the data, where the fixed overhead looks closer to ~133ms, and Forge compilation appears to be around 120ms.

This makes us wonder whether there might be something off in how the timings are being measured or attributed. It would be helpful to understand how you interpret these trends.

Thanks again, and happy to discuss further.

Subsequent commits:

  • fixes
  • use original evolve for xad-split
  • tried fixes
  • first try
  • added local script for fd and running benchmark locally
