
Conversation

Contributor

@da-roth da-roth commented Jan 10, 2026

This PR integrates the Forge JIT backend for XAD, adding optional native code generation support. Forge is an optional dependency - everything builds and runs without it.

Changes

Build options added:

  • QLRISKS_ENABLE_FORGE: Enable Forge JIT backend
  • QLRISKS_ENABLE_FORGE_TESTS: Include Forge tests in test suite
  • QLRISKS_ENABLE_JIT_TESTS: Enable XAD JIT tests (interpreter backend, no Forge)
  • QLRISKS_BUILD_BENCHMARK / QLRISKS_BUILD_BENCHMARK_STANDALONE: Benchmark executables

Files added:

  • test-suite/jit_xad.cpp: JIT infrastructure tests (interpreter backend)
  • test-suite/forgebackend_xad.cpp: Forge backend tests
  • test-suite/swaption_jit_pipeline_xad.cpp: JIT pipeline tests for LMM Monte Carlo
  • test-suite/swaption_benchmark.cpp: Boost.Test benchmarks
  • test-suite/benchmark_main.cpp: Standalone benchmark executable
  • test-suite/PlatformInfo.hpp: Platform detection utilities
  • .github/workflows/ql-benchmarks.yaml: Benchmark workflow

Files modified:

  • CMakeLists.txt: Forge integration options
  • test-suite/CMakeLists.txt: Conditional test/benchmark targets
  • .github/workflows/ci.yaml: Added forge-linux and forge-windows jobs

Benchmarks

The benchmark workflow (ql-benchmarks.yaml) runs swaption pricing benchmarks comparing FD, XAD tape, JIT scalar, and JIT-AVX methods on Linux and Windows.

Also included is some initial work towards #33 - the workflow has type overhead jobs that compare double vs xad::AReal pricing performance (no derivatives) on the same hardware, providing a baseline for measuring XAD's type overhead.

Example benchmark run (Linux) Link

Contributor Author

da-roth commented Jan 27, 2026

Hi @auto-differentiation-dev, thanks for the thorough review!

I refactored the benchmark tests quite a bit - I hope I incorporated everything as requested. QuantLib is now built three times (native double, XAD with JIT=OFF, XAD with JIT=ON) and a combined report is created:
[combined benchmark report table]
Since the JIT=OFF run at 100k paths seemed to hit a bottleneck (my guess is memory, since all paths are recorded on one tape before a single adjoint call), I also added an XAD-Split variant to the JIT=OFF configuration. It does the same as the JIT=ON case (use XAD for the bootstrap to build the Jacobian, then run XAD per path), and its performance scales as expected.
Best, Daniel

@da-roth da-roth mentioned this pull request Feb 2, 2026
Contributor Author

da-roth commented Feb 2, 2026

Hi @auto-differentiation-dev,

this PR is now ready for another round of review. I moved the overhead workflow into a new PR branched off this PR's branch, and I'll follow up on it ( #37) once this one is merged.

Best, Daniel

@auto-differentiation-dev
Contributor

Hi @da-roth,

Sorry it took us some time to come back on this, and thank you for all the work - it is taking good shape.

For XAD, we think only the XAD-split mode should be reported. This is the practical way to run Monte Carlo with XAD using path-wise derivatives, and it is what we would encourage users to adopt (see https://auto-differentiation.github.io/faq/#quant-finance-applications).

Looking at the results, we noticed a few patterns in the reported timings that seem unexpected and would be good to clarify. Overall, all methods show the expected linear behaviour - a fixed overhead plus a component that grows linearly with the number of paths. That said, the relative behaviour between methods looks unusual.

In particular:

  • The fixed cost for FD is extremely large (around 9s). Conceptually, this should correspond to the curve bootstrapping cost multiplied by the number of sensitivities, but even then it looks excessively high. Can you clarify what is driving this overhead?
  • The per-path cost of FD compared to XAD differs only by about a factor of 4, even though FD requires roughly 45 re-runs. Intuitively, we would expect a much larger gap here - how do you explain this behaviour?
  • Forge with AVX2 appears around 6x faster than Forge without AVX2 in terms of per-path scaling, despite AVX2 only providing 4 double-precision vector lanes. This seems stronger than what hardware vectorisation alone would suggest.
  • The reported curve bootstrap time with XAD of 223ms does not seem consistent with the overall trends implied by the data, where the fixed overhead looks closer to ~133ms, and Forge compilation appears to be around 120ms.

This makes us wonder whether there might be something off in how the timings are being measured or attributed. It would be helpful to understand how you interpret these trends.

Thanks again, and happy to discuss further.

Subsequent commits:

  • fixes
  • use original evolve for xad-split
  • tried fixes
  • first try
  • added local script for fd and running benchmark locally
