Skip to content

Async crypto v2#337

Open
bigbrett wants to merge 7 commits intowolfSSL:mainfrom
bigbrett:async-crypto-v2
Open

Async crypto v2#337
bigbrett wants to merge 7 commits intowolfSSL:mainfrom
bigbrett:async-crypto-v2

Conversation

@bigbrett
Copy link
Copy Markdown
Contributor

@bigbrett bigbrett commented Apr 16, 2026

Async Crypto API for SHA

Introduces a non-blocking request/response split for wolfHSM crypto operations, starting with the SHA family. Every blocking call gains a *Request() / *Response() pair; Request() sends and returns immediately, Response() polls once and returns WH_ERROR_NOTREADY until the server replies. Existing blocking wrappers and the wh_Client_CryptoCb path are unchanged, so current application code keeps working.

Summary

  • New async API surface in wh_client_crypto.h covering SHA-1/224/256/384/512
  • Refactored wh_client_crypto.c around shared request/response primitives
  • Updated message layouts in wh_message_crypto.{c,h} to carry async-specific state (intermediate hash state, trailing input, DMA metadata)
  • Server-side routing updates in wh_server_crypto.c (requests handled statelessly per design)
  • Design doc: docs/draft/async-crypto.md (includes roadmap for remaining algorithms)
  • Extensive new tests in test/wh_test_crypto.c
  • Drive-by: fix gcov flag for gcov bug in code-coverage.yml

Future work

  • other algorithms
  • pending request tracking in comm layer enforcing single-request in flight at a time, as well as remote server reset functionality allowing for re-synchronization in case the server locks up

Copilot AI review requested due to automatic review settings April 16, 2026 17:24
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an async (non-blocking) client crypto API for SHA-family operations by splitting each operation into *Request() and *Response() halves, while keeping existing blocking wrappers and the crypto-callback path compatible.

Changes:

  • Adds async SHA-224/256/384/512 client APIs (plus DMA async variants) and supporting client DMA async bookkeeping.
  • Updates SHA message wire formats to carry intermediate state plus variable-length trailing input, and refactors server handlers to be stateless per request.
  • Expands crypto test coverage to exercise large-input, boundary, and async request/response flows; updates coverage tooling flags and POSIX example buffer sizing.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
wolfhsm/wh_message_crypto.h Redefines SHA request/DMA request layouts to support variable-length input and inline state; adds per-call inline capacity macros.
wolfhsm/wh_client_crypto.h Adds async Request/Response API declarations for SHA2 variants (and DMA variants), including bool request-sent signaling.
wolfhsm/wh_client.h Adds per-client DMA async context storage for SHA to survive Request/Response boundary for POST cleanup.
src/wh_server_crypto.c Updates server-side SHA/SHA-DMA handlers to consume new layouts, validate inSz, and operate statelessly.
src/wh_message_crypto.c Updates translation functions for new SHA request/DMA request formats and expanded DMA response fields.
test/wh_test_crypto.c Adds extensive new SHA tests (large input, capacity boundaries, async non-DMA + DMA).
test/wh_test_check_struct_padding.c Updates padding/compile checks for renamed SHA DMA request structs.
test/Makefile Adds gcovr parse-error ignore flag for coverage generation.
.github/workflows/code-coverage.yml Mirrors gcovr parse-error ignore flag in CI coverage summary step.
examples/posix/wh_posix_server/wolfhsm_cfg.h Updates example comm buffer size to 8 KiB and clarifies it must match client.
examples/posix/wh_posix_client/wolfhsm_cfg.h Updates example comm buffer size to 8 KiB.
examples/posix/wh_posix_cfg.h Updates SHM req/resp buffer sizing to accommodate larger comm packets.
docs/draft/async-crypto.md Adds design doc describing the async crypto API and SHA wire protocol changes.
docs/draft/posix-shm.md Adds design doc explaining POSIX SHM transport vs DMA feature vs allocator glue.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread wolfhsm/wh_message_crypto.h
Comment thread wolfhsm/wh_message_crypto.h
Comment thread examples/posix/wh_posix_cfg.h
@bigbrett bigbrett self-assigned this Apr 16, 2026
@bigbrett bigbrett mentioned this pull request Apr 16, 2026
@bigbrett bigbrett closed this Apr 16, 2026
@bigbrett bigbrett reopened this Apr 16, 2026
@bigbrett bigbrett requested review from wolfSSL-Fenrir-bot and removed request for wolfSSL-Fenrir-bot April 16, 2026 18:05
Copy link
Copy Markdown

@wolfSSL-Fenrir-bot wolfSSL-Fenrir-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fenrir Automated Review — PR #337

Scan targets checked: wolfhsm-consttime, wolfhsm-crypto-bugs, wolfhsm-defaults, wolfhsm-mutation, wolfhsm-proptest, wolfhsm-src, wolfhsm-zeroize

Findings: 8
8 finding(s) posted as inline comments (see file-level comments below)

This review was generated automatically by Fenrir. Findings are non-blocking.

Comment thread src/wh_client_crypto.c
Comment thread src/wh_server_crypto.c
Comment thread src/wh_client_crypto.c
Comment thread src/wh_client_crypto.c
Comment thread src/wh_client_crypto.c
Comment thread src/wh_client_crypto.c
Comment thread src/wh_client_crypto.c
Comment thread src/wh_client_crypto.c
@bigbrett bigbrett marked this pull request as ready for review April 16, 2026 18:54
Copilot AI review requested due to automatic review settings April 16, 2026 18:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an asynchronous (non-blocking) request/response API surface for SHA-family crypto operations in wolfHSM, while keeping existing blocking wrappers and the crypto-callback path compatible. This expands the wire protocol to carry resumable hash state + variable-length trailing input, and updates server routing accordingly.

Changes:

  • Introduces *Request() / *Response() async APIs for SHA-256/224/384/512 (DMA and non-DMA variants).
  • Updates SHA message wire layouts (inline variable-length input, resumable state, new DMA request/response structs) and server-side handling to be stateless per request.
  • Adds extensive tests for large inputs and async flows; updates POSIX example buffer sizing/config and coverage tooling/docs.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
wolfhsm/wh_message_crypto.h New SHA request wire formats (variable-length trailing input), new SHA DMA request/response structs, inline-capacity macros
wolfhsm/wh_client_crypto.h Declares new async SHA Request/Response APIs (DMA + non-DMA)
wolfhsm/wh_client.h Adds per-client DMA async context storage for POST cleanup across Request/Response
src/wh_server_crypto.c Refactors SHA handlers to consume inSz trailing bytes and enforce invariants; updates SHA DMA handlers for new messages
src/wh_message_crypto.c Updates message translation functions for new SHA structs (incl. DMA)
test/wh_test_crypto.c Adds reference hashing + large-input and async tests for SHA variants (and DMA async tests under DMA)
test/wh_test_check_struct_padding.c Updates padding-check declarations for new SHA DMA request structs
examples/posix/wh_posix_{client,server}/wolfhsm_cfg.h Sets comm data length to 8KiB for POSIX examples
examples/posix/wh_posix_cfg.h Increases SHM transport request/response buffer sizes to match larger comm payloads
docs/draft/async-crypto.md Design doc describing async crypto API and SHA wire protocol
docs/draft/posix-shm.md Design doc explaining POSIX SHM transport + DMA layering
test/Makefile Adds gcovr parse-error ignore option to coverage target
.github/workflows/code-coverage.yml Mirrors the gcovr ignore option in CI coverage summary step

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread wolfhsm/wh_message_crypto.h
Comment thread wolfhsm/wh_message_crypto.h
Comment thread wolfhsm/wh_message_crypto.h
Comment thread examples/posix/wh_posix_cfg.h
Copy link
Copy Markdown
Contributor

@rizlik rizlik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clear PR and docs.
My main feedback/concerns:

a) absence of a way for the client to track the in-flight request. PR acknowledge that, nevertheless it should be one of the main design choices of the API. The PR should either add it here or at least sketch the design in the docs. I don't think offloading the management of the req/resp tracking (is a request in flight? what kind? etc) fully to the application is a good design. At least we should protect the application and error out on bad API usage (eg, two request() without response(), request() and response() of mismatching type, etc).

b) redundancy of the code in wh_client_crypto.c: this is a pre-existing problem for sure, but can we reduce the amount of almost exact code across both sha variants and DMA vs non-DMA paths?

c) API usability: how the app knows the outbound limit? (see comments below). Should we switch on blocking behavior for large input size? or should we add a specific error message? or should we expose the function that compute the outbound limit?

Is using a single API (*Request or a more general *Operation) instead of two (*Request/*Response) worth considering?
It will solve also the cumbersome requestSent (the API can return WH_ERROR_OK directly), and it can address (c) by chunking at the wolfHSM layer big request across multiple client <-> server roundtrips.
In this case the app just keep calling the API until an ret != WH_ERROR_NOTREADY, following a POSIX-like semantic.

Comment thread docs/draft/async-crypto.md Outdated
Comment thread docs/draft/async-crypto.md Outdated
| ML-DSA Verify | `wh_Client_MlDsaVerify{Request,Response}` | Low | Post-quantum; single-shot |
| RNG Generate | `wh_Client_RngGenerate{Request,Response}` | Medium | Chunking needed for large requests; async callers must handle chunking themselves |

Most remaining algorithms are **single-shot** operations (one request, one
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this true also for AES-GCM, AES-CTR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AES-GCM is one-shot at the moment, as we currently do not support the streaming API of wolfCrypt. However, both AES-CTR and AES-CBC can be used in an incremental streaming-like manner (calling encrypt()/decrypt() multiple times in sequence to stepwise process a large chunk of data). But for that, the internal state of the Aes object is already properly exchanged between client and server (added in #282). So adding the *Request / *Response interface for them should be straightforward still.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be nice to still allow for streaming in cases where the underlying HW accelerator supports it. There might be cases where the customer would want this capability and not be bounded by the wolfssl api.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah not an exhaustive list, sorry. Streaming GCM should definitely be supported eventually

Comment thread src/wh_client_crypto.c
if (in != NULL && inLen > 0) {
uint32_t consumed = 0;
while (ret == WH_ERROR_OK && consumed < inLen) {
uint32_t capacity = _Sha512UpdatePerCallCapacity(sha512);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how the app knows which is the outbound limit? Should it relies on WH_ERROR_BADARGS.
Should the app chunk the data on its own?
I do this is a real problem, this limit can easily go unnoticed by the app in design/testing phase but it can break later in production.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good point. Per our discussion earlier, I think this unfortunately might need to be a per-request client API for those that support a variable/max capacity, since there might not be a uniform way to do it. Thankfully its probably a small API surface. I'll think it it. Might want to punt it though

Copy link
Copy Markdown
Contributor

@Frauschi Frauschi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only smaller issues building on top of Marco's comments.

(((WOLFHSM_CFG_COMM_DATA_LEN - \
(uint32_t)sizeof(whMessageCrypto_GenericRequestHeader) - \
(uint32_t)sizeof(whMessageCrypto_Sha256Request)) / \
64u) * \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use WC_SHA256_BLOCK_SIZE here directly? Would make the code more understandable I think. Same goes for SHA512 below

Copy link
Copy Markdown
Contributor Author

@bigbrett bigbrett Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I didn't want wh_message_crypto.h to have to #include a wolfCrypt header, which blows up a bunch of stuff in our padding checks. Annoying, but necessary.

Comment thread src/wh_client_crypto.c
(void)wh_Client_DmaProcessClientAddress(
ctx, (uintptr_t)dmaBase, (void**)&inAddr, dmaSz,
WH_DMA_OPER_CLIENT_READ_POST, (whDmaFlags){0});
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should ctx->dma.asyncCtx.sha with its variables be cleared here, too?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second this. The context should be cleared in the case where the caller doesnt check the error here and calls UpdateResponse anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread src/wh_server_crypto.c
memcpy(res.hash, sha512->digest, WC_SHA512_DIGEST_SIZE);
res.loLen = sha512->loLen;
res.hiLen = sha512->hiLen;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

res.hashType is never set. Should be set to the one actually be used for the SHA object. Then the client can check the sent one in the request against the received one in the response to catch misconfiguration (e.g. client is built with SHA512_256 support enabled, server is not; this would cause a mismatch here).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment thread src/wh_client_crypto.c
ret = _getCryptoResponse(dataPtr, WC_HASH_TYPE_SHA512, (uint8_t**)&res);
if (ret >= 0) {
/* keep hashtype before initialization */
hashType = sha->hashType;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Match this with the hashType received in the response.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

#define WOLFHSM_CFG_ENABLE_CLIENT
#define WOLFHSM_CFG_HEXDUMP
#define WOLFHSM_CFG_COMM_DATA_LEN 5000
#define WOLFHSM_CFG_COMM_DATA_LEN (1024 * 8)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious about why this was increased

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to upstream dilithium required larger messages.

Comment thread src/wh_client_crypto.c
(void)wh_Client_DmaProcessClientAddress(
ctx, (uintptr_t)dmaBase, (void**)&inAddr, dmaSz,
WH_DMA_OPER_CLIENT_READ_POST, (whDmaFlags){0});
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second this. The context should be cleared in the case where the caller doesnt check the error here and calls UpdateResponse anyway.

Comment on lines +92 to +99
/* Poll for completion */
do {
ret = wh_Client_EccSignResponse(ctx, sig, &sigLen);
if (ret == WH_ERROR_NOTREADY) {
/* yield to scheduler, do other work, etc. */
}
} while (ret == WH_ERROR_NOTREADY);
/* ret has final result, sig/sigLen are populated */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope for this PR but we should think about how we could allow the user to wire up an interrupt+callback approach here instead polling. I.E. somehow fire an interrupt when the client receives the response, have the ISR call the user defined callback.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. Currently nothing prohibits this from being deployed outside the context of wolfHSM if it is baked into the transport - e.g. wrap SHM transport with an interrupt such that client can signal IRQ to server when it "sends" something, then server can _WFI() then poll the transport in the ISR (or defer to background) once it has awoken. And same in reverse order for server->client, though that one would need to somehow communicate which response to poll (or have it be tracked entirely on client side, since only one in flight at a time).

Can think of a few ways to do it. Like you said - food for a future PR.

Comment on lines +408 to +417
### Design Tradeoffs

| Decision | Tradeoff |
|----------|----------|
| **State on wire** | Larger messages (~40-84 bytes overhead), but the server is fully stateless and needs no per-client hash context |
| **Whole-block alignment** | Wastes up to `BLOCK_SIZE - 1` bytes of comm buffer capacity per message, but guarantees the server never has a partial block (simplifies server logic and invariant checking) |
| **Client-side partial buffering** | Requires wolfCrypt's buffer/buffLen fields, but avoids allocating separate storage and enables the `requestSent` optimization for small inputs |
| **Per-call capacity limit** | Callers of the async API must respect the capacity and chunk large inputs themselves (the blocking wrapper handles this automatically), but each call is bounded and predictable |
| **`requestSent` flag** | Adds a parameter to the API, but avoids unnecessary round-trips when input is absorbed entirely into the local buffer |
| **Snapshot/rollback on send failure** | Small CPU cost to copy the partial buffer, but guarantees SHA state consistency even on transport failures |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May just want to sanity check any performance gain/degradation from these changes?

Out of scope for this PR but it would be great to have a CI test to run a benchmark on posix and record performance changes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kinda AI slop, sry.

Imagine there would only be an improvement, since the old blocking method chunked a single SHA block at a time, whereas now you can send much more data in one go. Not sure how it would degrade.

Agreed about automated benchmarks - ideally we would turn on all the X86 optimizations too. Right now its pretty dumb and slow.

| ML-DSA Verify | `wh_Client_MlDsaVerify{Request,Response}` | Low | Post-quantum; single-shot |
| RNG Generate | `wh_Client_RngGenerate{Request,Response}` | Medium | Chunking needed for large requests; async callers must handle chunking themselves |

Most remaining algorithms are **single-shot** operations (one request, one
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be nice to still allow for streaming in cases where the underlying HW accelerator supports it. There might be cases where the customer would want this capability and not be bounded by the wolfssl api.

@bigbrett
Copy link
Copy Markdown
Contributor Author

@rizlik @Frauschi @AlexLanzano Thanks all for the feedback! Believe I addressed all the pending comments in my last few commits - pending the items we discussed punting. The big change is we now track request in flight and have a test for it, making sure the client APIs should error out if misused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants