Conversation
6c7e032 to
7ef52c5
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces an async (non-blocking) client crypto API for SHA-family operations by splitting each operation into *Request() and *Response() halves, while keeping existing blocking wrappers and the crypto-callback path compatible.
Changes:
- Adds async SHA-224/256/384/512 client APIs (plus DMA async variants) and supporting client DMA async bookkeeping.
- Updates SHA message wire formats to carry intermediate state plus variable-length trailing input, and refactors server handlers to be stateless per request.
- Expands crypto test coverage to exercise large-input, boundary, and async request/response flows; updates coverage tooling flags and POSIX example buffer sizing.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| wolfhsm/wh_message_crypto.h | Redefines SHA request/DMA request layouts to support variable-length input and inline state; adds per-call inline capacity macros. |
| wolfhsm/wh_client_crypto.h | Adds async Request/Response API declarations for SHA2 variants (and DMA variants), including bool request-sent signaling. |
| wolfhsm/wh_client.h | Adds per-client DMA async context storage for SHA to survive Request/Response boundary for POST cleanup. |
| src/wh_server_crypto.c | Updates server-side SHA/SHA-DMA handlers to consume new layouts, validate inSz, and operate statelessly. |
| src/wh_message_crypto.c | Updates translation functions for new SHA request/DMA request formats and expanded DMA response fields. |
| test/wh_test_crypto.c | Adds extensive new SHA tests (large input, capacity boundaries, async non-DMA + DMA). |
| test/wh_test_check_struct_padding.c | Updates padding/compile checks for renamed SHA DMA request structs. |
| test/Makefile | Adds gcovr parse-error ignore flag for coverage generation. |
| .github/workflows/code-coverage.yml | Mirrors gcovr parse-error ignore flag in CI coverage summary step. |
| examples/posix/wh_posix_server/wolfhsm_cfg.h | Updates example comm buffer size to 8 KiB and clarifies it must match client. |
| examples/posix/wh_posix_client/wolfhsm_cfg.h | Updates example comm buffer size to 8 KiB. |
| examples/posix/wh_posix_cfg.h | Updates SHM req/resp buffer sizing to accommodate larger comm packets. |
| docs/draft/async-crypto.md | Adds design doc describing the async crypto API and SHA wire protocol changes. |
| docs/draft/posix-shm.md | Adds design doc explaining POSIX SHM transport vs DMA feature vs allocator glue. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
wolfSSL-Fenrir-bot
left a comment
There was a problem hiding this comment.
Fenrir Automated Review — PR #337
Scan targets checked: wolfhsm-consttime, wolfhsm-crypto-bugs, wolfhsm-defaults, wolfhsm-mutation, wolfhsm-proptest, wolfhsm-src, wolfhsm-zeroize
Findings: 8
8 finding(s) posted as inline comments (see file-level comments below)
This review was generated automatically by Fenrir. Findings are non-blocking.
There was a problem hiding this comment.
Pull request overview
Adds an asynchronous (non-blocking) request/response API surface for SHA-family crypto operations in wolfHSM, while keeping existing blocking wrappers and the crypto-callback path compatible. This expands the wire protocol to carry resumable hash state + variable-length trailing input, and updates server routing accordingly.
Changes:
- Introduces
*Request()/*Response()async APIs for SHA-256/224/384/512 (DMA and non-DMA variants). - Updates SHA message wire layouts (inline variable-length input, resumable state, new DMA request/response structs) and server-side handling to be stateless per request.
- Adds extensive tests for large inputs and async flows; updates POSIX example buffer sizing/config and coverage tooling/docs.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| wolfhsm/wh_message_crypto.h | New SHA request wire formats (variable-length trailing input), new SHA DMA request/response structs, inline-capacity macros |
| wolfhsm/wh_client_crypto.h | Declares new async SHA Request/Response APIs (DMA + non-DMA) |
| wolfhsm/wh_client.h | Adds per-client DMA async context storage for POST cleanup across Request/Response |
| src/wh_server_crypto.c | Refactors SHA handlers to consume inSz trailing bytes and enforce invariants; updates SHA DMA handlers for new messages |
| src/wh_message_crypto.c | Updates message translation functions for new SHA structs (incl. DMA) |
| test/wh_test_crypto.c | Adds reference hashing + large-input and async tests for SHA variants (and DMA async tests under DMA) |
| test/wh_test_check_struct_padding.c | Updates padding-check declarations for new SHA DMA request structs |
| examples/posix/wh_posix_{client,server}/wolfhsm_cfg.h | Sets comm data length to 8KiB for POSIX examples |
| examples/posix/wh_posix_cfg.h | Increases SHM transport request/response buffer sizes to match larger comm payloads |
| docs/draft/async-crypto.md | Design doc describing async crypto API and SHA wire protocol |
| docs/draft/posix-shm.md | Design doc explaining POSIX SHM transport + DMA layering |
| test/Makefile | Adds gcovr parse-error ignore option to coverage target |
| .github/workflows/code-coverage.yml | Mirrors the gcovr ignore option in CI coverage summary step |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
rizlik
left a comment
There was a problem hiding this comment.
Thanks for the clear PR and docs.
My main feedback/concerns:
a) absence of a way for the client to track the in-flight request. PR acknowledge that, nevertheless it should be one of the main design choices of the API. The PR should either add it here or at least sketch the design in the docs. I don't think offloading the management of the req/resp tracking (is a request in flight? what kind? etc) fully to the application is a good design. At least we should protect the application and error out on bad API usage (eg, two request() without response(), request() and response() of mismatching type, etc).
b) redundancy of the code in wh_client_crypto.c: this is a pre-existing problem for sure, but can we reduce the amount of almost exact code across both sha variants and DMA vs non-DMA paths?
c) API usability: how the app knows the outbound limit? (see comments below). Should we switch on blocking behavior for large input size? or should we add a specific error message? or should we expose the function that compute the outbound limit?
Is using a single API (*Request or a more general *Operation) instead of two (*Request/*Response) worth considering?
It will solve also the cumbersome requestSent (the API can return WH_ERROR_OK directly), and it can address (c) by chunking at the wolfHSM layer big request across multiple client <-> server roundtrips.
In this case the app just keep calling the API until an ret != WH_ERROR_NOTREADY, following a POSIX-like semantic.
| | ML-DSA Verify | `wh_Client_MlDsaVerify{Request,Response}` | Low | Post-quantum; single-shot | | ||
| | RNG Generate | `wh_Client_RngGenerate{Request,Response}` | Medium | Chunking needed for large requests; async callers must handle chunking themselves | | ||
|
|
||
| Most remaining algorithms are **single-shot** operations (one request, one |
There was a problem hiding this comment.
is this true also for AES-GCM, AES-CTR?
There was a problem hiding this comment.
AES-GCM is one-shot at the moment, as we currently do not support the streaming API of wolfCrypt. However, both AES-CTR and AES-CBC can be used in an incremental streaming-like manner (calling encrypt()/decrypt() multiple times in sequence to stepwise process a large chunk of data). But for that, the internal state of the Aes object is already properly exchanged between client and server (added in #282). So adding the *Request / *Response interface for them should be straightforward still.
There was a problem hiding this comment.
I think it could be nice to still allow for streaming in cases where the underlying HW accelerator supports it. There might be cases where the customer would want this capability and not be bounded by the wolfssl api.
There was a problem hiding this comment.
yeah not an exhaustive list, sorry. Streaming GCM should definitely be supported eventually
| if (in != NULL && inLen > 0) { | ||
| uint32_t consumed = 0; | ||
| while (ret == WH_ERROR_OK && consumed < inLen) { | ||
| uint32_t capacity = _Sha512UpdatePerCallCapacity(sha512); |
There was a problem hiding this comment.
how the app knows which is the outbound limit? Should it relies on WH_ERROR_BADARGS.
Should the app chunk the data on its own?
I do this is a real problem, this limit can easily go unnoticed by the app in design/testing phase but it can break later in production.
There was a problem hiding this comment.
this is a good point. Per our discussion earlier, I think this unfortunately might need to be a per-request client API for those that support a variable/max capacity, since there might not be a uniform way to do it. Thankfully its probably a small API surface. I'll think it it. Might want to punt it though
Frauschi
left a comment
There was a problem hiding this comment.
Only smaller issues building on top of Marco's comments.
| (((WOLFHSM_CFG_COMM_DATA_LEN - \ | ||
| (uint32_t)sizeof(whMessageCrypto_GenericRequestHeader) - \ | ||
| (uint32_t)sizeof(whMessageCrypto_Sha256Request)) / \ | ||
| 64u) * \ |
There was a problem hiding this comment.
Why not use WC_SHA256_BLOCK_SIZE here directly? Would make the code more understandable I think. Same goes for SHA512 below
There was a problem hiding this comment.
because I didn't want wh_message_crypto.h to have to #include a wolfCrypt header, which blows up a bunch of stuff in our padding checks. Annoying, but necessary.
| (void)wh_Client_DmaProcessClientAddress( | ||
| ctx, (uintptr_t)dmaBase, (void**)&inAddr, dmaSz, | ||
| WH_DMA_OPER_CLIENT_READ_POST, (whDmaFlags){0}); | ||
| } |
There was a problem hiding this comment.
Should ctx->dma.asyncCtx.sha with its variables be cleared here, too?
There was a problem hiding this comment.
Second this. The context should be cleared in the case where the caller doesnt check the error here and calls UpdateResponse anyway.
| memcpy(res.hash, sha512->digest, WC_SHA512_DIGEST_SIZE); | ||
| res.loLen = sha512->loLen; | ||
| res.hiLen = sha512->hiLen; | ||
| } |
There was a problem hiding this comment.
res.hashType is never set. Should be set to the one actually be used for the SHA object. Then the client can check the sent one in the request against the received one in the response to catch misconfiguration (e.g. client is built with SHA512_256 support enabled, server is not; this would cause a mismatch here).
| ret = _getCryptoResponse(dataPtr, WC_HASH_TYPE_SHA512, (uint8_t**)&res); | ||
| if (ret >= 0) { | ||
| /* keep hashtype before initialization */ | ||
| hashType = sha->hashType; |
There was a problem hiding this comment.
Match this with the hashType received in the response.
| #define WOLFHSM_CFG_ENABLE_CLIENT | ||
| #define WOLFHSM_CFG_HEXDUMP | ||
| #define WOLFHSM_CFG_COMM_DATA_LEN 5000 | ||
| #define WOLFHSM_CFG_COMM_DATA_LEN (1024 * 8) |
There was a problem hiding this comment.
just curious about why this was increased
There was a problem hiding this comment.
Changes to upstream dilithium required larger messages.
| (void)wh_Client_DmaProcessClientAddress( | ||
| ctx, (uintptr_t)dmaBase, (void**)&inAddr, dmaSz, | ||
| WH_DMA_OPER_CLIENT_READ_POST, (whDmaFlags){0}); | ||
| } |
There was a problem hiding this comment.
Second this. The context should be cleared in the case where the caller doesnt check the error here and calls UpdateResponse anyway.
| /* Poll for completion */ | ||
| do { | ||
| ret = wh_Client_EccSignResponse(ctx, sig, &sigLen); | ||
| if (ret == WH_ERROR_NOTREADY) { | ||
| /* yield to scheduler, do other work, etc. */ | ||
| } | ||
| } while (ret == WH_ERROR_NOTREADY); | ||
| /* ret has final result, sig/sigLen are populated */ |
There was a problem hiding this comment.
Out of scope for this PR but we should think about how we could allow the user to wire up an interrupt+callback approach here instead polling. I.E. somehow fire an interrupt when the client receives the response, have the ISR call the user defined callback.
There was a problem hiding this comment.
agree. Currently nothing prohibits this from being deployed outside the context of wolfHSM if it is baked into the transport - e.g. wrap SHM transport with an interrupt such that client can signal IRQ to server when it "sends" something, then server can _WFI() then poll the transport in the ISR (or defer to background) once it has awoken. And same in reverse order for server->client, though that one would need to somehow communicate which response to poll (or have it be tracked entirely on client side, since only one in flight at a time).
Can think of a few ways to do it. Like you said - food for a future PR.
| ### Design Tradeoffs | ||
|
|
||
| | Decision | Tradeoff | | ||
| |----------|----------| | ||
| | **State on wire** | Larger messages (~40-84 bytes overhead), but the server is fully stateless and needs no per-client hash context | | ||
| | **Whole-block alignment** | Wastes up to `BLOCK_SIZE - 1` bytes of comm buffer capacity per message, but guarantees the server never has a partial block (simplifies server logic and invariant checking) | | ||
| | **Client-side partial buffering** | Requires wolfCrypt's buffer/buffLen fields, but avoids allocating separate storage and enables the `requestSent` optimization for small inputs | | ||
| | **Per-call capacity limit** | Callers of the async API must respect the capacity and chunk large inputs themselves (the blocking wrapper handles this automatically), but each call is bounded and predictable | | ||
| | **`requestSent` flag** | Adds a parameter to the API, but avoids unnecessary round-trips when input is absorbed entirely into the local buffer | | ||
| | **Snapshot/rollback on send failure** | Small CPU cost to copy the partial buffer, but guarantees SHA state consistency even on transport failures | |
There was a problem hiding this comment.
May just want to sanity check any performance gain/degradation from these changes?
Out of scope for this PR but it would be great to have a CI test to run a benchmark on posix and record performance changes
There was a problem hiding this comment.
This is kinda AI slop, sry.
Imagine there would only be an improvement, since the old blocking method chunked a single SHA block at a time, whereas now you can send much more data in one go. Not sure how it would degrade.
Agreed about automated benchmarks - ideally we would turn on all the X86 optimizations too. Right now its pretty dumb and slow.
| | ML-DSA Verify | `wh_Client_MlDsaVerify{Request,Response}` | Low | Post-quantum; single-shot | | ||
| | RNG Generate | `wh_Client_RngGenerate{Request,Response}` | Medium | Chunking needed for large requests; async callers must handle chunking themselves | | ||
|
|
||
| Most remaining algorithms are **single-shot** operations (one request, one |
There was a problem hiding this comment.
I think it could be nice to still allow for streaming in cases where the underlying HW accelerator supports it. There might be cases where the customer would want this capability and not be bounded by the wolfssl api.
|
@rizlik @Frauschi @AlexLanzano Thanks all for the feedback! Believe I addressed all the pending comments in my last few commits - pending the items we discussed punting. The big change is we now track request in flight and have a test for it, making sure the client APIs should error out if misused. |
Async Crypto API for SHA
Introduces a non-blocking request/response split for wolfHSM crypto operations, starting with the SHA family. Every blocking call gains a
*Request()/*Response()pair;Request()sends and returns immediately,Response()polls once and returnsWH_ERROR_NOTREADYuntil the server replies. Existing blocking wrappers and thewh_Client_CryptoCbpath are unchanged, so current application code keeps working.Summary
wh_client_crypto.hcovering SHA-1/224/256/384/512wh_client_crypto.caround shared request/response primitiveswh_message_crypto.{c,h}to carry async-specific state (intermediate hash state, trailing input, DMA metadata)wh_server_crypto.c(requests handled statelessly per design)docs/draft/async-crypto.md(includes roadmap for remaining algorithms)test/wh_test_crypto.ccode-coverage.ymlFuture work