Fix leaks #26
base: main
Conversation
…ng/array) by filters/tests (#19147)

* undefined is treated as iterable (string/array) by filters; `tojson` is not a supported `undefined` filter
* add tests
* add sequence and iterable tests; keep it DRY and fix some types
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
The syclcompat/math.hpp is not used anymore. The change that introduced it was successfully reverted (ggml-org/llama.cpp#17826). This include path will become obsolete and will be dropped in oneAPI 2026.0, effectively breaking ggml-sycl builds.
* convert : yield Mamba2Model/GraniteMoeModel modify_tensors

This commit updates the `GraniteHybridModel` class' modify_tensors function to properly delegate to `Mamba2Model.modify_tensors` and `GraniteMoeModel.modify_tensors` using 'yield from' instead of 'return'. The motivation for this is that modify_tensors is a generator function (it uses 'yield from'), but the two calls above use return statements and don't yield anything, which means that the caller of this function will not receive any yielded values from it. This causes layer tensors to be silently dropped during conversion.
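The commit message above describes a classic generator pitfall. As an illustration (plain Python; `inner` and the delegate names are hypothetical, not the actual converter code), a `return` inside a generator function silently discards the delegated values, while `yield from` forwards them:

```python
def inner():
    # stands in for Mamba2Model.modify_tensors / GraniteMoeModel.modify_tensors
    yield "tensor_a"
    yield "tensor_b"

def broken_delegate():
    # Bug pattern: because a 'yield' appears below, this whole function is a
    # generator, so 'return inner()' just stops iteration and drops the values.
    if True:
        return inner()
    yield "never reached"

def fixed_delegate():
    # Fix: 'yield from' forwards every value produced by inner().
    yield from inner()

print(list(broken_delegate()))  # []
print(list(fixed_delegate()))   # ['tensor_a', 'tensor_b']
```

The caller iterating `broken_delegate()` sees nothing at all, which is exactly how tensors get silently dropped.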
/test topsail remoting_mac

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 02 minutes 39 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration: Failure indicator: Empty. (See run.log)

/test topsail mac-m4 remoting_mac

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 02 minutes 38 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration: Failure indicator: Empty. (See run.log)

/test topsail mac-m4 remoting_mac
…ctor (#18471)

* server: introduce self-speculative decoding
* server: moved self-call into speculative.cpp
* can_speculate() includes self-speculation (Co-authored-by: Georgi Gerganov <[email protected]>)
* server: can_speculate() tests self-spec
* server: replace can_speculate() with slot.can_speculate() (Co-authored-by: Sigbjørn Skjæret <[email protected]>)
* common: use %zu format specifier for size_t in logging (Co-authored-by: Sigbjørn Skjæret <[email protected]>)
* server: can_speculate() requires a task instance
* common: ngram map, config self-speculative decoding
* common: add enum common_speculative_type
* common: add vector of speculative states
* common: add option --spec-draftless
* server: cleanup (remove slot.batch_spec, rename)
* common: moved self-spec impl to ngram-map
* common: cleanup (use common_speculative_state_draft)
* spec : refactor
* cont : naming
* spec: remove --spec-config
* doc: (draftless) speculative decoding
* common: print performance in spec decoding
* minor : cleanup
* common : better names
* minor : cleanup + fix build
* minor: comments
* CODEOWNERS: add common/ngram-map.* (#18471)
* common : rename speculative.draftless_type -> speculative.type
* ngram-map : fix uninitialized values
* ngram-map : take into account the input can become shorter
* ngram-map : revert len check for now
* arg : change `--spec-draftless` -> `--spec-type`
* spec : add common_speculative_state::accept()
* spec : refactor + add common_speculative_begin()
* spec : fix begin() call with mtmd
* spec : additional refactor + remove common_speculative_params

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 23 minutes 01 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration:
* vulkan: use coopmat for flash attention p*v matrix multiplication
* fix P loading issue
* fix barrier position
* remove reduction that is no longer needed
* move max thread reduction into loop
* remove osh padding
* add bounds checks and padding
* remove unused code
* fix shmem sizes, loop duration and accesses
* don't overwrite Qf, add new shared psh buffer instead
* add missing bounds checks
* use subgroup reductions
* optimize
* move bounds check, reduce barriers
* support other Bc values and other subgroup sizes
* remove D_split
* replace Of register array with shared memory Ofsh array
* parallelize HSV across the rowgroups
* go back to Of in registers, not shmem
* vectorize sfsh
* don't store entire K tile in shmem
* fixes
* load large k tiles to shmem on Nvidia
* adapt shared memory host check function to shader changes
* remove Bc 32 case
* remove unused variable
* fix missing mask reduction tmspsh barrier
* fix mask bounds check
* fix rowmax f16 under/overflow to inf
* fix flash_attn_cm2 BLOCK_SIZE preprocessor directives
Walkthrough

Caches device and buffer-type metadata in virtgpu, migrates buffer-type APIs to use host handles, deprecates `buffer_type.is_host`, introduces thread-safe shared-memory locking, updates APIR encode/decode helpers, and centralizes forward/backend buffer operations around `apir_buffer_type_host_handle_t`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Forward as VirtGPU-Forward
    participant Backend as APIR-Backend
    participant GPU as virtgpu
    Client->>Forward: apir_buffer_type_get_max_size(host_handle)
    Forward->>Backend: encode host_handle, send command
    Backend->>GPU: check gpu->cached_buffer_type.host_handle / iface
    alt GPU iface has get_max_size
        GPU->>Backend: call iface.get_max_size(...) -> max_size
    else GPU missing handler
        GPU->>Backend: return SIZE_MAX
    end
    Backend->>Forward: encode result (max_size)
    Forward->>Client: return max_size
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks: ❌ 3 failed checks (2 warnings, 1 inconclusive)
/test topsail mac-m4 remoting_mac
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
ggml/src/ggml-virtgpu/virtgpu.cpp (1)

484-487: Change `response_dec` to `thread_local` to avoid data races in concurrent calls.

The `response_dec` variable at line 484 is `static` while `encoder_buffer` and `enc` are already `thread_local`. If multiple threads call `remote_call` concurrently, they will share the same decoder object, causing one thread's initialization to overwrite another's before use.

Proposed fix

```diff
-    static apir_decoder response_dec;
+    thread_local apir_decoder response_dec;
```

ggml/src/ggml-virtgpu/virtgpu-forward-buffer-type.cpp (1)

3-25: Confirm memory leak in `apir_buffer_type_get_name`: allocated string is never freed.

The function allocates memory via `apir_decoder_alloc_array()` at line 15 and returns a pointer to this allocated string. In `ggml-backend-reg.cpp` line 36, this pointer is stored in `gpu->cached_buffer_type.name`, which persists as a static structure for the program's lifetime. Since `virtgpu` is a plain C struct with no destructor, and no cleanup code exists to free `cached_buffer_type.name`, this results in a memory leak whenever the backend is initialized.
🤖 Fix all issues with AI agents
In `@ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h`:
- Around line 19-34: The enum numeric IDs for APIR_COMMAND_TYPE_* have been
shifted which will break wire protocol clients; restore stability by reserving
any removed numeric slot(s) instead of renumbering (e.g., add a placeholder like
APIR_COMMAND_TYPE_RESERVED_x for the missing value(s)) or, alternatively, bump
and enforce a protocol version/handshake; update the enum (symbols such as
APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER, APIR_COMMAND_TYPE_BUFFER_GET_BASE,
APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE) to keep existing numeric values
unchanged and adjust APIR_BACKEND_DISPATCH_TABLE_COUNT accordingly to reflect
the true count while preserving historic IDs.
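To illustrate the first option (reserving the removed slot), here is a minimal Python sketch. The numeric values are hypothetical; the real `APIR_COMMAND_TYPE_*` IDs live in `apir_backend.gen.h`:

```python
from enum import IntEnum

# Keeping a placeholder for a removed command preserves every historic wire ID,
# so old clients and new servers still agree on the numbering.
class ApirCommandType(IntEnum):
    BUFFER_TYPE_ALLOC_BUFFER = 0
    RESERVED_1               = 1  # removed command; slot kept so later IDs don't shift
    BUFFER_GET_BASE          = 2
    BACKEND_GRAPH_COMPUTE    = 3

# the dispatch table still counts the reserved slot
APIR_BACKEND_DISPATCH_TABLE_COUNT = len(ApirCommandType)

print(ApirCommandType.BACKEND_GRAPH_COMPUTE.value)  # 3
```

Renumbering instead (dropping `RESERVED_1` and letting the later entries shift down) is what silently breaks wire-protocol compatibility.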
In `@ggml/src/ggml-virtgpu/ggml-backend-device.cpp`:
- Around line 98-116: Replace the non-atomic initialized flag with a
std::atomic<bool> and use proper acquire/release semantics to fix the
double-checked locking in the block that constructs the static
ggml_backend_buffer_type buft; specifically change the static bool initialized
to static std::atomic<bool> initialized{false}, read it with
initialized.load(std::memory_order_acquire) for the first check, take the
std::mutex and construct/assign buft as now, then set initialized.store(true,
std::memory_order_release) after initialization; keep the existing mutex,
lock_guard and the buft construction (iface/device/context) logic unchanged.
- Around line 75-92: The code uses a non-atomic static bool initialized and a
static ggml_backend_buffer_type buft, creating a data race in the double-checked
locking and incorrectly binding the first-call dev into a single static buft;
fix by making initialized an std::atomic<bool> (e.g. std::atomic<bool>
initialized{false}) and use appropriate memory_order (load with
memory_order_acquire, store with memory_order_release) around the
mutex-protected initialization; additionally avoid storing a single static buft
per-process when different dev values may be passed—either make buft non-static
(return a fresh ggml_backend_buffer_type) or maintain a small thread-safe map
keyed by dev to return a distinct buft for each dev (referencing the symbols
initialized, buft, dev, and the mutex lock/initialization block).
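The control flow of the double-checked-locking fix can be sketched in Python (illustrative only: Python's GIL hides the underlying data race, and the C++ fix additionally needs `std::atomic` with acquire/release ordering, as described above):

```python
import threading

_lock = threading.Lock()
_initialized = False
_buft = None

def get_buffer_type():
    """Lazily construct a singleton, checking the flag twice around the lock."""
    global _initialized, _buft
    if not _initialized:          # first check, no lock (fast path)
        with _lock:
            if not _initialized:  # second check, under the lock
                _buft = {"iface": "host", "device": 0}  # expensive one-time init
                _initialized = True                     # publish after construction
    return _buft

# every caller sees the same initialized object
assert get_buffer_type() is get_buffer_type()
```

In C++ the first (unlocked) read is the step that must be `std::memory_order_acquire`, and the final store `std::memory_order_release`, so a thread on the fast path never observes the flag before the constructed object.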
In `@ggml/src/ggml-virtgpu/ggml-backend-reg.cpp`:
- Around line 34-38: apir_buffer_type_get_name may return NULL on allocation
failure, so update the code that sets gpu->cached_buffer_type.name to check the
return value from apir_buffer_type_get_name(gpu, buft_host_handle) and use a
safe fallback (e.g., an empty string or strdup(\"\")) when NULL is returned;
ensure the chosen fallback matches how other device info strings are handled and
that ownership/cleanup of gpu->cached_buffer_type.name remains consistent with
the rest of the code (refer to apir_buffer_type_get_name and
gpu->cached_buffer_type.name to locate the change).
- Around line 24-38: apir_device_get_description and apir_device_get_name can
return NULL; update the cache population in the block that assigns
gpu->cached_device_info.description and .name to check the returned pointers
from apir_device_get_description(gpu) and apir_device_get_name(gpu) and, if
NULL, set gpu->cached_device_info.description/name to a safe fallback (e.g. an
empty string or strdup of "(unknown)") instead of storing NULL; ensure any
temporary allocated strings are handled consistently with existing ownership
semantics used elsewhere for cached_device_info to avoid leaks.
In `@ggml/src/ggml-virtgpu/virtgpu.h`:
- Around line 76-78: The data_shmem_mutex initialized via mtx_init in
apir_initialize/virtgpu.cpp is never destroyed; add a call to
mtx_destroy(&gpu->data_shmem_mutex) in the GPU cleanup path (the function that
frees/destroys the GPU object created in apir_initialize) so the mutex is
properly released during GPU teardown; ensure the mtx_destroy call is paired
with the existing initialization and placed before or as part of releasing the
GPU structure.
🧹 Nitpick comments (3)

ggml/src/ggml-virtgpu/ggml-remoting.h (1)

48-51: Misleading comment for frontend helper.

The comment on line 49 says "in the backend, the buffer handle is the buffer pointer" but this is the frontend version that extracts `buft->context`. The backend version in `backend/backend-convert.h` casts `buft` directly. Consider updating the comment to reflect the frontend semantics:

```diff
 static inline apir_buffer_type_host_handle_t ggml_buffer_type_to_apir_handle(ggml_backend_buffer_type_t buft) {
-    // in the backend, the buffer handle is the buffer pointer
+    // in the frontend, the host handle is stored in buft->context
     return (apir_buffer_type_host_handle_t) buft->context;
 }
```

ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp (1)
50-55: Consider checking `mtx_unlock` return value and handling error paths.

The `mtx_unlock` return value is not checked. While unlock failures are rare, logging or asserting could help diagnose issues.

More importantly, if an error occurs between acquiring the lock (line 25) and releasing it (line 52), such as if the `REMOTE_CALL` macro causes an early return or abort, the mutex could remain locked. Consider using RAII-style lock guards if C++ mutex primitives are available.

♻️ Optional: Use RAII lock guard pattern

If mixing C++ and C threading primitives is acceptable, consider wrapping the C mutex in a simple RAII guard:

```cpp
// Simple RAII wrapper for mtx_t
struct MtxGuard {
    mtx_t * mtx;
    bool    locked;

    MtxGuard(mtx_t * m) : mtx(m), locked(false) {
        if (mtx_lock(mtx) == thrd_success) locked = true;
    }

    ~MtxGuard() {
        if (locked) mtx_unlock(mtx);
    }
};
```

ggml/src/ggml-virtgpu/virtgpu.h (1)
80-88: Cached memory values may become stale.

`memory_free` and `memory_total` are cached at initialization but may change during runtime as allocations occur. If accurate memory reporting is required, consider whether these should remain dynamic queries or document that they represent initial values.
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 24 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration:
…t to support more cases (#19154) Co-authored-by: Neo Zhang Jianyu <[email protected]>
/test topsail mac-m4 remoting_mac
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ggml/src/ggml-virtgpu/regenerate_remoting.py (1)

299-301: Missing newline or space in warning message.

The warning message is missing proper separation between the two sentences.

🐛 Proposed fix

```diff
     if not self.clang_format_available:
-        logging.warning("\n⚠️clang-format not found in PATH. Generated files will not be formatted."
-                        " Install clang-format to enable automatic code formatting.")
+        logging.warning("\n⚠️ clang-format not found in PATH. Generated files will not be formatted.\n"
+                        " Install clang-format to enable automatic code formatting.")
```
🧹 Nitpick comments (5)

ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h (1)

40-57: Consider moving the null check before initializing `ggml_init_params`.

The null guard at lines 48-50 is a good addition to handle cases where `apir_rpc_tensor` is NULL. However, the `ggml_init_params` struct is constructed before the check. While this is harmless (stack allocation), moving the check earlier would be slightly cleaner.

♻️ Proposed refactor

```diff
 static inline const ggml_tensor * apir_decode_ggml_tensor(apir_decoder * dec) {
     const apir_rpc_tensor * apir_rpc_tensor = apir_decode_apir_rpc_tensor_inplace(dec);
+
+    if (!apir_rpc_tensor) {
+        return NULL;
+    }
+
     ggml_init_params params{
         /*.mem_size   =*/ ggml_tensor_overhead(),
         /*.mem_buffer =*/ NULL,
         /*.no_alloc   =*/ true,
     };

-    if (!apir_rpc_tensor) {
-        return NULL;
-    }
-
     ggml_context * ctx = ggml_init(params);
```

ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp (2)
24-43: Return type change transfers ownership to caller - document this.

The change from `const char *` to `char *` indicates the caller now owns the returned memory and is responsible for freeing it. This aligns with the PR objective to fix leaks.

Consider adding a documentation comment to clarify the ownership semantics:

📝 Proposed documentation

```diff
+/**
+ * Returns the device name. Caller takes ownership and must free() the returned string.
+ * Returns NULL on allocation failure.
+ */
 char * apir_device_get_name(virtgpu * gpu) {
```

45-66: Same ownership semantics apply here.

The description getter has the same ownership transfer. Consider adding similar documentation.

📝 Proposed documentation

```diff
+/**
+ * Returns the device description. Caller takes ownership and must free() the returned string.
+ * Returns NULL on allocation failure.
+ */
 char * apir_device_get_description(virtgpu * gpu) {
```

ggml/src/ggml-virtgpu/ggml-backend-reg.cpp (1)
176-183: Make cleanup idempotent to avoid accidental double-free.

If `ggml_virtgpu_cleanup` can be called more than once (or on a partially-initialized GPU), guard and null out freed pointers.

♻️ Suggested hardening

```diff
 void ggml_virtgpu_cleanup(virtgpu *gpu) {
+    if (!gpu) {
+        return;
+    }
-    free(gpu->cached_device_info.name);
-    free(gpu->cached_device_info.description);
-    free(gpu->cached_buffer_type.name);
+    if (gpu->cached_device_info.name) {
+        free(gpu->cached_device_info.name);
+        gpu->cached_device_info.name = NULL;
+    }
+    if (gpu->cached_device_info.description) {
+        free(gpu->cached_device_info.description);
+        gpu->cached_device_info.description = NULL;
+    }
+    if (gpu->cached_buffer_type.name) {
+        free(gpu->cached_buffer_type.name);
+        gpu->cached_buffer_type.name = NULL;
+    }
     mtx_destroy(&gpu->data_shmem_mutex);
 }
```

ggml/src/ggml-virtgpu/virtgpu-forward.gen.h (1)
6-8: Document ownership for returned `char *` strings.

With non-const return types, callers need to know they must free the returned buffers. A brief header comment would prevent leaks.

Also applies to: 20-28
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 19 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration:
/test topsail kpouget-atuona remoting_linux
* refactor mmf rows_per_block
* speed up compile
* pass cdna compile
* fix cuda error
* clean up mmf
* f32 mmf
* clean float mma
* fix mmf error
* faster mmf
* extend tile k
* fix compile error
* Revert "extend tile k" (this reverts commit 4d2ef3d483932659801a59a5af0b6b48f6ffd5c7)
* fix smem overflow
* speed up compiling mmf
* speed up compile for hip
* 512 block for cdna
* config pad size
* fix as comment
* update select logic
* move some code to cuh
* fix as comment
* correct cdna3 config

---------

Co-authored-by: zhang hui <[email protected]>
🔴 Test of 'mac_ai test test_ci' failed after 02 hours 28 minutes 59 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration: Failure indicator: Empty. (See run.log)
* chat : add parsing for solar-open-100b
* add comments to rules
* cont : make assistant start optional
* cont : remove assistant start prefix altogether

---------

Co-authored-by: Piotr Wilkin (ilintar) <[email protected]>
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 25 minutes 49 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration:
/test topsail mac-m4 remoting_mac
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test-chat-template.cpp (1)

81-82: Bug: `--no-common` flag sets `use_common` to wrong value.

The `--no-common` flag should set `use_common = false`, but it currently sets it to `true`. This means the flag has no effect (since `use_common` is already initialized to `true` on line 67).

🐛 Proposed fix

```diff
     } else if (args[i] == "--no-common") {
-        use_common = true;
+        use_common = false;
```
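The inverted-flag bug above is easier to avoid when the argument parser owns the negation. A Python sketch of the pattern (the C++ test harness parses its `args` by hand; this only illustrates the idea):

```python
import argparse

parser = argparse.ArgumentParser()
# '--no-common' must turn the feature OFF: store_false flips the True default,
# so there is no hand-written `use_common = true` assignment to get wrong.
parser.add_argument("--no-common", dest="use_common",
                    action="store_false", default=True)

print(parser.parse_args([]).use_common)               # True
print(parser.parse_args(["--no-common"]).use_common)  # False
```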
🤖 Fix all issues with AI agents
In @.github/workflows/winget.yml:
- Around line 31-35: The current destructuring of the found release/asset can
throw if no release or no asset includes 'win-vulkan'; update the logic around
the releases.find call (the variable currently destructured into version and
assets) to first store the foundRelease, guard that it is not undefined, then
find the matching asset (the assets.find that yields asset_url) and guard that
as well; if either is missing, call core.setFailed or throw a clear Error with a
message like "No release with win-vulkan asset found" so the workflow fails fast
with a readable message instead of crashing during destructuring.
In `@common/arg.cpp`:
- Around line 3398-3417: The CLI option "--spec-type" accepts only hyphenated
values but common_speculative_type_to_str and the internal enum/constants
(COMMON_SPECULATIVE_TYPE_NONE, COMMON_SPECULATIVE_TYPE_NGRAM_CACHE,
COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE, COMMON_SPECULATIVE_TYPE_NGRAM_MAP_K,
COMMON_SPECULATIVE_TYPE_NGRAM_MAP_K4V) use underscore names, causing defaults
shown by common_speculative_type_to_str to be non-pastable; update the lambda
passed to common_arg/add_opt for "--spec-type" to normalize input by mapping
both underscore and hyphen variants (e.g., accept "ngram_map_k" and
"ngram-map-k" or translate underscores to hyphens before comparison) so
params.speculative.type is set correctly and help/default strings can be reused
by the CLI.
In `@common/common.h`:
- Around line 255-299: The has_dft() helper only checks mparams_dft.path and
mparams_dft.hf_repo and thus misses other draft-model entry points; update
has_dft() to also return true when a draft model is provided via
mparams_dft.url, mparams_dft.hf_file, mparams_dft.docker_repo, mparams_dft.name
(or any other identifying fields on common_params_model), and when model_dft !=
nullptr; locate the function has_dft() in struct common_params_speculative and
add checks for these additional mparams_dft members and the model_dft pointer so
any of those sources signal a present draft model.
In `@common/ngram-map.cpp`:
- Around line 195-211: When creating a simple key-only draft in the block
guarded by if (map.key_only) (where draft is filled and map.last_draft_key_idx /
map.last_draft_value_idx are set), also set map.last_draft_created = true so
that common_ngram_map_accept sees the draft and updates n_accepted; update the
assignment that currently sets last_draft_created to false to true and keep the
other last_draft_* assignments unchanged.
- Around line 314-329: The draft is currently extracted from match_pos while
slot_max may reference a different value; update the draft extraction in the
block that computes n_draft_tokens/draft so it uses the selected value's
position (e.g., curr_key.values[slot_max].value_idx or equivalent) instead of
match_pos, and guard against a zero/invalid value_idx before indexing inp; also
ensure map.last_draft_value_idx and n_accepted
(curr_key.values[slot_max].n_accepted) then correctly describe the produced
draft (adjust match offset computation and bounds checks accordingly).
In `@common/speculative.cpp`:
- Around line 247-260: Validate the result of re-tokenizing `text` before
assigning back to `id_last`: after calling common_tokenize(ctx_dft, text, false,
true) (used to remap the detokenized `id_last`), check the returned vector size
and only set `id_last` from the single token when the size == 1; if the result
is empty or contains >1 tokens, log a debug/warn via LOG_DBG/LOG_WARN and skip
drafting/keep the original `id_last` (or handle per existing draft-skip logic)
to avoid out-of-bounds access and incorrect multi-token remaps; update the block
around llama_detokenize, text = replace_to_dft(text), and the subsequent
assignment to `id_last` accordingly.
In `@docs/speculative.md`:
- Around line 55-105: Add fenced-code languages and fix table style and
hyphenation: update the code fences that show CLI options to use a language tag
(e.g., ```text) and ensure closing fences are present, convert the options table
to the GitHub-style pipe table with header separator (|Type|Description| and
|---|---|) replacing the mismatched style, and change occurrences of "map based"
to "map-based" in the option descriptions for --spec-ngram-size-n and
--spec-ngram-size-m (and the duplicate text around lines 109-113) so
markdownlint MD040/MD060 and the hyphenation issue are resolved.
In `@ggml/src/ggml-cuda/ggml-cuda.cu`:
- Around line 3083-3089: The ggml_cuda_topk_moe_fusion function leaves
args.scale uninitialized causing nondeterministic fusion paths; explicitly set
args.scale (a member of ggml_cuda_topk_moe_args) when initializing the args
struct inside ggml_cuda_topk_moe_fusion so it has a defined value (e.g., false
or 1.0 as appropriate for your semantics) to prevent mis-detection of a SCALE op
and subsequent out-of-bounds node indexing.
In `@ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h`:
- Around line 40-45: apir_decode_ggml_tensor can return NULL so update each call
site—backend_buffer_set_tensor, backend_buffer_get_tensor, and
backend_buffer_cpy_tensor—to check the returned const ggml_tensor* immediately
after the apir_decode_ggml_tensor(...) call and handle NULL by logging an error
and returning early with an appropriate error code (or return value) instead of
dereferencing; in backend_buffer_cpy_tensor check both src and dst results
separately and return on either NULL to avoid use-after-null.
In `@ggml/src/ggml-virtgpu/ggml-backend-reg.cpp`:
- Around line 4-12: The file uses std::atomic<bool> in the apir_initialize
function but doesn't include <atomic>; add an explicit `#include` <atomic> at the
top of ggml-backend-reg.cpp so the use of std::atomic in apir_initialize is
valid without relying on transitive includes, keeping the include near the other
standard headers (e.g., alongside <mutex> and <iostream>).
In `@ggml/src/ggml-virtgpu/ggmlremoting_functions.yaml`:
- Around line 26-31: The functions get_name and get_description currently return
dynamically allocated char * via apir_decoder_alloc_array() but lack
ownership/lifetime documentation; either (a) add clear comments/doxygen to the
generated declarations for get_name and get_description stating that the caller
receives ownership and must free the buffer (specify free function if any), or
(b) change the API to avoid per-call allocation by returning const char * to a
cached/static buffer (or expose a separate function to obtain/duplicate the
string) and update uses of apir_decoder_alloc_array() accordingly; locate
get_name/get_description in the YAML and the allocation site
apir_decoder_alloc_array() to implement one of these two fixes.
In `@tools/server/server-context.cpp`:
- Around line 749-759: When common_speculative_init(params_base.speculative,
slot.ctx) returns a non-null slot.spec while mctx is set, the function returns
early and leaks that allocated spec; modify the error path in the block around
common_speculative_init so that before returning false you free or destroy the
newly created slot.spec (e.g., call the corresponding free/destroy function used
for speculative contexts) or move the slot into the slots container so destroy()
will clean it up; update the early-return path where SRV_ERR is logged to ensure
slot.spec is properly released to avoid orphaned memory (referencing slot.spec,
common_speculative_init, mctx, slots, and destroy()).
In `@tools/server/server-task.cpp`:
- Around line 256-266: Fix two issues in the speculative parameter parsing: (1)
correct the clamping logic for ngram_size_n, ngram_size_m, ngram_check_rate, and
ngram_min_hits by using std::min(std::max(1, x), 1024) instead of the current
std::max(std::min(1, x), 1024) so values are clamped into [1,1024]; (2) handle
unknown speculative type returned by common_speculative_type_from_name by
checking if the result equals COMMON_SPECULATIVE_TYPE_COUNT and, if so, set
params.speculative.type to defaults.speculative.type (or otherwise log/validate)
instead of assigning the invalid enum directly. Ensure you update the code
around params.speculative.* assignments and the call to
common_speculative_type_from_name accordingly.
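The clamping mistake called out in item (1) is worth seeing concretely. A Python sketch of the two compositions (illustrative; the real code uses `std::min`/`std::max`):

```python
def broken_clamp(x):
    # mirrors std::max(std::min(1, x), 1024): min(1, x) <= 1, so the outer
    # max(..., 1024) returns 1024 for every input
    return max(min(1, x), 1024)

def fixed_clamp(x):
    # mirrors std::min(std::max(1, x), 1024): clamps x into [1, 1024]
    return min(max(1, x), 1024)

print([broken_clamp(x) for x in (0, 16, 5000)])  # [1024, 1024, 1024]
print([fixed_clamp(x) for x in (0, 16, 5000)])   # [1, 16, 1024]
```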
🧹 Nitpick comments (1)

examples/lookup/lookup.cpp (1)

213-213: Consider guarding the save operation when dynamic cache path is empty.

If `params.speculative.lookup_cache_dynamic` is empty, `common_ngram_cache_save` will be called with an empty filename. Depending on `common_ngram_cache_save`'s behavior, this could create an empty-named file or fail silently.

♻️ Suggested guard

```diff
         // Update dynamic ngram cache with context ngram cache and save it to disk:
         common_ngram_cache_merge(ngram_cache_dynamic, ngram_cache_context);
-        common_ngram_cache_save(ngram_cache_dynamic, params.speculative.lookup_cache_dynamic);
+        if (!params.speculative.lookup_cache_dynamic.empty()) {
+            common_ngram_cache_save(ngram_cache_dynamic, params.speculative.lookup_cache_dynamic);
+        }
```
```cpp
    if (map.key_only) {
        // simple mode:
        // Fill in the draft with the m tokens following the key.
        // We work with value values[0] only.
        int n_draft_tokens = std::min((int) m, (int) curr_key.values[0].n_accepted);

        for (int i = 0; i < n_draft_tokens; ++i) {
            draft.push_back(inp[match_pos + n + i]);
        }

        LOG_INF("%s: key_offset = %zu, key_num = %d, draft.size = %zu\n", __func__,
                key_offset, curr_key.key_num, draft.size());

        map.last_draft_created   = false;
        map.last_draft_key_idx   = key_offset;
        map.last_draft_value_idx = 0;  // value 0 is used for simple mode
        return;
```
Key‑only drafts aren't marked as created.

`map.last_draft_created` stays false even when a draft is produced, so `common_ngram_map_accept` will skip updates and `n_accepted` never adapts.
🔧 Proposed fix

```diff
-            map.last_draft_created   = false;
+            map.last_draft_created   = true;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
    if (map.key_only) {
        // simple mode:
        // Fill in the draft with the m tokens following the key.
        // We work with value values[0] only.
        int n_draft_tokens = std::min((int) m, (int) curr_key.values[0].n_accepted);

        for (int i = 0; i < n_draft_tokens; ++i) {
            draft.push_back(inp[match_pos + n + i]);
        }

        LOG_INF("%s: key_offset = %zu, key_num = %d, draft.size = %zu\n", __func__,
                key_offset, curr_key.key_num, draft.size());

        map.last_draft_created   = true;
        map.last_draft_key_idx   = key_offset;
        map.last_draft_value_idx = 0;  // value 0 is used for simple mode
        return;
```
🤖 Prompt for AI Agents
In `@common/ngram-map.cpp` around lines 195 - 211, When creating a simple key-only
draft in the block guarded by if (map.key_only) (where draft is filled and
map.last_draft_key_idx / map.last_draft_value_idx are set), also set
map.last_draft_created = true so that common_ngram_map_accept sees the draft and
updates n_accepted; update the assignment that currently sets last_draft_created
to false to true and keep the other last_draft_* assignments unchanged.
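The fix described above reduces to two pieces: copy at most `m` tokens bounded by the key's accepted count, then record that a draft is outstanding so the later accept path updates `n_accepted`. A minimal sketch of that logic — the type and function names here are illustrative stand-ins, not the repository's actual API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the ngram-map bookkeeping described in the review.
struct draft_state {
    bool   last_draft_created   = false;
    size_t last_draft_key_idx   = 0;
    int    last_draft_value_idx = 0;
};

// Copy at most `m` tokens following the match, bounded by how many tokens
// were previously accepted for this key, and mark the draft as outstanding
// so a subsequent accept() call knows to update the acceptance counter.
inline size_t fill_simple_draft(std::vector<int> & draft,
                                const std::vector<int> & inp,
                                size_t match_pos, size_t n,
                                size_t m, int n_accepted,
                                draft_state & st, size_t key_offset) {
    const int n_draft_tokens = std::min((int) m, n_accepted);
    for (int i = 0; i < n_draft_tokens; ++i) {
        draft.push_back(inp[match_pos + n + i]);
    }
    st.last_draft_created   = true; // the bug was leaving this false
    st.last_draft_key_idx   = key_offset;
    st.last_draft_value_idx = 0;    // value 0 is used for simple mode
    return draft.size();
}
```

With the flag left `false`, the sketch's accept path would silently skip the counter update — which is exactly the failure mode the comment describes.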
```cpp
static inline const ggml_tensor * apir_decode_ggml_tensor(apir_decoder * dec) {
    const apir_rpc_tensor * apir_rpc_tensor = apir_decode_apir_rpc_tensor_inplace(dec);

    if (!apir_rpc_tensor) {
        return NULL;
    }
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Find all call sites of apir_decode_ggml_tensor and show context
rg -n 'apir_decode_ggml_tensor\s*\(' -A 5 -B 2
```

Repository: crc-org/llama.cpp
Length of output: 3114

🏁 Script executed:

```shell
#!/bin/bash
# Get the function names and their full context
rg -n 'apir_decode_ggml_tensor\s*\(' -B 15 backend-dispatched-buffer.cpp | head -100
```

Repository: crc-org/llama.cpp
Length of output: 130

🏁 Script executed:

```shell
#!/bin/bash
# Get the function names and their full context
rg -n 'apir_decode_ggml_tensor\s*\(' -B 15 ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp | head -120
```

Repository: crc-org/llama.cpp
Length of output: 1657

🏁 Script executed:

```shell
#!/bin/bash
# Get complete functions to see all tensor usages
sed -n '20,50p' ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp
sed -n '52,81p' ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp
sed -n '83,100p' ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp
```

Repository: crc-org/llama.cpp
Length of output: 2289
Add NULL checks for all apir_decode_ggml_tensor call sites before dereferencing.
The function can now return NULL (lines 43-44 of apir_cs_ggml.h), but three call sites in backend-dispatched-buffer.cpp dereference the result without guards:
- backend_buffer_set_tensor (line 29): tensor used at line 47
- backend_buffer_get_tensor (line 61): tensor used at line 78
- backend_buffer_cpy_tensor (lines 91-92): src and dst both used at line 94
Add NULL checks after each apir_decode_ggml_tensor call to return an error code or log and return early, preventing NULL pointer dereferences.
🤖 Prompt for AI Agents
In `@ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h` around lines 40 - 45,
apir_decode_ggml_tensor can return NULL so update each call
site—backend_buffer_set_tensor, backend_buffer_get_tensor, and
backend_buffer_cpy_tensor—to check the returned const ggml_tensor* immediately
after the apir_decode_ggml_tensor(...) call and handle NULL by logging an error
and returning early with an appropriate error code (or return value) instead of
dereferencing; in backend_buffer_cpy_tensor check both src and dst results
separately and return on either NULL to avoid use-after-null.
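The guard pattern this comment asks for is simple: validate the decoded pointer before any dereference and fail the request early. A hedged, self-contained sketch — the names below are invented for illustration; the real call sites live in backend-dispatched-buffer.cpp:

```cpp
#include <cstdio>

// Hypothetical stand-in for a wire-decoded tensor.
struct fake_tensor { int id; };

// Stand-in decoder: returns nullptr on malformed input, mimicking
// apir_decode_ggml_tensor's new failure mode.
inline const fake_tensor * decode_tensor(const fake_tensor * wire) {
    return wire; // a nullptr input models a decode failure
}

// Call-site pattern: check for NULL immediately after decoding and
// propagate an error code instead of crashing on a null dereference.
inline int buffer_set_tensor(const fake_tensor * wire) {
    const fake_tensor * tensor = decode_tensor(wire);
    if (!tensor) {
        std::fprintf(stderr, "failed to decode tensor\n");
        return -1; // early return: no dereference of a null pointer
    }
    return tensor->id; // safe: tensor is known non-null here
}
```

For the copy path, the same check is applied to each of the two decoded tensors separately, returning on the first NULL.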
```diff
 #include <iostream>
 #include <mutex>

 void ggml_virtgpu_cleanup(virtgpu *gpu);

 static virtgpu * apir_initialize() {
-    static virtgpu * apir_gpu_instance = NULL;
-    static bool apir_initialized = false;
+    static virtgpu * gpu = NULL;
+    static std::atomic<bool> initialized = false;
```
🧩 Analysis chain

🏁 Script executed:

```shell
# First, find the file and check its includes
fd 'ggml-backend-reg.cpp' --type f
```

Repository: crc-org/llama.cpp
Length of output: 133

🏁 Script executed:

```shell
# Check the includes at the top of the file
head -20 ggml/src/ggml-virtgpu/ggml-backend-reg.cpp
```

Repository: crc-org/llama.cpp
Length of output: 503

🏁 Script executed:

```shell
# Search for atomic include in the file and related headers
rg "#include.*atomic" ggml/src/ggml-virtgpu/
```

Repository: crc-org/llama.cpp
Length of output: 170

🏁 Script executed:

```shell
# Check what's in ggml-remoting.h to see if it includes atomic
fd 'ggml-remoting.h' --type f
```

Repository: crc-org/llama.cpp
Length of output: 98

🏁 Script executed:

```shell
# Check if ggml-remoting.h includes atomic
head -30 ggml/src/ggml-virtgpu/ggml-remoting.h
```

Repository: crc-org/llama.cpp
Length of output: 1094

🏁 Script executed:

```shell
# Check if ggml-virtgpu.h includes atomic
head -30 ggml/src/ggml-virtgpu/ggml-virtgpu.h
```

Repository: crc-org/llama.cpp
Length of output: 156

🏁 Script executed:

```shell
# List all files in the virtgpu directory
ls -la ggml/src/ggml-virtgpu/
```

Repository: crc-org/llama.cpp
Length of output: 1952

🏁 Script executed:

```shell
# Check if virtgpu.h includes atomic
fd 'virtgpu.h' --type f -x head -30 {}
```

Repository: crc-org/llama.cpp
Length of output: 983
Add explicit `#include <atomic>` for `std::atomic` usage.

Line 11 uses `std::atomic<bool>`, but `<atomic>` is not explicitly included in this file. While the header is pulled in transitively through `virtgpu.h` → `virtgpu-utils.h`, relying on transitive includes is fragile. Explicitly include `<atomic>` to prevent future breakage if the transitive dependency changes.

Suggested fix

```diff
 #include "ggml-remoting.h"
 #include "ggml-virtgpu.h"

 #include <iostream>
 #include <mutex>
+#include <atomic>
```

📝 Committable suggestion

```cpp
#include <iostream>
#include <mutex>
#include <atomic>

void ggml_virtgpu_cleanup(virtgpu *gpu);

static virtgpu * apir_initialize() {
    static virtgpu * gpu = NULL;
    static std::atomic<bool> initialized = false;
```

🤖 Prompt for AI Agents

In `ggml/src/ggml-virtgpu/ggml-backend-reg.cpp` around lines 4 - 12: The file uses `std::atomic<bool>` in the `apir_initialize` function but doesn't include `<atomic>`; add an explicit `#include <atomic>` at the top of ggml-backend-reg.cpp so the use of std::atomic in apir_initialize is valid without relying on transitive includes, keeping the include near the other standard headers (e.g., alongside `<mutex>` and `<iostream>`).
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 25 minutes 59 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration:
…165) * cuda : fix nkvo * cont : more robust cuda graph node property matching * cont : restore pre-leafs implementation * cont : comments + static_assert
* hexagon: updates to enable offloading to HTP on WoS * Update windows.md * Update windows.md * hexagon: enable -O3 optimizations * hexagon: move all _WINDOWS conditional compilation to _WIN32 * hexagon: updates to enable offloading to HTP on WoS * hexagon: use run-time vs load-time dynamic linking for cdsp driver interface * refactor htp-drv * hexagon: add run-bench.ps1 script * hexagon: htdrv refactor * hexagon: unify Android and Windows build readmes * hexagon: update README.md * hexagon: refactor htpdrv * hexagon: drv refactor * hexagon: more drv refactor * hexagon: fixes for android builds * hexagon: factor out dl into ggml-backend-dl * hexagon: add run-tool.ps1 script * hexagon: merge htp-utils in htp-drv and remove unused code * wos: no need for getopt_custom.h * wos: add missing CR in htpdrv * hexagon: ndev enforcement applies only to the Android devices * hexagon: add support for generating and signing .cat file * hexagon: add .inf file * hexagon: working auto-signing and improved windows builds * hexagon: further improve skel build * hexagon: add rough WoS guide * hexagon: updated windows guide * hexagon: improve cmake handling of certs and logging * hexagon: improve windows setup/build doc * hexagon: more windows readme updates * hexagon: windows readme updates * hexagon: windows readme updates * hexagon: windows readme updates * hexagon: windows readme updates * hexagon: windows readme updates * Update windows.md * Update windows.md * snapdragon: rename docs/backend/hexagon to docs/backends/snapdragon Also added a power shell script to simplify build env setup. * hexagon: remove trailing whitespace and move cmake requirement to user-presets * hexagon: fix CMakeUserPresets path in workflow yaml * hexagon: introduce local version of libdl.h * hexagon: fix src1 reuse logic gpt-oss needs a bigger lookahead window. The check for src[1] itself being quantized was wrong. --------- Co-authored-by: Max Krasnyansky <[email protected]>
/test topsail kpouget-atuona remoting_linux
…g (#19151) * webgpu : pipeline flash_attn Q/K loads in WGSL * ggml-webgpu: unroll Q*K accumlation inner loop * ggml-webgpu: vectorization * ggml-webgpu: unrolling * ggml-webgpu: remove redundant unrolling * ggml-webgpu: restore the config * ggml-webgpu: remove redundant comments * ggml-webgpu: formatting * ggml-webgpu: formatting and remove vectorization * ggml-webgpu: remove unnecessary constants * ggml-webgpu: change QKV buffer to read_write to pass validation * ggml-webgpu: add explanation for the additional bracket around Q K accumulate * Indentation and for -> if for tail * Kick off CI on wgsl only commits --------- Co-authored-by: Reese Levine <[email protected]>
🔴 Test of 'mac_ai test test_ci' failed after 02 hours 14 minutes 05 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration: Failure indicator: Empty. (See run.log)
* Fix typos in SYCL documentation * Update SYCL.md * Update SYCL.md * Update SYCL.md * Update docs/backend/SYCL.md Co-authored-by: Neo Zhang Jianyu <[email protected]> * Update SYCL.md --------- Co-authored-by: Neo Zhang Jianyu <[email protected]>
* sycl: implement GGML_OP_TRI * docs: update ops.md for SYCL TRI * docs: regenerate ops.md * docs: update SYCL support for GGML_OP_TRI
* sycl: add softplus unary op implementation * sycl: add softplus unary op implementation * docs(ops): mark SYCL SOFTPLUS as supported * docs: update SYCL status for SOFTPLUS
This commit removes the unused tmp_buf variable from llama-kv-cache.cpp and llama-memory-recurrent.cpp. The tmp_buf variable was declared but never used, but since it has a non-trivial constructor/destructor we don't get an unused-variable warning about it.
This commit adds a missing return statement to the GraniteMoeModel class to fix an issue in the model conversion process. Resolves: ggml-org/llama.cpp#19201
not necessary
The static init isn't thread safe.
/test topsail mac-m4 remoting_mac
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ggml/src/ggml-virtgpu/ggml-backend-reg.cpp (1)
ggml/src/ggml-virtgpu/ggml-backend-reg.cpp (1)

161-165: ⚠️ Potential issue | 🟡 Minor

Non-atomic `initialized` flag may cause data race.

The `static bool initialized` at line 161 is accessed without synchronization. If `ggml_backend_virtgpu_reg()` is called concurrently from multiple threads before initialization completes, this could result in a data race.

Consider using `std::atomic<bool>` or ensuring this function is only called from a single thread during startup.

🔒 Proposed fix

```diff
     static ggml_backend_reg reg = {
         /* .api_version = */ GGML_BACKEND_API_VERSION,
         /* .iface       = */ ggml_backend_remoting_reg_i,
         /* .context     = */ gpu,
     };

-    static bool initialized = false;
+    static std::atomic<bool> initialized{false};
     if (initialized) {
         return &reg;
     }
-    initialized = true;
+    initialized.store(true, std::memory_order_relaxed);
```
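Beyond an atomic flag, C++11 offers `std::call_once` for exactly this one-time-registration shape: the initializer body is guaranteed to run once even under concurrent callers. A minimal sketch with a placeholder registration type (not the real `ggml_backend_reg`):

```cpp
#include <mutex>

// Hypothetical stand-in for a backend registration record.
struct fake_reg { int api_version = 0; };

// Thread-safe lazy registration: std::call_once guarantees that the
// lambda runs exactly once, and all callers observe its side effects.
inline fake_reg * get_reg() {
    static fake_reg       reg;
    static std::once_flag once;
    std::call_once(once, [] { reg.api_version = 1; });
    return &reg;
}
```

Note that in C++11 and later, the initialization of a function-local `static` object is itself thread-safe ("magic statics"), so folding the setup into the static's constructor is another viable fix.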
🤖 Fix all issues with AI agents
In `@docs/backend/snapdragon/README.md`:
- Around line 116-119: In the "Windows" section of README.md (the heading "###
Windows"), fix the typo by changing the word "scrits" to "scripts" in the
sentence referencing the Powershell folder so it reads "To run, adapt below
instructions to use Powershell scripts in `scripts/snapdragon/windows`."
- Around line 19-22: Update the README text under the "Windows On Snapdragon"
heading to correct the Visual Studio version typo: replace "MS Visual Studio
2026 (Community Edition or Pro)" with the correct current version(s), e.g., "MS
Visual Studio 2022 (Community Edition or Pro)" or "MS Visual Studio 2022/2024
(Community Edition or Pro)"; ensure only the display string is changed in the
small bullet list so the docs accurately reflect supported Visual Studio
releases.
In `@docs/backend/snapdragon/windows.md`:
- Line 147: Replace the misspelled word "certicate" with "certificate" in the
sentence that reads "The .cat file, the signature and proper certicate
installation can be verified with" in docs/backend/snapdragon/windows.md so it
reads "...proper certificate installation..."; ensure only the single typo is
corrected and surrounding punctuation/spacing is preserved.
In `@ggml/src/ggml-hexagon/htp-drv.cpp`:
- Around line 222-302: In get_driver_path(), check the return value of
LocalAlloc before using it: after LPQUERY_SERVICE_CONFIGW serviceConfig =
static_cast<...>(LocalAlloc(...)); verify serviceConfig != NULL, and if NULL
call GGML_LOG_ERROR with GetLastError (or fprintf),
CloseServiceHandle(schService) and CloseServiceHandle(schSCManager), then return
result to avoid passing NULL into QueryServiceConfigW; keep the existing cleanup
pattern used elsewhere in this function.
- Around line 63-66: The typedef remote_handle64_close_pfn_t currently declares
its parameter as remote_handle but the rest of the API and the corresponding
wrapper (the close wrapper that calls through remote_handle64_close_pfn_t) use
remote_handle64; update remote_handle64_close_pfn_t to take remote_handle64 as
its parameter type to match
remote_handle64_open_pfn_t/remote_handle64_invoke_pfn_t and the wrapper
function, ensuring ABI/size compatibility and preventing mismatched-call issues.
- Around line 103-109: In rpcmem_alloc2, detect when falling back to
rpcmem_alloc_pfn and validate that `size` fits into an `int` (compare against
INT_MAX); if it exceeds INT_MAX, log an error via GGML_LOG_ERROR (or
GGML_LOG_INFO if consistent) and return nullptr instead of truncating, otherwise
safely cast `size` to int and call `rpcmem_alloc_pfn(heapid, flags, (int)size)`.
Ensure you reference the existing symbols `rpcmem_alloc2`, `rpcmem_alloc2_pfn`,
and `rpcmem_alloc_pfn` and add no behavioral change when sizes are within
bounds.
- Around line 99-193: Wrap each wrapper to null-check its resolved function
pointer before calling (e.g., check rpcmem_free_pfn, rpcmem_to_fd_pfn,
fastrpc_mmap_pfn, fastrpc_munmap_pfn, dspqueue_create_pfn, dspqueue_close_pfn,
dspqueue_export_pfn, dspqueue_write_pfn, dspqueue_read_pfn,
remote_handle64_open_pfn, remote_handle64_invoke_pfn, remote_handle64_close_pfn,
remote_handle_control_pfn, remote_handle64_control_pfn,
remote_session_control_pfn); if the pfn is null log an error with GGML_LOG_ERROR
(or existing logger) and return a safe value: nullptr for pointer returns, -1
for int returns, and a failure AEEResult (e.g., AEE_EFAILED) for AEEResult
functions; for void functions simply log and return early. Ensure each wrapper
mirrors the pattern used in rpcmem_alloc2() for consistent defensive behavior.
In `@ggml/src/ggml-sycl/ggml-sycl.cpp`:
- Around line 2266-2322: The supports_op check for GGML_OP_TRI must be updated
to match the runtime assertions in ggml_sycl_op_tri: add checks for
ggml_is_contiguous(dst) and ggml_are_same_shape(src0, dst) alongside the
existing ggml_is_contiguous(src0) and type checks; locate the GGML_OP_TRI branch
(the function/method performing supported-op validation) and include these two
predicates so supports_op returns false when the destination isn't contiguous or
shapes differ, which aligns it with the tri_f32_sycl/ggml_sycl_op_tri
assumptions.
In `@scripts/snapdragon/windows/run-cli.ps1`:
- Around line 27-29: The code sets GGML_SCHED_DEBUG and then appends "-v" to
$cli_opts using string interpolation which converts the array to a string;
change the append to use array concatenation ($cli_opts += '-v') so $cli_opts
remains an array. Update the block that checks $env:SCHED (the lines that set
$env:GGML_SCHED_DEBUG and modify $cli_opts) to use += for $cli_opts and keep
setting $env:GGML_SCHED_DEBUG as-is, mirroring the fix applied in run-tool.ps1.
In `@scripts/snapdragon/windows/run-tool.ps1`:
- Around line 33-35: The CLI options array $cli_opts is being converted to a
string by using "$cli_opts -v"; change the append to use PowerShell array
concatenation ($cli_opts += '-v') so $cli_opts remains an array; update the
block that sets $env:GGML_SCHED_DEBUG when $env:SCHED is present to append '-v'
to $cli_opts instead of assigning a combined string.
In `@scripts/snapdragon/windows/setup-build.ps1`:
- Around line 102-103: The comment is wrong: it says "Set HEXAGON_SDK_ROOT"
while the code is setting the HEXAGON_TOOLS_ROOT environment variable; update
the comment to accurately reflect the action (e.g., change the comment to "Set
HEXAGON_TOOLS_ROOT") so it matches the SetEnvironmentVariable call that sets
'HEXAGON_TOOLS_ROOT' in the script (reference the
[System.Environment]::SetEnvironmentVariable call).
- Around line 51-69: The try/catch around the download/extract
(Invoke-WebRequest / tar) swallows the original failure after calling
Write-Error and cleanup (Remove-Item), allowing the script to continue into
environment setup; modify the catch in setup-build.ps1 so it either re-throws
the caught exception (using "throw") after cleanup or sets a failure flag (e.g.,
$DownloadFailed) and then use that flag to skip the subsequent environment
variable setup for that SDK; reference the existing catch block, Write-Error,
Remove-Item, $TempFile and $TargetDir to implement the chosen approach so failed
downloads do not proceed to env var configuration.
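The `rpcmem_alloc2` item in the list above is an instance of a general rule: validate a `size_t` before narrowing it into an `int`-taking legacy API. A self-contained sketch of that guard — the allocator names here are placeholders, not the real rpcmem entry points:

```cpp
#include <climits>
#include <cstddef>
#include <cstdlib>

// Legacy fallback that only accepts an int-sized allocation request.
inline void * legacy_alloc(int size) {
    return size > 0 ? std::malloc((size_t) size) : nullptr;
}

// Guard the narrowing conversion: refuse sizes that do not fit in int
// instead of silently truncating them into a tiny (or negative) request.
inline void * checked_alloc(size_t size) {
    if (size > (size_t) INT_MAX) {
        return nullptr; // caller sees a clean failure, not a short buffer
    }
    return legacy_alloc((int) size);
}
```

The same shape applies wherever a 64-bit size flows into a 32-bit ABI: check against `INT_MAX`, log, and fail early.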
🧹 Nitpick comments (18)

scripts/snapdragon/windows/run-tool.ps1 (1)

15-18: Remove unused variable `$remainingArgs`.

The variable `$remainingArgs` is assigned on line 17 but never used. This is dead code.

♻️ Proposed fix

```diff
 if ($args.Count -gt 1) {
     $cli_opts=$args[1..($args.Count - 1)]
-    $remainingArgs = $args[1..($args.Count - 1)]
 }
```

scripts/snapdragon/windows/run-cli.ps1 (1)

49-53: Consider documenting the hardcoded parameters.

The script uses several hardcoded values (`--poll 1000`, `-t 6`, `--cpu-mask 0xfc`, `--ctx-size 8192`, etc.) that may need adjustment for different hardware configurations. Consider adding a comment explaining these defaults or making them configurable via environment variables.

ggml/src/ggml-cuda/ggml-cuda.cu (1)

3025-3030: Consider deduplicating extra sources for efficiency.

The same source tensor can be added to `srcs_extra` multiple times if it's used by multiple nodes, since `seen_node` only tracks nodes, not sources. While this doesn't cause correctness issues (consistent ordering between runs), it results in redundant property tracking.

♻️ Optional fix to deduplicate sources

```diff
     std::unordered_set<ggml_tensor *> seen_node;
     std::vector<ggml_tensor *> srcs_extra;
+    std::unordered_set<ggml_tensor *> seen_extra;

     for (int i = 0; i < cgraph->n_nodes; i++) {
         bool props_match = true;
         seen_node.insert(cgraph->nodes[i]);
         if (!res) {
             props_match = ggml_cuda_graph_node_properties_match(cgraph->nodes[i], &graph->props[i]);
         }
         if (!props_match) {
             res = true;
         }
         ggml_cuda_graph_node_set_properties(&graph->props[i], cgraph->nodes[i]);
         for (int src_idx = 0; src_idx < GGML_MAX_SRC; ++src_idx) {
             ggml_tensor * src = cgraph->nodes[i]->src[src_idx];
-            if (src && seen_node.find(src) == seen_node.end()) {
+            if (src && seen_node.find(src) == seen_node.end() && seen_extra.find(src) == seen_extra.end()) {
                 srcs_extra.push_back(src);
+                seen_extra.insert(src);
             }
         }
     }
```

ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl (1)

163-169: Unused helper functions.

`load_f32x4` and `load_kvx4` are defined but never called in this shader. If these were intended for a different change, consider removing them to avoid dead code.

🗑️ Remove unused functions

```diff
-fn load_f32x4(buf: ptr<storage, array<vec4<f32>>, read_write>, scalar_index: u32) -> vec4<f32> {
-    return (*buf)[scalar_index >> 2u];
-}
-
-fn load_kvx4(buf: ptr<storage, array<vec4<KV_TYPE>>, read_write>, scalar_index: u32) -> vec4<KV_TYPE> {
-    return (*buf)[scalar_index >> 2u];
-}
```

ggml/src/ggml-sycl/ggml-sycl.cpp (1)

4686-4691: Align supports_op checks with ggml_sycl_op_tri assertions.

This avoids SYCL offload when dst is non-contiguous or shape-mismatched, which would otherwise assert at runtime.

🔧 Suggested guard alignment

```diff
-        {
-            const ggml_tensor * src0 = op->src[0];
-            return src0 &&
-                   op->type == GGML_TYPE_F32 &&
-                   ggml_is_contiguous(src0);
-        }
+        {
+            const ggml_tensor * src0 = op->src[0];
+            return src0 &&
+                   op->type == GGML_TYPE_F32 &&
+                   ggml_is_contiguous(src0) &&
+                   ggml_is_contiguous(op) &&
+                   ggml_are_same_shape(src0, op);
+        }
```

ggml/src/ggml-hexagon/libggml-htp.inf (1)

36-38: Consider using double quotes in `[Strings]` section.

INF files conventionally use double quotes for string values in the `[Strings]` section. Single quotes may cause compatibility issues with some INF parsers.

```diff
 [Strings]
-GGML = 'GGML'
-DiskId = 'GGML HTP library'
+GGML = "GGML"
+DiskId = "GGML HTP library"
```

ggml/src/ggml-hexagon/htp-drv.h (2)

64-66: Use angle brackets for system headers.

Standard system headers should use angle brackets rather than quotes for proper include path resolution.

```diff
 #if !defined(__hexagon__) && !defined(_WIN32)
-#    include "unistd.h"
+#    include <unistd.h>
 #    include <sys/time.h>
 #endif
```

47-48: Potential integer overflow in SLEEP macro.

On Windows, `Sleep(1000 * x)` could overflow if `x` is large, since `1000 * x` is computed as `int` multiplication before being passed to `Sleep()`. Consider using `Sleep((DWORD)(x) * 1000)` or casting appropriately.

```diff
 #    ifdef _WIN32
-#        define SLEEP(x) Sleep(1000 * x) /* Sleep accepts input in milliseconds. */
+#        define SLEEP(x) Sleep((DWORD)(x) * 1000) /* Sleep accepts input in milliseconds. */
 #    else
```

ggml/src/ggml-virtgpu/virtgpu.cpp (1)

152-157: Unreachable code after `GGML_ABORT`.

`GGML_ABORT` typically terminates execution and does not return. The `return NULL;` on line 156 is unreachable. This pattern appears in other places in this file (lines 160-161, 165-166, etc.) and seems to be a defensive style, but it may confuse static analyzers or readers.

If `GGML_ABORT` is intended to possibly return in some configurations, consider documenting this. Otherwise, the pattern is harmless but technically dead code.

ggml/src/ggml-backend-reg.cpp (1)

14-26: Consider removing redundant platform headers.

With the dynamic loading logic now in `ggml-backend-dl.cpp`, the `#include <windows.h>` and `#include <dlfcn.h>` blocks (lines 14-26) may no longer be needed in this file. However, verify that `get_executable_path()` (lines 375-436) still requires these headers before removing them.

ggml/src/ggml-virtgpu/virtgpu-forward-buffer.cpp (1)

90-119: Consider RAII for mutex management.

The mutex lock/unlock pattern is correct, but if `REMOTE_CALL` or `memcpy` were to fail unexpectedly (e.g., via `GGML_ABORT` in a nested call), the mutex could remain locked. Consider using a simple RAII wrapper or a C++ `std::lock_guard` equivalent for the C11 mutex to ensure cleanup on all exit paths.

This is a minor concern since the current code paths appear to handle errors via `GGML_ABORT`, which terminates the process anyway.

ggml/src/ggml-backend-dl.cpp (1)

28-30: Windows `dl_error()` loses error information.

Returning an empty string discards useful diagnostic information. Consider using `GetLastError()` and `FormatMessage()` to provide meaningful error messages, similar to the POSIX `dlerror()` behavior.

💡 Suggested improvement

```diff
 const char * dl_error() {
-    return "";
+    static thread_local char buf[256];
+    DWORD err = GetLastError();
+    if (err == 0) {
+        return "";
+    }
+    FormatMessageA(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+                   NULL, err, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
+                   buf, sizeof(buf), NULL);
+    return buf;
+}
```

ggml/src/ggml-virtgpu/ggml-backend-reg.cpp (1)

98-126: Redundant initialization guard.

The `static std::atomic<bool> initialized` at line 98 duplicates protection that's already provided by the `if (devices.size() > 0)` check at line 87. Consider removing one of these guards for clarity.

Additionally, the atomic should use explicit memory ordering as noted in past reviews.

ggml/src/ggml-hexagon/ggml-hexagon.cpp (1)

3147-3148: Redundant assignment of `opt_hostbuf`.

This line reassigns `opt_hostbuf` using the same expression already evaluated on line 3128. If the intent is to ensure a default of 1 when the environment variable is not set, line 3128 already handles this correctly.

🔧 Suggested fix

```diff
-    opt_hostbuf = str_hostbuf ? atoi(str_hostbuf) : 1;
-
     reg->context = new ggml_hexagon_registry(reg);
```

ggml/src/ggml-backend-dl.h (1)

9-9: Potentially unnecessary `<winevt.h>` include.

The `<winevt.h>` header provides Windows Event Log APIs, which appear unrelated to dynamic library loading functionality. This may have been included by mistake.

🔧 Suggested fix

```diff
 #    include <windows.h>
-#    include <winevt.h>
 #else
```

ggml/src/ggml-hexagon/libdl.h (2)

9-9: Potentially unnecessary `<winevt.h>` include.

Same as in `ggml-backend-dl.h`, this header appears unrelated to dynamic library loading.

🔧 Suggested fix

```diff
 #    include <windows.h>
-#    include <winevt.h>
 #else
```

1-77: Consider consolidating with `ggml-backend-dl.cpp`.

This file provides inline implementations that are nearly identical to those in `ggml/src/ggml-backend-dl.cpp` (as shown in the relevant code snippets). While having a header-only version for the Hexagon backend may be intentional for build isolation, the duplication could lead to maintenance drift.

Consider whether these could share a common implementation or if the duplication is acceptable for build system reasons.

ggml/src/ggml-hexagon/htp-drv.cpp (1)

308-360: Make `htpdrv_init` thread-safe.

`initialized` is unsynchronized; concurrent callers can race and partially initialize pfns. Consider a mutex/once guard.

🛠️ One option using a mutex

```diff
-#include <string>
+#include <string>
+#include <mutex>

 int htpdrv_init() {
     static dl_handle_ptr lib_cdsp_rpc_handle = nullptr;
     static bool          initialized         = false;
+    static std::mutex    init_mutex;
+    std::lock_guard<std::mutex> lock(init_mutex);

 #ifdef _WIN32
```
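The RAII suggestion for the buffer-forwarding path above amounts to replacing manual lock/unlock pairs with a scope-bound guard, so every exit path releases the mutex. A sketch using a plain `std::mutex` — the real code uses a C11 `mtx_t`, so take this as the shape of the fix rather than a drop-in patch:

```cpp
#include <mutex>

// Hypothetical shared state protected by a mutex.
struct shared_buf {
    std::mutex mtx;
    int        value = 0;
};

// std::lock_guard releases the mutex on every exit path, including
// early returns and exceptions, unlike a manual lock()/unlock() pair.
inline int bump(shared_buf & b, bool bail_early) {
    std::lock_guard<std::mutex> lock(b.mtx);
    if (bail_early) {
        return b.value; // mutex is still released here
    }
    return ++b.value;
}
```

A two-line RAII wrapper around `mtx_lock`/`mtx_unlock` gives the same guarantee for the C11 mutex without pulling in `<mutex>`.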
```markdown
### Windows On Snapdragon

Native Windows 11 arm64 builds has the following tools dependencies:
- MS Visual Studio 2026 (Community Edition or Pro)
```
Fix Visual Studio version typo.

"MS Visual Studio 2026" appears to be a typo. Current versions are 2022 or 2024.

📝 Proposed fix

```diff
-- MS Visual Studio 2026 (Community Edition or Pro)
+- MS Visual Studio 2022 (Community Edition or Pro)
```

📝 Committable suggestion
🤖 Prompt for AI Agents
In `@docs/backend/snapdragon/README.md` around lines 19 - 22, Update the README
text under the "Windows On Snapdragon" heading to correct the Visual Studio
version typo: replace "MS Visual Studio 2026 (Community Edition or Pro)" with
the correct current version(s), e.g., "MS Visual Studio 2022 (Community Edition
or Pro)" or "MS Visual Studio 2022/2024 (Community Edition or Pro)"; ensure only
the display string is changed in the small bullet list so the docs accurately
reflect supported Visual Studio releases.
```markdown
### Windows

All artifacts are already installed in the `pkg-snapdragon` folder.
To run, adapt below instructions to use Powershell scrits in `scripts/snapdragon/windows`.
```
Fix typo: "scrits" → "scripts".

📝 Proposed fix

```diff
-To run, adapt below instructions to use Powershell scrits in `scripts/snapdragon/windows`.
+To run, adapt below instructions to use PowerShell scripts in `scripts/snapdragon/windows`.
```

📝 Committable suggestion
🤖 Prompt for AI Agents
In `@docs/backend/snapdragon/README.md` around lines 116 - 119, In the "Windows"
section of README.md (the heading "### Windows"), fix the typo by changing the
word "scrits" to "scripts" in the sentence referencing the Powershell folder so
it reads "To run, adapt below instructions to use Powershell scripts in
`scripts/snapdragon/windows`."
```
-a----          1/22/2026   6:01 PM           4139 libggml-htp.cat
```

The .cat file, the signature and proper certicate installation can be verified with
Fix typo: "certicate" → "certificate".

📝 Proposed fix

```diff
-The .cat file, the signature and proper certicate installation can be verified with
+The .cat file, the signature and proper certificate installation can be verified with
```
🧰 Tools
🪛 LanguageTool
[grammar] ~147-~147: Ensure spelling is correct
Context: ...The .cat file, the signature and proper certicate installation can be verified with ``` ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
In `@docs/backend/snapdragon/windows.md` at line 147, Replace the misspelled word
"certicate" with "certificate" in the sentence that reads "The .cat file, the
signature and proper certicate installation can be verified with" in
docs/backend/snapdragon/windows.md so it reads "...proper certificate
installation..."; ensure only the single typo is corrected and surrounding
punctuation/spacing is preserved.
```c
typedef int (*remote_handle64_open_pfn_t)(const char* name, remote_handle64 *ph);
typedef int (*remote_handle64_invoke_pfn_t)(remote_handle64 h, uint32_t dwScalars, remote_arg *pra);
typedef int (*remote_handle64_close_pfn_t)(remote_handle h);
typedef int (*remote_handle_control_pfn_t)(uint32_t req, void* data, uint32_t datalen);
```
🧩 Analysis chain

🏁 Script executed:

```shell
# First, locate the file and verify its content at the specified lines
fd "htp-drv.cpp" -t f
```

Repository: crc-org/llama.cpp
Length of output: 94

🏁 Script executed:

```shell
# Read the specific lines mentioned in the review (63-66)
sed -n '60,70p' ggml/src/ggml-hexagon/htp-drv.cpp
```

Repository: crc-org/llama.cpp
Length of output: 803

🏁 Script executed:

```shell
# Search for remote_handle typedef definitions in headers
rg -n "typedef.*remote_handle" --type h --type hpp -A 1 -B 1
```

Repository: crc-org/llama.cpp
Length of output: 88

🏁 Script executed:

```shell
# Also search in any included files for the definitions
rg -n "typedef.*remote_handle|struct.*remote_handle" -g '*.{h,hpp}' -A 1
```

Repository: crc-org/llama.cpp
Length of output: 43

🏁 Script executed:

```shell
# Search for remote_handle type definitions in the entire codebase
rg -n "typedef|struct|using" | rg "remote_handle64|remote_handle\b" | head -20
```

Repository: crc-org/llama.cpp
Length of output: 561

🏁 Script executed:

```shell
# Look for any header files that might define these types
fd -t f -e h -e hpp -e hxx | head -20
```

Repository: crc-org/llama.cpp
Length of output: 469

🏁 Script executed:

```shell
# Search in the Hexagon-specific directory for type definitions
rg -n "remote_handle" ggml/src/ggml-hexagon/ -A 2 -B 2
```

Repository: crc-org/llama.cpp
Length of output: 9504

🏁 Script executed:

```shell
# Check if remote_handle and remote_handle64 are documented or defined anywhere in comments/code
rg -n "remote_handle64\|remote_handle" --type c --type cpp -C 3 | head -50
```

Repository: crc-org/llama.cpp
Length of output: 43

🏁 Script executed:

```shell
# Search for where remote_handle and remote_handle64 are defined or used in the codebase
rg -n "remote_handle64.*h\)" ggml/src/ggml-hexagon/ | grep -E "(remote_handle64_close|remote_handle64_invoke|remote_handle64_open|remote_handle64_control)" | head -20
```

Repository: crc-org/llama.cpp
Length of output: 654

🏁 Script executed:

```shell
# Check the dlsym loading at line 354 to see what's being loaded
sed -n '349,355p' ggml/src/ggml-hexagon/htp-drv.cpp
```

Repository: crc-org/llama.cpp
Length of output: 736

🏁 Script executed:

```shell
# Look for other typedefs to compare the pattern
rg -n "typedef int.*remote_handle" ggml/src/ggml-hexagon/htp-drv.cpp
```

Repository: crc-org/llama.cpp
Length of output: 516
Fix typedef parameter type on line 65 to use remote_handle64.
The typedef remote_handle64_close_pfn_t on line 65 uses remote_handle as its parameter type, while the wrapper function at line 179 and all other similar typedefs (remote_handle64_open_pfn_t, remote_handle64_invoke_pfn_t, remote_handle64_control_pfn_t) consistently use remote_handle64. This creates an ABI mismatch where a remote_handle64 value is passed to a function pointer expecting a different type, potentially causing memory corruption if the types differ in size.
🛠️ Proposed fix

-typedef int (*remote_handle64_close_pfn_t)(remote_handle h);
+typedef int (*remote_handle64_close_pfn_t)(remote_handle64 h);

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

typedef int (*remote_handle64_open_pfn_t)(const char* name, remote_handle64 *ph);
typedef int (*remote_handle64_invoke_pfn_t)(remote_handle64 h, uint32_t dwScalars, remote_arg *pra);
typedef int (*remote_handle64_close_pfn_t)(remote_handle64 h);
typedef int (*remote_handle_control_pfn_t)(uint32_t req, void* data, uint32_t datalen);
🤖 Prompt for AI Agents
In `@ggml/src/ggml-hexagon/htp-drv.cpp` around lines 63 - 66, The typedef
remote_handle64_close_pfn_t currently declares its parameter as remote_handle
but the rest of the API and the corresponding wrapper (the close wrapper that
calls through remote_handle64_close_pfn_t) use remote_handle64; update
remote_handle64_close_pfn_t to take remote_handle64 as its parameter type to
match remote_handle64_open_pfn_t/remote_handle64_invoke_pfn_t and the wrapper
function, ensuring ABI/size compatibility and preventing mismatched-call issues.
void * rpcmem_alloc(int heapid, uint32_t flags, int size) {
    return rpcmem_alloc_pfn(heapid, flags, size);
}

void * rpcmem_alloc2(int heapid, uint32_t flags, size_t size) {
    if (rpcmem_alloc2_pfn) {
        return rpcmem_alloc2_pfn(heapid, flags, size);
    } else {
        GGML_LOG_INFO("ggml-hex: rpcmem_alloc2 not found, falling back to rpcmem_alloc\n");
        return rpcmem_alloc_pfn(heapid, flags, size);
    }
}

void rpcmem_free(void * po) {
    return rpcmem_free_pfn(po);
}

int rpcmem_to_fd(void * po) {
    return rpcmem_to_fd_pfn(po);
}

HTPDRV_API int fastrpc_mmap(int domain, int fd, void * addr, int offset, size_t length, enum fastrpc_map_flags flags) {
    return fastrpc_mmap_pfn(domain, fd, addr, offset, length, flags);
}

HTPDRV_API int fastrpc_munmap(int domain, int fd, void * addr, size_t length) {
    return fastrpc_munmap_pfn(domain, fd, addr, length);
}

AEEResult dspqueue_create(int domain,
                          uint32_t flags,
                          uint32_t req_queue_size,
                          uint32_t resp_queue_size,
                          dspqueue_callback_t packet_callback,
                          dspqueue_callback_t error_callback,
                          void * callback_context,
                          dspqueue_t * queue) {
    return dspqueue_create_pfn(domain, flags, req_queue_size, resp_queue_size, packet_callback, error_callback,
                               callback_context, queue);
}

AEEResult dspqueue_close(dspqueue_t queue) {
    return dspqueue_close_pfn(queue);
}

AEEResult dspqueue_export(dspqueue_t queue, uint64_t * queue_id) {
    return dspqueue_export_pfn(queue, queue_id);
}

AEEResult dspqueue_write(dspqueue_t queue,
                         uint32_t flags,
                         uint32_t num_buffers,
                         struct dspqueue_buffer * buffers,
                         uint32_t message_length,
                         const uint8_t * message,
                         uint32_t timeout_us) {
    return dspqueue_write_pfn(queue, flags, num_buffers, buffers, message_length, message, timeout_us);
}

AEEResult dspqueue_read(dspqueue_t queue,
                        uint32_t * flags,
                        uint32_t max_buffers,
                        uint32_t * num_buffers,
                        struct dspqueue_buffer * buffers,
                        uint32_t max_message_length,
                        uint32_t * message_length,
                        uint8_t * message,
                        uint32_t timeout_us) {
    return dspqueue_read_pfn(queue, flags, max_buffers, num_buffers, buffers, max_message_length, message_length,
                             message, timeout_us);
}

HTPDRV_API int remote_handle64_open(const char * name, remote_handle64 * ph) {
    return remote_handle64_open_pfn(name, ph);
}

HTPDRV_API int remote_handle64_invoke(remote_handle64 h, uint32_t dwScalars, remote_arg * pra) {
    return remote_handle64_invoke_pfn(h, dwScalars, pra);
}

HTPDRV_API int remote_handle64_close(remote_handle64 h) {
    return remote_handle64_close_pfn(h);
}

HTPDRV_API int remote_handle_control(uint32_t req, void * data, uint32_t datalen) {
    return remote_handle_control_pfn(req, data, datalen);
}

HTPDRV_API int remote_handle64_control(remote_handle64 h, uint32_t req, void * data, uint32_t datalen) {
    return remote_handle64_control_pfn(h, req, data, datalen);
}

HTPDRV_API int remote_session_control(uint32_t req, void * data, uint32_t datalen) {
    return remote_session_control_pfn(req, data, datalen);
}
🧩 Analysis chain
🏁 Script executed:

# First, let's look at the actual file content around lines 99-193
cat -n ggml/src/ggml-hexagon/htp-drv.cpp | sed -n '90,200p'

Length of output: 5125

🏁 Script executed:

# Search for htpdrv_init function definition
rg -n "htpdrv_init\|rpcmem_alloc_pfn" ggml/src/ggml-hexagon/htp-drv.cpp -A 5 | head -100

Length of output: 43

🏁 Script executed:

# Check if pfn variables are initialized at module level
rg -n "rpcmem_alloc_pfn\s*=|rpcmem_free_pfn\s*=|fastrpc_mmap_pfn\s*=" ggml/src/ggml-hexagon/htp-drv.cpp -B 2

Length of output: 336

🏁 Script executed:

# Look for any guards or checks before calling pfn functions
rg -n "if.*_pfn\)|if.*_pfn\s*{" ggml/src/ggml-hexagon/htp-drv.cpp | head -20

Length of output: 170

🏁 Script executed:

# Search for htpdrv_init implementation
rg -n "htpdrv_init" ggml/src/ggml-hexagon/htp-drv.cpp -A 20 | head -100

Length of output: 837

🏁 Script executed:

# Check for any initialization patterns or startup code
rg -n "rpcmem_alloc_pfn\s*=" ggml/src/ggml-hexagon/htp-drv.cpp -B 3 -A 3

Length of output: 300

🏁 Script executed:

# Look for how this module is used - check header files
fd "\.h$" ggml/src/ggml-hexagon/ | head -5

Length of output: 247

🏁 Script executed:

# Get more of htpdrv_init to see where pfns are set
rg -n "htpdrv_init" ggml/src/ggml-hexagon/htp-drv.cpp -A 100 | grep -E "(rpcmem_alloc_pfn\s*=|rpcmem_free_pfn\s*=|initialized\s*=|return)" | head -30

Length of output: 566

🏁 Script executed:

# Check if there are any guards at call sites of these wrapper functions
rg -n "rpcmem_alloc\(|rpcmem_free\(|fastrpc_mmap\(" ggml/src/ggml-hexagon/ -B 2 | head -50

Length of output: 1435

🏁 Script executed:

# Check header file to understand API contracts
fd "htp-drv\.h" ggml/src/ggml-hexagon/ && cat ggml/src/ggml-hexagon/htp-drv.h | head -100

Length of output: 2545

🏁 Script executed:

# Check how htpdrv_init is called to understand the initialization order
rg -n "htpdrv_init\(\)" ggml/src/ggml-hexagon/ -B 3 -A 3

Length of output: 1076

🏁 Script executed:

# Look for the complete htpdrv_init implementation to see all pfn assignments
sed -n '308,400p' ggml/src/ggml-hexagon/htp-drv.cpp

Length of output: 4259

🏁 Script executed:

# Check if there's any documentation or contracts in comments about initialization order
rg -n "htpdrv_init\|must be called\|before calling" ggml/src/ggml-hexagon/htp-drv.h -B 2 -A 5

Length of output: 43

🏁 Script executed:

# Let me verify the complete dlsym setup and check if there's any defensive initialization pattern used elsewhere
sed -n '330,360p' ggml/src/ggml-hexagon/htp-drv.cpp

Length of output: 2287

🏁 Script executed:

# Check if there are any comments in the header about initialization requirements
sed -n '95,135p' ggml/src/ggml-hexagon/htp-drv.h

Length of output: 748

🏁 Script executed:

# Verify the coding style of the wrapper functions (indentation, pointer alignment, etc.)
sed -n '99,115p' ggml/src/ggml-hexagon/htp-drv.cpp | cat -A

Length of output: 571

🏁 Script executed:

# Check if there are any other callers of wrapper functions that might bypass htpdrv_init
rg -n "rpcmem_alloc\(|rpcmem_free\(|fastrpc_mmap\(" ggml/src/ggml-hexagon/ggml-hexagon.cpp -B 5 | head -40

Length of output: 928

🏁 Script executed:

# Verify if there are any static initializers or global initialization that might use these before htpdrv_init
rg -n "rpcmem_alloc\(|rpcmem_free\(|fastrpc_mmap\(" ggml/src/ggml-hexagon/ --type cpp | grep -v "ggml-hexagon.cpp" | head -10

Length of output: 383
Add null-pointer guards to wrapper functions.
These wrapper functions at lines 99-193 call function pointers unconditionally without checking if they've been resolved. While htpdrv_init() is the intended initialization entry point, the wrappers will crash if called before initialization succeeds or symbol resolution fails. The codebase already demonstrates this pattern—rpcmem_alloc2() (line 104) and get_hex_arch_ver() (line 380) include defensive null-checks.
Add guards similar to rpcmem_alloc2(): check the pfn pointer before calling, log an error, and return a safe value (nullptr or error code depending on return type).
Example guard

 void * rpcmem_alloc(int heapid, uint32_t flags, int size) {
-    return rpcmem_alloc_pfn(heapid, flags, size);
+    if (!rpcmem_alloc_pfn) {
+        GGML_LOG_ERROR("ggml-hex: rpcmem_alloc called before htpdrv_init\n");
+        return nullptr;
+    }
+    return rpcmem_alloc_pfn(heapid, flags, size);
 }

Apply similarly to rpcmem_free(), rpcmem_to_fd(), fastrpc_mmap(), fastrpc_munmap(), dspqueue_*(), and remote_*() functions.
🤖 Prompt for AI Agents
In `@ggml/src/ggml-hexagon/htp-drv.cpp` around lines 99 - 193, Wrap each wrapper
to null-check its resolved function pointer before calling (e.g., check
rpcmem_free_pfn, rpcmem_to_fd_pfn, fastrpc_mmap_pfn, fastrpc_munmap_pfn,
dspqueue_create_pfn, dspqueue_close_pfn, dspqueue_export_pfn,
dspqueue_write_pfn, dspqueue_read_pfn, remote_handle64_open_pfn,
remote_handle64_invoke_pfn, remote_handle64_close_pfn,
remote_handle_control_pfn, remote_handle64_control_pfn,
remote_session_control_pfn); if the pfn is null log an error with GGML_LOG_ERROR
(or existing logger) and return a safe value: nullptr for pointer returns, -1
for int returns, and a failure AEEResult (e.g., AEE_EFAILED) for AEEResult
functions; for void functions simply log and return early. Ensure each wrapper
mirrors the pattern used in rpcmem_alloc2() for consistent defensive behavior.
static void tri_f32_sycl(
    const float * src,
    float * dst,
    const int64_t ne0,
    const int64_t ne1,
    const int64_t ne2,
    const int64_t ne3,
    const ggml_tri_type ttype,
    dpct::queue_ptr main_stream
) {
    const size_t total = (size_t) ne0 * (size_t) ne1 * (size_t) ne2 * (size_t) ne3;

    main_stream->parallel_for(sycl::range<1>(total), [=](sycl::id<1> tid) {
        const int64_t idx = (int64_t) tid[0];

        const int64_t i0 = idx % ne0;
        const int64_t t1 = idx / ne0;
        const int64_t i1 = t1 % ne1;

        bool keep = false;
        switch (ttype) {
            case GGML_TRI_TYPE_LOWER:      keep = (i0 < i1);  break;
            case GGML_TRI_TYPE_LOWER_DIAG: keep = (i0 <= i1); break;
            case GGML_TRI_TYPE_UPPER:      keep = (i0 > i1);  break;
            case GGML_TRI_TYPE_UPPER_DIAG: keep = (i0 >= i1); break;
            default:                       keep = false;      break;
        }

        dst[idx] = keep ? src[idx] : 0.0f;
    });
}

static void ggml_sycl_op_tri(ggml_backend_sycl_context & ctx, ggml_tensor * dst) {
    const ggml_tensor * src0 = dst->src[0];
    GGML_ASSERT(src0);

    GGML_ASSERT(src0->type == GGML_TYPE_F32);
    GGML_ASSERT(dst->type == GGML_TYPE_F32);
    GGML_ASSERT(ggml_is_contiguous(src0));
    GGML_ASSERT(ggml_is_contiguous(dst));
    GGML_ASSERT(ggml_are_same_shape(src0, dst));

    dpct::queue_ptr main_stream = ctx.stream();
    SYCL_CHECK(ggml_sycl_set_device(ctx.device));

    const float * src0_dd = static_cast<const float *>(src0->data);
    float * dst_dd = static_cast<float *>(dst->data);

    const ggml_tri_type ttype = (ggml_tri_type) ggml_get_op_params_i32(dst, 0);

    const int64_t ne0 = src0->ne[0];
    const int64_t ne1 = src0->ne[1];
    const int64_t ne2 = src0->ne[2];
    const int64_t ne3 = src0->ne[3];

    tri_f32_sycl(src0_dd, dst_dd, ne0, ne1, ne2, ne3, ttype, main_stream);
}
🧩 Analysis chain
🏁 Script executed:

# Search for TRI implementations in CPU, CUDA and other backends
rg -A 15 "case GGML_OP_TRI:" ggml/src --type cpp --type c -B 2

Length of output: 13585

🏁 Script executed:

# Look for tri_f32 implementations in different files
fd . ggml/src -type f \( -name "*.cpp" -o -name "*.c" \) -exec grep -l "tri_f32" {} \;

Length of output: 231

🏁 Script executed:

# Find CPU TRI implementation
rg -B 5 -A 20 "void.*tri.*f32.*const float" ggml/src/ggml.c --type c

Length of output: 43

🏁 Script executed:

# Find CPU TRI implementation
rg -B 2 -A 25 "ggml_compute_forward_tri" ggml/src/ggml-cpu/ggml-cpu.c

Length of output: 1029

🏁 Script executed:

# Also search in main ggml.c file for tri definitions
rg -B 5 -A 25 "compute_forward_tri" ggml/src/ggml.c

Length of output: 43

🏁 Script executed:

# Search for the actual tri function definition
rg -B 2 -A 30 "static inline void ggml_compute_forward_tri|void ggml_compute_forward_tri" ggml/src

Length of output: 7720

🏁 Script executed:

# Search for tri mask logic patterns
rg -B 5 -A 15 "GGML_TRI_TYPE_LOWER|GGML_TRI_TYPE_UPPER" ggml/src --type c

Length of output: 43

🏁 Script executed:

# Verify the support check for TRI in SYCL
rg -B 5 -A 8 "case GGML_OP_TRI:" ggml/src/ggml-sycl/ggml-sycl.cpp | grep -A 8 "supports_op"

Length of output: 43

🏁 Script executed:

# Check runtime assertions in tri_f32_sycl against support check
sed -n '2266,2322p' ggml/src/ggml-sycl/ggml-sycl.cpp

Length of output: 1956

🏁 Script executed:

# View support check implementation
sed -n '4686,4691p' ggml/src/ggml-sycl/ggml-sycl.cpp

Length of output: 275
TRI mask orientation is correct and matches CPU backend.
The keep conditions properly implement the triangular mask logic with i0 as the column index and i1 as the row index. The LOWER/UPPER distinctions match the CPU implementation exactly.
However, align the supports_op check for GGML_OP_TRI with the runtime assertions in ggml_sycl_op_tri. Add checks for:
- ggml_is_contiguous(dst)
- ggml_are_same_shape(src0, dst)
The current support check at lines 4686-4691 only validates source contiguity and type, but the operation asserts that destination is also contiguous and shapes match.
🤖 Prompt for AI Agents
In `@ggml/src/ggml-sycl/ggml-sycl.cpp` around lines 2266 - 2322, The supports_op
check for GGML_OP_TRI must be updated to match the runtime assertions in
ggml_sycl_op_tri: add checks for ggml_is_contiguous(dst) and
ggml_are_same_shape(src0, dst) alongside the existing ggml_is_contiguous(src0)
and type checks; locate the GGML_OP_TRI branch (the function/method performing
supported-op validation) and include these two predicates so supports_op returns
false when the destination isn't contiguous or shapes differ, which aligns it
with the tri_f32_sycl/ggml_sycl_op_tri assumptions.
if ($null -ne $env:SCHED) {
    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts="$cli_opts -v"
}
Fix array concatenation for CLI options (same issue as run-tool.ps1).
Using $cli_opts="$cli_opts -v" converts the array to a string. Use += to append to the array properly.
🐛 Proposed fix

 if ($null -ne $env:SCHED) {
-    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts="$cli_opts -v"
+    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts += "-v"
 }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

if ($null -ne $env:SCHED) {
    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts += "-v"
}
🤖 Prompt for AI Agents
In `@scripts/snapdragon/windows/run-cli.ps1` around lines 27 - 29, The code sets
GGML_SCHED_DEBUG and then appends "-v" to $cli_opts using string interpolation
which converts the array to a string; change the append to use array
concatenation ($cli_opts += '-v') so $cli_opts remains an array. Update the
block that checks $env:SCHED (the lines that set $env:GGML_SCHED_DEBUG and
modify $cli_opts) to use += for $cli_opts and keep setting $env:GGML_SCHED_DEBUG
as-is, mirroring the fix applied in run-tool.ps1.
if ($null -ne $env:SCHED) {
    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts="$cli_opts -v"
}
Fix array concatenation for CLI options.
Using $cli_opts="$cli_opts -v" converts the array to a string. Use += to append to the array properly.
🐛 Proposed fix

 if ($null -ne $env:SCHED) {
-    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts="$cli_opts -v"
+    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts += "-v"
 }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

if ($null -ne $env:SCHED) {
    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts += "-v"
}
🤖 Prompt for AI Agents
In `@scripts/snapdragon/windows/run-tool.ps1` around lines 33 - 35, The CLI
options array $cli_opts is being converted to a string by using "$cli_opts -v";
change the append to use PowerShell array concatenation ($cli_opts += '-v') so
$cli_opts remains an array; update the block that sets $env:GGML_SCHED_DEBUG
when $env:SCHED is present to append '-v' to $cli_opts instead of assigning a
combined string.
try {
    # Download
    Write-Host "Downloading from: $Url"
    Invoke-WebRequest -Uri $Url -OutFile $TempFile

    # Untar
    # Note: We assume Windows includes tar.exe (Win 10 build 17063+)
    Write-Host "Extracting archive to $TargetDir..."

    # We use -C to extract contents INTO the target directory created above
    tar -xJvf $TempFile -C $TargetDir\..

    Write-Host "Extraction complete." -ForegroundColor Green
}
catch {
    Write-Error "Failed to download or extract $Name. Error: $_"
    # Cleanup target dir if failed so script tries again next time
    Remove-Item -Path $TargetDir -Recurse -Force -ErrorAction SilentlyContinue
}
Error handling allows script to continue after failure.
When Invoke-WebRequest or tar fails, Write-Error is called but the script continues to the environment variable setup section. With $ErrorActionPreference = "Stop", Write-Error actually throws a terminating error, but this is caught by the catch block and doesn't re-throw, so execution proceeds.
Consider re-throwing the exception or tracking failure state to skip environment variable setup for failed SDKs.
🛡️ Proposed fix to re-throw after cleanup

 catch {
     Write-Error "Failed to download or extract $Name. Error: $_"
     # Cleanup target dir if failed so script tries again next time
     Remove-Item -Path $TargetDir -Recurse -Force -ErrorAction SilentlyContinue
+    throw
 }

🤖 Prompt for AI Agents
In `@scripts/snapdragon/windows/setup-build.ps1` around lines 51 - 69, The
try/catch around the download/extract (Invoke-WebRequest / tar) swallows the
original failure after calling Write-Error and cleanup (Remove-Item), allowing
the script to continue into environment setup; modify the catch in
setup-build.ps1 so it either re-throws the caught exception (using "throw")
after cleanup or sets a failure flag (e.g., $DownloadFailed) and then use that
flag to skip the subsequent environment variable setup for that SDK; reference
the existing catch block, Write-Error, Remove-Item, $TempFile and $TargetDir to
implement the chosen approach so failed downloads do not proceed to env var
configuration.
# Set HEXAGON_SDK_ROOT
[System.Environment]::SetEnvironmentVariable('HEXAGON_TOOLS_ROOT', $HexagonToolsTarget, [System.EnvironmentVariableTarget]::User)
Fix misleading comment.
The comment says "Set HEXAGON_SDK_ROOT" but the code sets HEXAGON_TOOLS_ROOT.
📝 Proposed fix

-# Set HEXAGON_SDK_ROOT
+# Set HEXAGON_TOOLS_ROOT
 [System.Environment]::SetEnvironmentVariable('HEXAGON_TOOLS_ROOT', $HexagonToolsTarget, [System.EnvironmentVariableTarget]::User)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

# Set HEXAGON_TOOLS_ROOT
[System.Environment]::SetEnvironmentVariable('HEXAGON_TOOLS_ROOT', $HexagonToolsTarget, [System.EnvironmentVariableTarget]::User)
🤖 Prompt for AI Agents
In `@scripts/snapdragon/windows/setup-build.ps1` around lines 102 - 103, The
comment is wrong: it says "Set HEXAGON_SDK_ROOT" while the code is setting the
HEXAGON_TOOLS_ROOT environment variable; update the comment to accurately
reflect the action (e.g., change the comment to "Set HEXAGON_TOOLS_ROOT") so it
matches the SetEnvironmentVariable call that sets 'HEXAGON_TOOLS_ROOT' in the
script (reference the [System.Environment]::SetEnvironmentVariable call).
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 25 minutes 52 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration: |
Assisted-by-AI: Claude Code, CodeRabbit.ai
Summary by CodeRabbit
Refactor
New Features
Documentation
Tests