Fix leaks #26
base: main
Conversation
…ng/array) by filters/tests (#19147)

* undefined is treated as iterable (string/array) by filters; `tojson` is not a supported `undefined` filter
* add tests
* add sequence and iterable tests; keep it DRY and fix some types
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Details
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
The syclcompat/math.hpp is not used anymore. The change that introduced it was successfully reverted (ggml-org/llama.cpp#17826). This include path will become obsolete and will be dropped in oneAPI 2026.0, effectively breaking ggml-sycl builds.
* convert : yield Mamba2Model/GraniteMoeModel modify_tensors

This commit updates the `GraniteHybridModel` class' modify_tensors function to properly delegate to `Mamba2Model.modify_tensors` and `GraniteMoeModel.modify_tensors` using 'yield from' instead of 'return'. The motivation for this is that modify_tensors is a generator function (it uses 'yield from'), but the two calls above use return statements and don't yield anything, which means that the caller of this function will not receive any yielded values from it. This causes layer tensors to be silently dropped during conversion.
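The commit message above describes a classic generator pitfall. As an illustration (plain Python; `inner` and the delegate names are hypothetical, not the actual converter code), a `return` inside a generator function silently discards the delegated values, while `yield from` forwards them:

```python
def inner():
    # stands in for Mamba2Model.modify_tensors / GraniteMoeModel.modify_tensors
    yield "tensor_a"
    yield "tensor_b"

def broken_delegate():
    # Bug pattern: because a 'yield' appears below, this whole function is a
    # generator, so 'return inner()' just stops iteration and drops the values.
    if True:
        return inner()
    yield "never reached"

def fixed_delegate():
    # Fix: 'yield from' forwards every value produced by inner().
    yield from inner()

print(list(broken_delegate()))  # []
print(list(fixed_delegate()))   # ['tensor_a', 'tensor_b']
```

The caller iterating `broken_delegate()` sees nothing at all, which is exactly how tensors get silently dropped.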
/test topsail remoting_mac

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 02 minutes 39 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration: Failure indicator: Empty. (See run.log)

/test topsail mac-m4 remoting_mac

🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 02 minutes 38 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration: Failure indicator: Empty. (See run.log)

/test topsail mac-m4 remoting_mac
…ctor (#18471)

* server: introduce self-speculative decoding
* server: moved self-call into speculative.cpp
* can_speculate() includes self-speculation (Co-authored-by: Georgi Gerganov <[email protected]>)
* server: can_speculate() tests self-spec
* server: replace can_speculate() with slot.can_speculate() (Co-authored-by: Sigbjørn Skjæret <[email protected]>)
* common: use %zu format specifier for size_t in logging (Co-authored-by: Sigbjørn Skjæret <[email protected]>)
* server: can_speculate() requires a task instance
* common: ngram map, config self-speculative decoding
* common: add enum common_speculative_type
* common: add vector of speculative states
* common: add option --spec-draftless
* server: cleanup (remove slot.batch_spec, rename)
* common: moved self-spec impl to ngram-map
* common: cleanup (use common_speculative_state_draft)
* spec : refactor
* cont : naming
* spec: remove --spec-config
* doc: (draftless) speculative decoding
* common: print performance in spec decoding
* minor : cleanup
* common : better names
* minor : cleanup + fix build
* minor: comments
* CODEOWNERS: add common/ngram-map.* (#18471)
* common : rename speculative.draftless_type -> speculative.type
* ngram-map : fix uninitialized values
* ngram-map : take into account the input can become shorter
* ngram-map : revert len check for now
* arg : change `--spec-draftless` -> `--spec-type`
* spec : add common_speculative_state::accept()
* spec : refactor + add common_speculative_begin()
* spec : fix begin() call with mtmd
* spec : additional refactor + remove common_speculative_params

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
🔴 Test of 'mac_ai test prepare_ci' failed after 00 hours 23 minutes 01 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration:
* vulkan: use coopmat for flash attention p*v matrix multiplication
* fix P loading issue
* fix barrier position
* remove reduction that is no longer needed
* move max thread reduction into loop
* remove osh padding
* add bounds checks and padding
* remove unused code
* fix shmem sizes, loop duration and accesses
* don't overwrite Qf, add new shared psh buffer instead
* add missing bounds checks
* use subgroup reductions
* optimize
* move bounds check, reduce barriers
* support other Bc values and other subgroup sizes
* remove D_split
* replace Of register array with shared memory Ofsh array
* parallelize HSV across the rowgroups
* go back to Of in registers, not shmem
* vectorize sfsh
* don't store entire K tile in shmem
* fixes
* load large k tiles to shmem on Nvidia
* adapt shared memory host check function to shader changes
* remove Bc 32 case
* remove unused variable
* fix missing mask reduction tmspsh barrier
* fix mask bounds check
* fix rowmax f16 under/overflow to inf
* fix flash_attn_cm2 BLOCK_SIZE preprocessor directives
Walkthrough

Caches device and buffer-type metadata in virtgpu, migrates buffer-type APIs to use host handles, deprecates `buffer_type.is_host`, introduces thread-safe shared-memory locking, updates APIR encode/decode helpers, and centralizes forward/backend buffer operations around `apir_buffer_type_host_handle_t`.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Forward as VirtGPU-Forward
    participant Backend as APIR-Backend
    participant GPU as virtgpu
    Client->>Forward: apir_buffer_type_get_max_size(host_handle)
    Forward->>Backend: encode host_handle, send command
    Backend->>GPU: check gpu->cached_buffer_type.host_handle / iface
    alt GPU iface has get_max_size
        GPU->>Backend: call iface.get_max_size(...) -> max_size
    else GPU missing handler
        GPU->>Backend: return SIZE_MAX
    end
    Backend->>Forward: encode result (max_size)
    Forward->>Client: return max_size
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
🚥 Pre-merge checks: ❌ 3 failed checks (2 warnings, 1 inconclusive)
/test topsail mac-m4 remoting_mac
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
ggml/src/ggml-virtgpu/virtgpu.cpp (1)

484-487: Change `response_dec` to `thread_local` to avoid data races in concurrent calls.

The `response_dec` variable at line 484 is `static` while `encoder_buffer` and `enc` are already `thread_local`. If multiple threads call `remote_call` concurrently, they will share the same decoder object, causing one thread's initialization to overwrite another's before use.

Proposed fix

```diff
-    static apir_decoder response_dec;
+    thread_local apir_decoder response_dec;
```

ggml/src/ggml-virtgpu/virtgpu-forward-buffer-type.cpp (1)

3-25: Confirm memory leak in `apir_buffer_type_get_name`: allocated string is never freed.

The function allocates memory via `apir_decoder_alloc_array()` at line 15 and returns a pointer to this allocated string. In `ggml-backend-reg.cpp` line 36, this pointer is stored in `gpu->cached_buffer_type.name`, which persists as a static structure for the program's lifetime. Since `virtgpu` is a plain C struct with no destructor, and no cleanup code exists to free `cached_buffer_type.name`, this results in a memory leak whenever the backend is initialized.
🤖 Fix all issues with AI agents
In `@ggml/src/ggml-virtgpu/backend/shared/apir_backend.gen.h`:
- Around line 19-34: The enum numeric IDs for APIR_COMMAND_TYPE_* have been
shifted which will break wire protocol clients; restore stability by reserving
any removed numeric slot(s) instead of renumbering (e.g., add a placeholder like
APIR_COMMAND_TYPE_RESERVED_x for the missing value(s)) or, alternatively, bump
and enforce a protocol version/handshake; update the enum (symbols such as
APIR_COMMAND_TYPE_BUFFER_TYPE_ALLOC_BUFFER, APIR_COMMAND_TYPE_BUFFER_GET_BASE,
APIR_COMMAND_TYPE_BACKEND_GRAPH_COMPUTE) to keep existing numeric values
unchanged and adjust APIR_BACKEND_DISPATCH_TABLE_COUNT accordingly to reflect
the true count while preserving historic IDs.
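To illustrate the first option (reserving the removed slot), here is a minimal Python sketch. The numeric values are hypothetical; the real `APIR_COMMAND_TYPE_*` IDs live in `apir_backend.gen.h`:

```python
from enum import IntEnum

# Keeping a placeholder for a removed command preserves every historic wire ID,
# so old clients and new servers still agree on the numbering.
class ApirCommandType(IntEnum):
    BUFFER_TYPE_ALLOC_BUFFER = 0
    RESERVED_1               = 1  # removed command; slot kept so later IDs don't shift
    BUFFER_GET_BASE          = 2
    BACKEND_GRAPH_COMPUTE    = 3

# the dispatch table still counts the reserved slot
APIR_BACKEND_DISPATCH_TABLE_COUNT = len(ApirCommandType)

print(ApirCommandType.BACKEND_GRAPH_COMPUTE.value)  # 3
```

Renumbering instead (dropping `RESERVED_1` and letting the later entries shift down) is what silently breaks wire-protocol compatibility.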
In `@ggml/src/ggml-virtgpu/ggml-backend-device.cpp`:
- Around line 98-116: Replace the non-atomic initialized flag with a
std::atomic<bool> and use proper acquire/release semantics to fix the
double-checked locking in the block that constructs the static
ggml_backend_buffer_type buft; specifically change the static bool initialized
to static std::atomic<bool> initialized{false}, read it with
initialized.load(std::memory_order_acquire) for the first check, take the
std::mutex and construct/assign buft as now, then set initialized.store(true,
std::memory_order_release) after initialization; keep the existing mutex,
lock_guard and the buft construction (iface/device/context) logic unchanged.
- Around line 75-92: The code uses a non-atomic static bool initialized and a
static ggml_backend_buffer_type buft, creating a data race in the double-checked
locking and incorrectly binding the first-call dev into a single static buft;
fix by making initialized an std::atomic<bool> (e.g. std::atomic<bool>
initialized{false}) and use appropriate memory_order (load with
memory_order_acquire, store with memory_order_release) around the
mutex-protected initialization; additionally avoid storing a single static buft
per-process when different dev values may be passed—either make buft non-static
(return a fresh ggml_backend_buffer_type) or maintain a small thread-safe map
keyed by dev to return a distinct buft for each dev (referencing the symbols
initialized, buft, dev, and the mutex lock/initialization block).
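The control flow of the double-checked-locking fix can be sketched in Python (illustrative only: Python's GIL hides the underlying data race, and the C++ fix additionally needs `std::atomic` with acquire/release ordering, as described above):

```python
import threading

_lock = threading.Lock()
_initialized = False
_buft = None

def get_buffer_type():
    """Lazily construct a singleton, checking the flag twice around the lock."""
    global _initialized, _buft
    if not _initialized:          # first check, no lock (fast path)
        with _lock:
            if not _initialized:  # second check, under the lock
                _buft = {"iface": "host", "device": 0}  # expensive one-time init
                _initialized = True                     # publish after construction
    return _buft

# every caller sees the same initialized object
assert get_buffer_type() is get_buffer_type()
```

In C++ the first (unlocked) read is the step that must be `std::memory_order_acquire`, and the final store `std::memory_order_release`, so a thread on the fast path never observes the flag before the constructed object.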
In `@ggml/src/ggml-virtgpu/ggml-backend-reg.cpp`:
- Around line 34-38: apir_buffer_type_get_name may return NULL on allocation
failure, so update the code that sets gpu->cached_buffer_type.name to check the
return value from apir_buffer_type_get_name(gpu, buft_host_handle) and use a
safe fallback (e.g., an empty string or strdup(\"\")) when NULL is returned;
ensure the chosen fallback matches how other device info strings are handled and
that ownership/cleanup of gpu->cached_buffer_type.name remains consistent with
the rest of the code (refer to apir_buffer_type_get_name and
gpu->cached_buffer_type.name to locate the change).
- Around line 24-38: apir_device_get_description and apir_device_get_name can
return NULL; update the cache population in the block that assigns
gpu->cached_device_info.description and .name to check the returned pointers
from apir_device_get_description(gpu) and apir_device_get_name(gpu) and, if
NULL, set gpu->cached_device_info.description/name to a safe fallback (e.g. an
empty string or strdup of "(unknown)") instead of storing NULL; ensure any
temporary allocated strings are handled consistently with existing ownership
semantics used elsewhere for cached_device_info to avoid leaks.
In `@ggml/src/ggml-virtgpu/virtgpu.h`:
- Around line 76-78: The data_shmem_mutex initialized via mtx_init in
apir_initialize/virtgpu.cpp is never destroyed; add a call to
mtx_destroy(&gpu->data_shmem_mutex) in the GPU cleanup path (the function that
frees/destroys the GPU object created in apir_initialize) so the mutex is
properly released during GPU teardown; ensure the mtx_destroy call is paired
with the existing initialization and placed before or as part of releasing the
GPU structure.
🧹 Nitpick comments (3)

ggml/src/ggml-virtgpu/ggml-remoting.h (1)

48-51: Misleading comment for frontend helper.

The comment on line 49 says "in the backend, the buffer handle is the buffer pointer" but this is the frontend version that extracts `buft->context`. The backend version in `backend/backend-convert.h` casts `buft` directly. Consider updating the comment to reflect the frontend semantics:

```diff
 static inline apir_buffer_type_host_handle_t ggml_buffer_type_to_apir_handle(ggml_backend_buffer_type_t buft) {
-    // in the backend, the buffer handle is the buffer pointer
+    // in the frontend, the host handle is stored in buft->context
     return (apir_buffer_type_host_handle_t) buft->context;
 }
```

ggml/src/ggml-virtgpu/virtgpu-forward-backend.cpp (1)
50-55: Consider checking `mtx_unlock` return value and handling error paths.

The `mtx_unlock` return value is not checked. While unlock failures are rare, logging or asserting could help diagnose issues.

More importantly, if an error occurs between acquiring the lock (line 25) and releasing it (line 52), such as if the `REMOTE_CALL` macro causes an early return or abort, the mutex could remain locked. Consider using RAII-style lock guards if C++ mutex primitives are available.

♻️ Optional: Use RAII lock guard pattern

If mixing C++ and C threading primitives is acceptable, consider wrapping the C mutex in a simple RAII guard:

```cpp
// Simple RAII wrapper for mtx_t
struct MtxGuard {
    mtx_t * mtx;
    bool    locked;

    MtxGuard(mtx_t * m) : mtx(m), locked(false) {
        if (mtx_lock(mtx) == thrd_success) locked = true;
    }

    ~MtxGuard() {
        if (locked) mtx_unlock(mtx);
    }
};
```

ggml/src/ggml-virtgpu/virtgpu.h (1)
80-88: Cached memory values may become stale.

`memory_free` and `memory_total` are cached at initialization but may change during runtime as allocations occur. If accurate memory reporting is required, consider whether these should remain dynamic queries or document that they represent initial values.
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 24 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration:
…t to support more cases (#19154) Co-authored-by: Neo Zhang Jianyu <[email protected]>
/test topsail mac-m4 remoting_mac
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ggml/src/ggml-virtgpu/regenerate_remoting.py (1)

299-301: Missing newline or space in warning message.

The warning message is missing proper separation between the two sentences.

🐛 Proposed fix

```diff
     if not self.clang_format_available:
-        logging.warning("\n⚠️clang-format not found in PATH. Generated files will not be formatted."
-                        " Install clang-format to enable automatic code formatting.")
+        logging.warning("\n⚠️ clang-format not found in PATH. Generated files will not be formatted.\n"
+                        " Install clang-format to enable automatic code formatting.")
```
🧹 Nitpick comments (5)

ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h (1)

40-57: Consider moving the null check before initializing `ggml_init_params`.

The null guard at lines 48-50 is a good addition to handle cases where `apir_rpc_tensor` is NULL. However, the `ggml_init_params` struct is constructed before the check. While this is harmless (stack allocation), moving the check earlier would be slightly cleaner.

♻️ Proposed refactor

```diff
 static inline const ggml_tensor * apir_decode_ggml_tensor(apir_decoder * dec) {
     const apir_rpc_tensor * apir_rpc_tensor = apir_decode_apir_rpc_tensor_inplace(dec);
+
+    if (!apir_rpc_tensor) {
+        return NULL;
+    }
+
     ggml_init_params params{
         /*.mem_size   =*/ ggml_tensor_overhead(),
         /*.mem_buffer =*/ NULL,
         /*.no_alloc   =*/ true,
     };

-    if (!apir_rpc_tensor) {
-        return NULL;
-    }
-
     ggml_context * ctx = ggml_init(params);
```

ggml/src/ggml-virtgpu/virtgpu-forward-device.cpp (2)
24-43: Return type change transfers ownership to caller - document this.

The change from `const char *` to `char *` indicates the caller now owns the returned memory and is responsible for freeing it. This aligns with the PR objective to fix leaks.

Consider adding a documentation comment to clarify the ownership semantics:

📝 Proposed documentation

```diff
+/**
+ * Returns the device name. Caller takes ownership and must free() the returned string.
+ * Returns NULL on allocation failure.
+ */
 char * apir_device_get_name(virtgpu * gpu) {
```

45-66: Same ownership semantics apply here.

The description getter has the same ownership transfer. Consider adding similar documentation.

📝 Proposed documentation

```diff
+/**
+ * Returns the device description. Caller takes ownership and must free() the returned string.
+ * Returns NULL on allocation failure.
+ */
 char * apir_device_get_description(virtgpu * gpu) {
```

ggml/src/ggml-virtgpu/ggml-backend-reg.cpp (1)
176-183: Make cleanup idempotent to avoid accidental double-free.

If `ggml_virtgpu_cleanup` can be called more than once (or on a partially-initialized GPU), guard and null out freed pointers.

♻️ Suggested hardening

```diff
 void ggml_virtgpu_cleanup(virtgpu *gpu) {
+    if (!gpu) {
+        return;
+    }
-    free(gpu->cached_device_info.name);
-    free(gpu->cached_device_info.description);
-    free(gpu->cached_buffer_type.name);
+    if (gpu->cached_device_info.name) {
+        free(gpu->cached_device_info.name);
+        gpu->cached_device_info.name = NULL;
+    }
+    if (gpu->cached_device_info.description) {
+        free(gpu->cached_device_info.description);
+        gpu->cached_device_info.description = NULL;
+    }
+    if (gpu->cached_buffer_type.name) {
+        free(gpu->cached_buffer_type.name);
+        gpu->cached_buffer_type.name = NULL;
+    }
     mtx_destroy(&gpu->data_shmem_mutex);
 }
```

ggml/src/ggml-virtgpu/virtgpu-forward.gen.h (1)
6-8: Document ownership for returned `char *` strings.

With non-const return types, callers need to know they must free the returned buffers. A brief header comment would prevent leaks.

Also applies to: 20-28
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 26 minutes 19 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration:
/test topsail kpouget-atuona remoting_linux
* refactor mmf rows_per_block
* speed up compile
* pass cdna compile
* fix cuda error
* clean up mmf
* f32 mmf
* clean float mma
* fix mmf error
* faster mmf
* extend tile k
* fix compile error
* Revert "extend tile k" (this reverts commit 4d2ef3d483932659801a59a5af0b6b48f6ffd5c7)
* fix smem overflow
* speed up compiling mmf
* speed up compile for hip
* 512 block for cdna
* config pad size
* fix as comment
* update select logic
* move some code to cuh
* fix as comment
* correct cdna3 config

---------

Co-authored-by: zhang hui <[email protected]>
🔴 Test of 'mac_ai test test_ci' failed after 02 hours 28 minutes 59 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration: Failure indicator: Empty. (See run.log)
* chat : add parsing for solar-open-100b
* add comments to rules
* cont : make assistant start optional
* cont : remove assistant start prefix altogether

---------

Co-authored-by: Piotr Wilkin (ilintar) <[email protected]>
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 25 minutes 49 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration:
/test topsail mac-m4 remoting_mac
Actionable comments posted: 13
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test-chat-template.cpp (1)

81-82: Bug: `--no-common` flag sets `use_common` to wrong value.

The `--no-common` flag should set `use_common = false`, but it currently sets it to `true`. This means the flag has no effect (since `use_common` is already initialized to `true` on line 67).

🐛 Proposed fix

```diff
     } else if (args[i] == "--no-common") {
-        use_common = true;
+        use_common = false;
```
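The inverted-flag bug above is easier to avoid when the argument parser owns the negation. A Python sketch of the pattern (the C++ test harness parses its `args` by hand; this only illustrates the idea):

```python
import argparse

parser = argparse.ArgumentParser()
# '--no-common' must turn the feature OFF: store_false flips the True default,
# so there is no hand-written `use_common = true` assignment to get wrong.
parser.add_argument("--no-common", dest="use_common",
                    action="store_false", default=True)

print(parser.parse_args([]).use_common)               # True
print(parser.parse_args(["--no-common"]).use_common)  # False
```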
🤖 Fix all issues with AI agents
In @.github/workflows/winget.yml:
- Around line 31-35: The current destructuring of the found release/asset can
throw if no release or no asset includes 'win-vulkan'; update the logic around
the releases.find call (the variable currently destructured into version and
assets) to first store the foundRelease, guard that it is not undefined, then
find the matching asset (the assets.find that yields asset_url) and guard that
as well; if either is missing, call core.setFailed or throw a clear Error with a
message like "No release with win-vulkan asset found" so the workflow fails fast
with a readable message instead of crashing during destructuring.
In `@common/arg.cpp`:
- Around line 3398-3417: The CLI option "--spec-type" accepts only hyphenated
values but common_speculative_type_to_str and the internal enum/constants
(COMMON_SPECULATIVE_TYPE_NONE, COMMON_SPECULATIVE_TYPE_NGRAM_CACHE,
COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE, COMMON_SPECULATIVE_TYPE_NGRAM_MAP_K,
COMMON_SPECULATIVE_TYPE_NGRAM_MAP_K4V) use underscore names, causing defaults
shown by common_speculative_type_to_str to be non-pastable; update the lambda
passed to common_arg/add_opt for "--spec-type" to normalize input by mapping
both underscore and hyphen variants (e.g., accept "ngram_map_k" and
"ngram-map-k" or translate underscores to hyphens before comparison) so
params.speculative.type is set correctly and help/default strings can be reused
by the CLI.
In `@common/common.h`:
- Around line 255-299: The has_dft() helper only checks mparams_dft.path and
mparams_dft.hf_repo and thus misses other draft-model entry points; update
has_dft() to also return true when a draft model is provided via
mparams_dft.url, mparams_dft.hf_file, mparams_dft.docker_repo, mparams_dft.name
(or any other identifying fields on common_params_model), and when model_dft !=
nullptr; locate the function has_dft() in struct common_params_speculative and
add checks for these additional mparams_dft members and the model_dft pointer so
any of those sources signal a present draft model.
In `@common/ngram-map.cpp`:
- Around line 195-211: When creating a simple key-only draft in the block
guarded by if (map.key_only) (where draft is filled and map.last_draft_key_idx /
map.last_draft_value_idx are set), also set map.last_draft_created = true so
that common_ngram_map_accept sees the draft and updates n_accepted; update the
assignment that currently sets last_draft_created to false to true and keep the
other last_draft_* assignments unchanged.
- Around line 314-329: The draft is currently extracted from match_pos while
slot_max may reference a different value; update the draft extraction in the
block that computes n_draft_tokens/draft so it uses the selected value's
position (e.g., curr_key.values[slot_max].value_idx or equivalent) instead of
match_pos, and guard against a zero/invalid value_idx before indexing inp; also
ensure map.last_draft_value_idx and n_accepted
(curr_key.values[slot_max].n_accepted) then correctly describe the produced
draft (adjust match offset computation and bounds checks accordingly).
In `@common/speculative.cpp`:
- Around line 247-260: Validate the result of re-tokenizing `text` before
assigning back to `id_last`: after calling common_tokenize(ctx_dft, text, false,
true) (used to remap the detokenized `id_last`), check the returned vector size
and only set `id_last` from the single token when the size == 1; if the result
is empty or contains >1 tokens, log a debug/warn via LOG_DBG/LOG_WARN and skip
drafting/keep the original `id_last` (or handle per existing draft-skip logic)
to avoid out-of-bounds access and incorrect multi-token remaps; update the block
around llama_detokenize, text = replace_to_dft(text), and the subsequent
assignment to `id_last` accordingly.
In `@docs/speculative.md`:
- Around line 55-105: Add fenced-code languages and fix table style and
hyphenation: update the code fences that show CLI options to use a language tag
(e.g., ```text) and ensure closing fences are present, convert the options table
to the GitHub-style pipe table with header separator (|Type|Description| and
|---|---|) replacing the mismatched style, and change occurrences of "map based"
to "map-based" in the option descriptions for --spec-ngram-size-n and
--spec-ngram-size-m (and the duplicate text around lines 109-113) so
markdownlint MD040/MD060 and the hyphenation issue are resolved.
In `@ggml/src/ggml-cuda/ggml-cuda.cu`:
- Around line 3083-3089: The ggml_cuda_topk_moe_fusion function leaves
args.scale uninitialized causing nondeterministic fusion paths; explicitly set
args.scale (a member of ggml_cuda_topk_moe_args) when initializing the args
struct inside ggml_cuda_topk_moe_fusion so it has a defined value (e.g., false
or 1.0 as appropriate for your semantics) to prevent mis-detection of a SCALE op
and subsequent out-of-bounds node indexing.
In `@ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h`:
- Around line 40-45: apir_decode_ggml_tensor can return NULL so update each call
site—backend_buffer_set_tensor, backend_buffer_get_tensor, and
backend_buffer_cpy_tensor—to check the returned const ggml_tensor* immediately
after the apir_decode_ggml_tensor(...) call and handle NULL by logging an error
and returning early with an appropriate error code (or return value) instead of
dereferencing; in backend_buffer_cpy_tensor check both src and dst results
separately and return on either NULL to avoid use-after-null.
In `@ggml/src/ggml-virtgpu/ggml-backend-reg.cpp`:
- Around line 4-12: The file uses std::atomic<bool> in the apir_initialize
function but doesn't include <atomic>; add an explicit `#include` <atomic> at the
top of ggml-backend-reg.cpp so the use of std::atomic in apir_initialize is
valid without relying on transitive includes, keeping the include near the other
standard headers (e.g., alongside <mutex> and <iostream>).
In `@ggml/src/ggml-virtgpu/ggmlremoting_functions.yaml`:
- Around line 26-31: The functions get_name and get_description currently return
dynamically allocated char * via apir_decoder_alloc_array() but lack
ownership/lifetime documentation; either (a) add clear comments/doxygen to the
generated declarations for get_name and get_description stating that the caller
receives ownership and must free the buffer (specify free function if any), or
(b) change the API to avoid per-call allocation by returning const char * to a
cached/static buffer (or expose a separate function to obtain/duplicate the
string) and update uses of apir_decoder_alloc_array() accordingly; locate
get_name/get_description in the YAML and the allocation site
apir_decoder_alloc_array() to implement one of these two fixes.
In `@tools/server/server-context.cpp`:
- Around line 749-759: When common_speculative_init(params_base.speculative,
slot.ctx) returns a non-null slot.spec while mctx is set, the function returns
early and leaks that allocated spec; modify the error path in the block around
common_speculative_init so that before returning false you free or destroy the
newly created slot.spec (e.g., call the corresponding free/destroy function used
for speculative contexts) or move the slot into the slots container so destroy()
will clean it up; update the early-return path where SRV_ERR is logged to ensure
slot.spec is properly released to avoid orphaned memory (referencing slot.spec,
common_speculative_init, mctx, slots, and destroy()).
In `@tools/server/server-task.cpp`:
- Around line 256-266: Fix two issues in the speculative parameter parsing: (1)
correct the clamping logic for ngram_size_n, ngram_size_m, ngram_check_rate, and
ngram_min_hits by using std::min(std::max(1, x), 1024) instead of the current
std::max(std::min(1, x), 1024) so values are clamped into [1,1024]; (2) handle
unknown speculative type returned by common_speculative_type_from_name by
checking if the result equals COMMON_SPECULATIVE_TYPE_COUNT and, if so, set
params.speculative.type to defaults.speculative.type (or otherwise log/validate)
instead of assigning the invalid enum directly. Ensure you update the code
around params.speculative.* assignments and the call to
common_speculative_type_from_name accordingly.
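The clamping mistake called out in item (1) is worth seeing concretely. A Python sketch of the two compositions (illustrative; the real code uses `std::min`/`std::max`):

```python
def broken_clamp(x):
    # mirrors std::max(std::min(1, x), 1024): min(1, x) <= 1, so the outer
    # max(..., 1024) returns 1024 for every input
    return max(min(1, x), 1024)

def fixed_clamp(x):
    # mirrors std::min(std::max(1, x), 1024): clamps x into [1, 1024]
    return min(max(1, x), 1024)

print([broken_clamp(x) for x in (0, 16, 5000)])  # [1024, 1024, 1024]
print([fixed_clamp(x) for x in (0, 16, 5000)])   # [1, 16, 1024]
```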
🧹 Nitpick comments (1)

examples/lookup/lookup.cpp (1)

213-213: Consider guarding the save operation when dynamic cache path is empty.

If `params.speculative.lookup_cache_dynamic` is empty, `common_ngram_cache_save` will be called with an empty filename. Depending on `common_ngram_cache_save`'s behavior, this could create an empty-named file or fail silently.

♻️ Suggested guard

```diff
         // Update dynamic ngram cache with context ngram cache and save it to disk:
         common_ngram_cache_merge(ngram_cache_dynamic, ngram_cache_context);
-        common_ngram_cache_save(ngram_cache_dynamic, params.speculative.lookup_cache_dynamic);
+        if (!params.speculative.lookup_cache_dynamic.empty()) {
+            common_ngram_cache_save(ngram_cache_dynamic, params.speculative.lookup_cache_dynamic);
+        }
```
```cpp
    if (map.key_only) {
        // simple mode:
        // Fill in the draft with the m tokens following the key.
        // We work with value values[0] only.
        int n_draft_tokens = std::min((int) m, (int) curr_key.values[0].n_accepted);

        for (int i = 0; i < n_draft_tokens; ++i) {
            draft.push_back(inp[match_pos + n + i]);
        }

        LOG_INF("%s: key_offset = %zu, key_num = %d, draft.size = %zu\n", __func__,
                key_offset, curr_key.key_num, draft.size());

        map.last_draft_created   = false;
        map.last_draft_key_idx   = key_offset;
        map.last_draft_value_idx = 0;  // value 0 is used for simple mode
        return;
```
Key‑only drafts aren't marked as created.

`map.last_draft_created` stays false even when a draft is produced, so `common_ngram_map_accept` will skip updates and `n_accepted` never adapts.
🔧 Proposed fix

```diff
-            map.last_draft_created   = false;
+            map.last_draft_created   = true;
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
    if (map.key_only) {
        // simple mode:
        // Fill in the draft with the m tokens following the key.
        // We work with value values[0] only.
        int n_draft_tokens = std::min((int) m, (int) curr_key.values[0].n_accepted);

        for (int i = 0; i < n_draft_tokens; ++i) {
            draft.push_back(inp[match_pos + n + i]);
        }

        LOG_INF("%s: key_offset = %zu, key_num = %d, draft.size = %zu\n", __func__,
                key_offset, curr_key.key_num, draft.size());

        map.last_draft_created   = true;
        map.last_draft_key_idx   = key_offset;
        map.last_draft_value_idx = 0;  // value 0 is used for simple mode
        return;
```
🤖 Prompt for AI Agents
In `@common/ngram-map.cpp` around lines 195 - 211, When creating a simple key-only
draft in the block guarded by if (map.key_only) (where draft is filled and
map.last_draft_key_idx / map.last_draft_value_idx are set), also set
map.last_draft_created = true so that common_ngram_map_accept sees the draft and
updates n_accepted; update the assignment that currently sets last_draft_created
to false to true and keep the other last_draft_* assignments unchanged.
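The fix described above reduces to two pieces: copy at most `m` tokens bounded by the key's accepted count, then record that a draft is outstanding so the later accept path updates `n_accepted`. A minimal sketch of that logic — the type and function names here are illustrative stand-ins, not the repository's actual API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the ngram-map bookkeeping described in the review.
struct draft_state {
    bool   last_draft_created   = false;
    size_t last_draft_key_idx   = 0;
    int    last_draft_value_idx = 0;
};

// Copy at most `m` tokens following the match, bounded by how many tokens
// were previously accepted for this key, and mark the draft as outstanding
// so a subsequent accept() call knows to update the acceptance counter.
inline size_t fill_simple_draft(std::vector<int> & draft,
                                const std::vector<int> & inp,
                                size_t match_pos, size_t n,
                                size_t m, int n_accepted,
                                draft_state & st, size_t key_offset) {
    const int n_draft_tokens = std::min((int) m, n_accepted);
    for (int i = 0; i < n_draft_tokens; ++i) {
        draft.push_back(inp[match_pos + n + i]);
    }
    st.last_draft_created   = true; // the bug was leaving this false
    st.last_draft_key_idx   = key_offset;
    st.last_draft_value_idx = 0;    // value 0 is used for simple mode
    return draft.size();
}
```

With the flag left `false`, the sketch's accept path would silently skip the counter update — which is exactly the failure mode the comment describes.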
```cpp
static inline const ggml_tensor * apir_decode_ggml_tensor(apir_decoder * dec) {
    const apir_rpc_tensor * apir_rpc_tensor = apir_decode_apir_rpc_tensor_inplace(dec);

    if (!apir_rpc_tensor) {
        return NULL;
    }
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Find all call sites of apir_decode_ggml_tensor and show context
rg -n 'apir_decode_ggml_tensor\s*\(' -A 5 -B 2
```

Repository: crc-org/llama.cpp
Length of output: 3114

🏁 Script executed:

```shell
#!/bin/bash
# Get the function names and their full context
rg -n 'apir_decode_ggml_tensor\s*\(' -B 15 backend-dispatched-buffer.cpp | head -100
```

Repository: crc-org/llama.cpp
Length of output: 130

🏁 Script executed:

```shell
#!/bin/bash
# Get the function names and their full context
rg -n 'apir_decode_ggml_tensor\s*\(' -B 15 ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp | head -120
```

Repository: crc-org/llama.cpp
Length of output: 1657

🏁 Script executed:

```shell
#!/bin/bash
# Get complete functions to see all tensor usages
sed -n '20,50p' ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp
sed -n '52,81p' ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp
sed -n '83,100p' ggml/src/ggml-virtgpu/backend/backend-dispatched-buffer.cpp
```

Repository: crc-org/llama.cpp
Length of output: 2289
Add NULL checks for all apir_decode_ggml_tensor call sites before dereferencing.
The function can now return NULL (lines 43-44 of apir_cs_ggml.h), but three call sites in backend-dispatched-buffer.cpp dereference the result without guards:
- backend_buffer_set_tensor (line 29): tensor used at line 47
- backend_buffer_get_tensor (line 61): tensor used at line 78
- backend_buffer_cpy_tensor (lines 91-92): src and dst both used at line 94
Add NULL checks after each apir_decode_ggml_tensor call to return an error code or log and return early, preventing NULL pointer dereferences.
🤖 Prompt for AI Agents
In `@ggml/src/ggml-virtgpu/backend/shared/apir_cs_ggml.h` around lines 40 - 45,
apir_decode_ggml_tensor can return NULL so update each call
site—backend_buffer_set_tensor, backend_buffer_get_tensor, and
backend_buffer_cpy_tensor—to check the returned const ggml_tensor* immediately
after the apir_decode_ggml_tensor(...) call and handle NULL by logging an error
and returning early with an appropriate error code (or return value) instead of
dereferencing; in backend_buffer_cpy_tensor check both src and dst results
separately and return on either NULL to avoid use-after-null.
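The guard pattern this comment asks for is simple: validate the decoded pointer before any dereference and fail the request early. A hedged, self-contained sketch — the names below are invented for illustration; the real call sites live in backend-dispatched-buffer.cpp:

```cpp
#include <cstdio>

// Hypothetical stand-in for a wire-decoded tensor.
struct fake_tensor { int id; };

// Stand-in decoder: returns nullptr on malformed input, mimicking
// apir_decode_ggml_tensor's new failure mode.
inline const fake_tensor * decode_tensor(const fake_tensor * wire) {
    return wire; // a nullptr input models a decode failure
}

// Call-site pattern: check for NULL immediately after decoding and
// propagate an error code instead of crashing on a null dereference.
inline int buffer_set_tensor(const fake_tensor * wire) {
    const fake_tensor * tensor = decode_tensor(wire);
    if (!tensor) {
        std::fprintf(stderr, "failed to decode tensor\n");
        return -1; // early return: no dereference of a null pointer
    }
    return tensor->id; // safe: tensor is known non-null here
}
```

For the copy path, the same check is applied to each of the two decoded tensors separately, returning on the first NULL.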
```diff
 #include <iostream>
 #include <mutex>

 void ggml_virtgpu_cleanup(virtgpu *gpu);

 static virtgpu * apir_initialize() {
-    static virtgpu * apir_gpu_instance = NULL;
-    static bool apir_initialized = false;
+    static virtgpu * gpu = NULL;
+    static std::atomic<bool> initialized = false;
```
🧩 Analysis chain

🏁 Script executed:

```shell
# First, find the file and check its includes
fd 'ggml-backend-reg.cpp' --type f
```

Repository: crc-org/llama.cpp
Length of output: 133

🏁 Script executed:

```shell
# Check the includes at the top of the file
head -20 ggml/src/ggml-virtgpu/ggml-backend-reg.cpp
```

Repository: crc-org/llama.cpp
Length of output: 503

🏁 Script executed:

```shell
# Search for atomic include in the file and related headers
rg "#include.*atomic" ggml/src/ggml-virtgpu/
```

Repository: crc-org/llama.cpp
Length of output: 170

🏁 Script executed:

```shell
# Check what's in ggml-remoting.h to see if it includes atomic
fd 'ggml-remoting.h' --type f
```

Repository: crc-org/llama.cpp
Length of output: 98

🏁 Script executed:

```shell
# Check if ggml-remoting.h includes atomic
head -30 ggml/src/ggml-virtgpu/ggml-remoting.h
```

Repository: crc-org/llama.cpp
Length of output: 1094

🏁 Script executed:

```shell
# Check if ggml-virtgpu.h includes atomic
head -30 ggml/src/ggml-virtgpu/ggml-virtgpu.h
```

Repository: crc-org/llama.cpp
Length of output: 156

🏁 Script executed:

```shell
# List all files in the virtgpu directory
ls -la ggml/src/ggml-virtgpu/
```

Repository: crc-org/llama.cpp
Length of output: 1952

🏁 Script executed:

```shell
# Check if virtgpu.h includes atomic
fd 'virtgpu.h' --type f -x head -30 {}
```

Repository: crc-org/llama.cpp
Length of output: 983
Add explicit `#include <atomic>` for `std::atomic` usage.

Line 11 uses `std::atomic<bool>`, but `<atomic>` is not explicitly included in this file. While the header is pulled in transitively through `virtgpu.h` → `virtgpu-utils.h`, relying on transitive includes is fragile. Explicitly include `<atomic>` to prevent future breakage if the transitive dependency changes.

Suggested fix

```diff
 #include "ggml-remoting.h"
 #include "ggml-virtgpu.h"

 #include <iostream>
 #include <mutex>
+#include <atomic>
```

📝 Committable suggestion

```cpp
#include <iostream>
#include <mutex>
#include <atomic>

void ggml_virtgpu_cleanup(virtgpu *gpu);

static virtgpu * apir_initialize() {
    static virtgpu * gpu = NULL;
    static std::atomic<bool> initialized = false;
```

🤖 Prompt for AI Agents

In `ggml/src/ggml-virtgpu/ggml-backend-reg.cpp` around lines 4 - 12: The file uses `std::atomic<bool>` in the `apir_initialize` function but doesn't include `<atomic>`; add an explicit `#include <atomic>` at the top of ggml-backend-reg.cpp so the use of std::atomic in apir_initialize is valid without relying on transitive includes, keeping the include near the other standard headers (e.g., alongside `<mutex>` and `<iostream>`).
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 25 minutes 59 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration:
…165) * cuda : fix nkvo * cont : more robust cuda graph node property matching * cont : restore pre-leafs implementation * cont : comments + static_assert
* hexagon: updates to enable offloading to HTP on WoS * Update windows.md * Update windows.md * hexagon: enable -O3 optimizations * hexagon: move all _WINDOWS conditional compilation to _WIN32 * hexagon: updates to enable offloading to HTP on WoS * hexagon: use run-time vs load-time dynamic linking for cdsp driver interface * refactor htp-drv * hexagon: add run-bench.ps1 script * hexagon: htdrv refactor * hexagon: unify Android and Windows build readmes * hexagon: update README.md * hexagon: refactor htpdrv * hexagon: drv refactor * hexagon: more drv refactor * hexagon: fixes for android builds * hexagon: factor out dl into ggml-backend-dl * hexagon: add run-tool.ps1 script * hexagon: merge htp-utils in htp-drv and remove unused code * wos: no need for getopt_custom.h * wos: add missing CR in htpdrv * hexagon: ndev enforcement applies only to the Android devices * hexagon: add support for generating and signing .cat file * hexagon: add .inf file * hexagon: working auto-signing and improved windows builds * hexagon: further improve skel build * hexagon: add rough WoS guide * hexagon: updated windows guide * hexagon: improve cmake handling of certs and logging * hexagon: improve windows setup/build doc * hexagon: more windows readme updates * hexagon: windows readme updates * hexagon: windows readme updates * hexagon: windows readme updates * hexagon: windows readme updates * hexagon: windows readme updates * Update windows.md * Update windows.md * snapdragon: rename docs/backend/hexagon to docs/backends/snapdragon Also added a power shell script to simplify build env setup. * hexagon: remove trailing whitespace and move cmake requirement to user-presets * hexagon: fix CMakeUserPresets path in workflow yaml * hexagon: introduce local version of libdl.h * hexagon: fix src1 reuse logic gpt-oss needs a bigger lookahead window. The check for src[1] itself being quantized was wrong. --------- Co-authored-by: Max Krasnyansky <[email protected]>
/test topsail kpouget-atuona remoting_linux
…g (#19151) * webgpu : pipeline flash_attn Q/K loads in WGSL * ggml-webgpu: unroll Q*K accumlation inner loop * ggml-webgpu: vectorization * ggml-webgpu: unrolling * ggml-webgpu: remove redundant unrolling * ggml-webgpu: restore the config * ggml-webgpu: remove redundant comments * ggml-webgpu: formatting * ggml-webgpu: formatting and remove vectorization * ggml-webgpu: remove unnecessary constants * ggml-webgpu: change QKV buffer to read_write to pass validation * ggml-webgpu: add explanation for the additional bracket around Q K accumulate * Indentation and for -> if for tail * Kick off CI on wgsl only commits --------- Co-authored-by: Reese Levine <[email protected]>
🔴 Test of 'mac_ai test test_ci' failed after 02 hours 14 minutes 05 seconds. 🔴 • Link to the test results. • No reports index generated... Test configuration: Failure indicator: Empty. (See run.log)
* Fix typos in SYCL documentation * Update SYCL.md * Update SYCL.md * Update SYCL.md * Update docs/backend/SYCL.md Co-authored-by: Neo Zhang Jianyu <[email protected]> * Update SYCL.md --------- Co-authored-by: Neo Zhang Jianyu <[email protected]>
* sycl: implement GGML_OP_TRI * docs: update ops.md for SYCL TRI * docs: regenerate ops.md * docs: update SYCL support for GGML_OP_TRI
* sycl: add softplus unary op implementation * sycl: add softplus unary op implementation * docs(ops): mark SYCL SOFTPLUS as supported * docs: update SYCL status for SOFTPLUS
This commit removes the unused tmp_buf variable from llama-kv-cache.cpp and llama-memory-recurrent.cpp. The tmp_buf variable was declared but never used, but since it has a non-trivial constructor/destructor we don't get an unused-variable warning about it.
This commit adds a missing return statement to the GraniteMoeModel class to fix an issue in the model conversion process. Resolves: ggml-org/llama.cpp#19201
not necessary
The static init isn't thread safe.
/test topsail mac-m4 remoting_mac
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ggml/src/ggml-virtgpu/ggml-backend-reg.cpp (1)
ggml/src/ggml-virtgpu/ggml-backend-reg.cpp (1)

161-165: ⚠️ Potential issue | 🟡 Minor

Non-atomic `initialized` flag may cause data race.

The `static bool initialized` at line 161 is accessed without synchronization. If `ggml_backend_virtgpu_reg()` is called concurrently from multiple threads before initialization completes, this could result in a data race.

Consider using `std::atomic<bool>` or ensuring this function is only called from a single thread during startup.

🔒 Proposed fix

```diff
     static ggml_backend_reg reg = {
         /* .api_version = */ GGML_BACKEND_API_VERSION,
         /* .iface       = */ ggml_backend_remoting_reg_i,
         /* .context     = */ gpu,
     };

-    static bool initialized = false;
+    static std::atomic<bool> initialized{false};
     if (initialized) {
         return &reg;
     }
-    initialized = true;
+    initialized.store(true, std::memory_order_relaxed);
```
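Beyond an atomic flag, C++11 offers `std::call_once` for exactly this one-time-registration shape: the initializer body is guaranteed to run once even under concurrent callers. A minimal sketch with a placeholder registration type (not the real `ggml_backend_reg`):

```cpp
#include <mutex>

// Hypothetical stand-in for a backend registration record.
struct fake_reg { int api_version = 0; };

// Thread-safe lazy registration: std::call_once guarantees that the
// lambda runs exactly once, and all callers observe its side effects.
inline fake_reg * get_reg() {
    static fake_reg       reg;
    static std::once_flag once;
    std::call_once(once, [] { reg.api_version = 1; });
    return &reg;
}
```

Note that in C++11 and later, the initialization of a function-local `static` object is itself thread-safe ("magic statics"), so folding the setup into the static's constructor is another viable fix.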
🤖 Fix all issues with AI agents
In `@docs/backend/snapdragon/README.md`:
- Around line 116-119: In the "Windows" section of README.md (the heading "###
Windows"), fix the typo by changing the word "scrits" to "scripts" in the
sentence referencing the Powershell folder so it reads "To run, adapt below
instructions to use Powershell scripts in `scripts/snapdragon/windows`."
- Around line 19-22: Update the README text under the "Windows On Snapdragon"
heading to correct the Visual Studio version typo: replace "MS Visual Studio
2026 (Community Edition or Pro)" with the correct current version(s), e.g., "MS
Visual Studio 2022 (Community Edition or Pro)" or "MS Visual Studio 2022/2024
(Community Edition or Pro)"; ensure only the display string is changed in the
small bullet list so the docs accurately reflect supported Visual Studio
releases.
In `@docs/backend/snapdragon/windows.md`:
- Line 147: Replace the misspelled word "certicate" with "certificate" in the
sentence that reads "The .cat file, the signature and proper certicate
installation can be verified with" in docs/backend/snapdragon/windows.md so it
reads "...proper certificate installation..."; ensure only the single typo is
corrected and surrounding punctuation/spacing is preserved.
In `@ggml/src/ggml-hexagon/htp-drv.cpp`:
- Around line 222-302: In get_driver_path(), check the return value of
LocalAlloc before using it: after LPQUERY_SERVICE_CONFIGW serviceConfig =
static_cast<...>(LocalAlloc(...)); verify serviceConfig != NULL, and if NULL
call GGML_LOG_ERROR with GetLastError (or fprintf),
CloseServiceHandle(schService) and CloseServiceHandle(schSCManager), then return
result to avoid passing NULL into QueryServiceConfigW; keep the existing cleanup
pattern used elsewhere in this function.
- Around line 63-66: The typedef remote_handle64_close_pfn_t currently declares
its parameter as remote_handle but the rest of the API and the corresponding
wrapper (the close wrapper that calls through remote_handle64_close_pfn_t) use
remote_handle64; update remote_handle64_close_pfn_t to take remote_handle64 as
its parameter type to match
remote_handle64_open_pfn_t/remote_handle64_invoke_pfn_t and the wrapper
function, ensuring ABI/size compatibility and preventing mismatched-call issues.
- Around line 103-109: In rpcmem_alloc2, detect when falling back to
rpcmem_alloc_pfn and validate that `size` fits into an `int` (compare against
INT_MAX); if it exceeds INT_MAX, log an error via GGML_LOG_ERROR (or
GGML_LOG_INFO if consistent) and return nullptr instead of truncating, otherwise
safely cast `size` to int and call `rpcmem_alloc_pfn(heapid, flags, (int)size)`.
Ensure you reference the existing symbols `rpcmem_alloc2`, `rpcmem_alloc2_pfn`,
and `rpcmem_alloc_pfn` and add no behavioral change when sizes are within
bounds.
- Around line 99-193: Wrap each wrapper to null-check its resolved function
pointer before calling (e.g., check rpcmem_free_pfn, rpcmem_to_fd_pfn,
fastrpc_mmap_pfn, fastrpc_munmap_pfn, dspqueue_create_pfn, dspqueue_close_pfn,
dspqueue_export_pfn, dspqueue_write_pfn, dspqueue_read_pfn,
remote_handle64_open_pfn, remote_handle64_invoke_pfn, remote_handle64_close_pfn,
remote_handle_control_pfn, remote_handle64_control_pfn,
remote_session_control_pfn); if the pfn is null log an error with GGML_LOG_ERROR
(or existing logger) and return a safe value: nullptr for pointer returns, -1
for int returns, and a failure AEEResult (e.g., AEE_EFAILED) for AEEResult
functions; for void functions simply log and return early. Ensure each wrapper
mirrors the pattern used in rpcmem_alloc2() for consistent defensive behavior.
In `@ggml/src/ggml-sycl/ggml-sycl.cpp`:
- Around line 2266-2322: The supports_op check for GGML_OP_TRI must be updated
to match the runtime assertions in ggml_sycl_op_tri: add checks for
ggml_is_contiguous(dst) and ggml_are_same_shape(src0, dst) alongside the
existing ggml_is_contiguous(src0) and type checks; locate the GGML_OP_TRI branch
(the function/method performing supported-op validation) and include these two
predicates so supports_op returns false when the destination isn't contiguous or
shapes differ, which aligns it with the tri_f32_sycl/ggml_sycl_op_tri
assumptions.
In `@scripts/snapdragon/windows/run-cli.ps1`:
- Around line 27-29: The code sets GGML_SCHED_DEBUG and then appends "-v" to
$cli_opts using string interpolation which converts the array to a string;
change the append to use array concatenation ($cli_opts += '-v') so $cli_opts
remains an array. Update the block that checks $env:SCHED (the lines that set
$env:GGML_SCHED_DEBUG and modify $cli_opts) to use += for $cli_opts and keep
setting $env:GGML_SCHED_DEBUG as-is, mirroring the fix applied in run-tool.ps1.
In `@scripts/snapdragon/windows/run-tool.ps1`:
- Around line 33-35: The CLI options array $cli_opts is being converted to a
string by using "$cli_opts -v"; change the append to use PowerShell array
concatenation ($cli_opts += '-v') so $cli_opts remains an array; update the
block that sets $env:GGML_SCHED_DEBUG when $env:SCHED is present to append '-v'
to $cli_opts instead of assigning a combined string.
In `@scripts/snapdragon/windows/setup-build.ps1`:
- Around line 102-103: The comment is wrong: it says "Set HEXAGON_SDK_ROOT"
while the code is setting the HEXAGON_TOOLS_ROOT environment variable; update
the comment to accurately reflect the action (e.g., change the comment to "Set
HEXAGON_TOOLS_ROOT") so it matches the SetEnvironmentVariable call that sets
'HEXAGON_TOOLS_ROOT' in the script (reference the
[System.Environment]::SetEnvironmentVariable call).
- Around line 51-69: The try/catch around the download/extract
(Invoke-WebRequest / tar) swallows the original failure after calling
Write-Error and cleanup (Remove-Item), allowing the script to continue into
environment setup; modify the catch in setup-build.ps1 so it either re-throws
the caught exception (using "throw") after cleanup or sets a failure flag (e.g.,
$DownloadFailed) and then use that flag to skip the subsequent environment
variable setup for that SDK; reference the existing catch block, Write-Error,
Remove-Item, $TempFile and $TargetDir to implement the chosen approach so failed
downloads do not proceed to env var configuration.
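The `rpcmem_alloc2` item in the list above is an instance of a general rule: validate a `size_t` before narrowing it into an `int`-taking legacy API. A self-contained sketch of that guard — the allocator names here are placeholders, not the real rpcmem entry points:

```cpp
#include <climits>
#include <cstddef>
#include <cstdlib>

// Legacy fallback that only accepts an int-sized allocation request.
inline void * legacy_alloc(int size) {
    return size > 0 ? std::malloc((size_t) size) : nullptr;
}

// Guard the narrowing conversion: refuse sizes that do not fit in int
// instead of silently truncating them into a tiny (or negative) request.
inline void * checked_alloc(size_t size) {
    if (size > (size_t) INT_MAX) {
        return nullptr; // caller sees a clean failure, not a short buffer
    }
    return legacy_alloc((int) size);
}
```

The same shape applies wherever a 64-bit size flows into a 32-bit ABI: check against `INT_MAX`, log, and fail early.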
🧹 Nitpick comments (18)

scripts/snapdragon/windows/run-tool.ps1 (1)

15-18: Remove unused variable `$remainingArgs`.

The variable `$remainingArgs` is assigned on line 17 but never used. This is dead code.

♻️ Proposed fix

```diff
 if ($args.Count -gt 1) {
     $cli_opts=$args[1..($args.Count - 1)]
-    $remainingArgs = $args[1..($args.Count - 1)]
 }
```

scripts/snapdragon/windows/run-cli.ps1 (1)

49-53: Consider documenting the hardcoded parameters.

The script uses several hardcoded values (`--poll 1000`, `-t 6`, `--cpu-mask 0xfc`, `--ctx-size 8192`, etc.) that may need adjustment for different hardware configurations. Consider adding a comment explaining these defaults or making them configurable via environment variables.

ggml/src/ggml-cuda/ggml-cuda.cu (1)

3025-3030: Consider deduplicating extra sources for efficiency.

The same source tensor can be added to `srcs_extra` multiple times if it's used by multiple nodes, since `seen_node` only tracks nodes, not sources. While this doesn't cause correctness issues (consistent ordering between runs), it results in redundant property tracking.

♻️ Optional fix to deduplicate sources

```diff
     std::unordered_set<ggml_tensor *> seen_node;
     std::vector<ggml_tensor *> srcs_extra;
+    std::unordered_set<ggml_tensor *> seen_extra;

     for (int i = 0; i < cgraph->n_nodes; i++) {
         bool props_match = true;
         seen_node.insert(cgraph->nodes[i]);
         if (!res) {
             props_match = ggml_cuda_graph_node_properties_match(cgraph->nodes[i], &graph->props[i]);
         }
         if (!props_match) {
             res = true;
         }
         ggml_cuda_graph_node_set_properties(&graph->props[i], cgraph->nodes[i]);
         for (int src_idx = 0; src_idx < GGML_MAX_SRC; ++src_idx) {
             ggml_tensor * src = cgraph->nodes[i]->src[src_idx];
-            if (src && seen_node.find(src) == seen_node.end()) {
+            if (src && seen_node.find(src) == seen_node.end() && seen_extra.find(src) == seen_extra.end()) {
                 srcs_extra.push_back(src);
+                seen_extra.insert(src);
             }
         }
     }
```

ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl (1)

163-169: Unused helper functions.

`load_f32x4` and `load_kvx4` are defined but never called in this shader. If these were intended for a different change, consider removing them to avoid dead code.

🗑️ Remove unused functions

```diff
-fn load_f32x4(buf: ptr<storage, array<vec4<f32>>, read_write>, scalar_index: u32) -> vec4<f32> {
-    return (*buf)[scalar_index >> 2u];
-}
-
-fn load_kvx4(buf: ptr<storage, array<vec4<KV_TYPE>>, read_write>, scalar_index: u32) -> vec4<KV_TYPE> {
-    return (*buf)[scalar_index >> 2u];
-}
```

ggml/src/ggml-sycl/ggml-sycl.cpp (1)

4686-4691: Align supports_op checks with ggml_sycl_op_tri assertions.

This avoids SYCL offload when dst is non-contiguous or shape-mismatched, which would otherwise assert at runtime.

🔧 Suggested guard alignment

```diff
-        {
-            const ggml_tensor * src0 = op->src[0];
-            return src0 &&
-                   op->type == GGML_TYPE_F32 &&
-                   ggml_is_contiguous(src0);
-        }
+        {
+            const ggml_tensor * src0 = op->src[0];
+            return src0 &&
+                   op->type == GGML_TYPE_F32 &&
+                   ggml_is_contiguous(src0) &&
+                   ggml_is_contiguous(op) &&
+                   ggml_are_same_shape(src0, op);
+        }
```

ggml/src/ggml-hexagon/libggml-htp.inf (1)

36-38: Consider using double quotes in `[Strings]` section.

INF files conventionally use double quotes for string values in the `[Strings]` section. Single quotes may cause compatibility issues with some INF parsers.

```diff
 [Strings]
-GGML = 'GGML'
-DiskId = 'GGML HTP library'
+GGML = "GGML"
+DiskId = "GGML HTP library"
```

ggml/src/ggml-hexagon/htp-drv.h (2)

64-66: Use angle brackets for system headers.

Standard system headers should use angle brackets rather than quotes for proper include path resolution.

```diff
 #if !defined(__hexagon__) && !defined(_WIN32)
-#    include "unistd.h"
+#    include <unistd.h>
 #    include <sys/time.h>
 #endif
```

47-48: Potential integer overflow in SLEEP macro.

On Windows, `Sleep(1000 * x)` could overflow if `x` is large, since `1000 * x` is computed as `int` multiplication before being passed to `Sleep()`. Consider using `Sleep((DWORD)(x) * 1000)` or casting appropriately.

```diff
 #    ifdef _WIN32
-#        define SLEEP(x) Sleep(1000 * x) /* Sleep accepts input in milliseconds. */
+#        define SLEEP(x) Sleep((DWORD)(x) * 1000) /* Sleep accepts input in milliseconds. */
 #    else
```

ggml/src/ggml-virtgpu/virtgpu.cpp (1)

152-157: Unreachable code after `GGML_ABORT`.

`GGML_ABORT` typically terminates execution and does not return. The `return NULL;` on line 156 is unreachable. This pattern appears in other places in this file (lines 160-161, 165-166, etc.) and seems to be a defensive style, but it may confuse static analyzers or readers.

If `GGML_ABORT` is intended to possibly return in some configurations, consider documenting this. Otherwise, the pattern is harmless but technically dead code.

ggml/src/ggml-backend-reg.cpp (1)

14-26: Consider removing redundant platform headers.

With the dynamic loading logic now in `ggml-backend-dl.cpp`, the `#include <windows.h>` and `#include <dlfcn.h>` blocks (lines 14-26) may no longer be needed in this file. However, verify that `get_executable_path()` (lines 375-436) still requires these headers before removing them.

ggml/src/ggml-virtgpu/virtgpu-forward-buffer.cpp (1)

90-119: Consider RAII for mutex management.

The mutex lock/unlock pattern is correct, but if `REMOTE_CALL` or `memcpy` were to fail unexpectedly (e.g., via `GGML_ABORT` in a nested call), the mutex could remain locked. Consider using a simple RAII wrapper or a C++ `std::lock_guard` equivalent for the C11 mutex to ensure cleanup on all exit paths.

This is a minor concern since the current code paths appear to handle errors via `GGML_ABORT`, which terminates the process anyway.

ggml/src/ggml-backend-dl.cpp (1)

28-30: Windows `dl_error()` loses error information.

Returning an empty string discards useful diagnostic information. Consider using `GetLastError()` and `FormatMessage()` to provide meaningful error messages, similar to the POSIX `dlerror()` behavior.

💡 Suggested improvement

```diff
 const char * dl_error() {
-    return "";
+    static thread_local char buf[256];
+    DWORD err = GetLastError();
+    if (err == 0) {
+        return "";
+    }
+    FormatMessageA(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
+                   NULL, err, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
+                   buf, sizeof(buf), NULL);
+    return buf;
+}
```

ggml/src/ggml-virtgpu/ggml-backend-reg.cpp (1)

98-126: Redundant initialization guard.

The `static std::atomic<bool> initialized` at line 98 duplicates protection that's already provided by the `if (devices.size() > 0)` check at line 87. Consider removing one of these guards for clarity.

Additionally, the atomic should use explicit memory ordering as noted in past reviews.

ggml/src/ggml-hexagon/ggml-hexagon.cpp (1)

3147-3148: Redundant assignment of `opt_hostbuf`.

This line reassigns `opt_hostbuf` using the same expression already evaluated on line 3128. If the intent is to ensure a default of 1 when the environment variable is not set, line 3128 already handles this correctly.

🔧 Suggested fix

```diff
-    opt_hostbuf = str_hostbuf ? atoi(str_hostbuf) : 1;
-
     reg->context = new ggml_hexagon_registry(reg);
```

ggml/src/ggml-backend-dl.h (1)

9-9: Potentially unnecessary `<winevt.h>` include.

The `<winevt.h>` header provides Windows Event Log APIs, which appear unrelated to dynamic library loading functionality. This may have been included by mistake.

🔧 Suggested fix

```diff
 #    include <windows.h>
-#    include <winevt.h>
 #else
```

ggml/src/ggml-hexagon/libdl.h (2)

9-9: Potentially unnecessary `<winevt.h>` include.

Same as in `ggml-backend-dl.h`, this header appears unrelated to dynamic library loading.

🔧 Suggested fix

```diff
 #    include <windows.h>
-#    include <winevt.h>
 #else
```

1-77: Consider consolidating with `ggml-backend-dl.cpp`.

This file provides inline implementations that are nearly identical to those in `ggml/src/ggml-backend-dl.cpp` (as shown in the relevant code snippets). While having a header-only version for the Hexagon backend may be intentional for build isolation, the duplication could lead to maintenance drift.

Consider whether these could share a common implementation or if the duplication is acceptable for build system reasons.

ggml/src/ggml-hexagon/htp-drv.cpp (1)

308-360: Make `htpdrv_init` thread-safe.

`initialized` is unsynchronized; concurrent callers can race and partially initialize pfns. Consider a mutex/once guard.

🛠️ One option using a mutex

```diff
-#include <string>
+#include <string>
+#include <mutex>

 int htpdrv_init() {
     static dl_handle_ptr lib_cdsp_rpc_handle = nullptr;
     static bool          initialized         = false;
+    static std::mutex    init_mutex;
+    std::lock_guard<std::mutex> lock(init_mutex);

 #ifdef _WIN32
```
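The RAII suggestion for the buffer-forwarding path above amounts to replacing manual lock/unlock pairs with a scope-bound guard, so every exit path releases the mutex. A sketch using a plain `std::mutex` — the real code uses a C11 `mtx_t`, so take this as the shape of the fix rather than a drop-in patch:

```cpp
#include <mutex>

// Hypothetical shared state protected by a mutex.
struct shared_buf {
    std::mutex mtx;
    int        value = 0;
};

// std::lock_guard releases the mutex on every exit path, including
// early returns and exceptions, unlike a manual lock()/unlock() pair.
inline int bump(shared_buf & b, bool bail_early) {
    std::lock_guard<std::mutex> lock(b.mtx);
    if (bail_early) {
        return b.value; // mutex is still released here
    }
    return ++b.value;
}
```

A two-line RAII wrapper around `mtx_lock`/`mtx_unlock` gives the same guarantee for the C11 mutex without pulling in `<mutex>`.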
```markdown
### Windows On Snapdragon

Native Windows 11 arm64 builds has the following tools dependencies:
- MS Visual Studio 2026 (Community Edition or Pro)
```
Fix Visual Studio version typo.

"MS Visual Studio 2026" appears to be a typo. Current versions are 2022 or 2024.

📝 Proposed fix

```diff
-- MS Visual Studio 2026 (Community Edition or Pro)
+- MS Visual Studio 2022 (Community Edition or Pro)
```

📝 Committable suggestion
🤖 Prompt for AI Agents
In `@docs/backend/snapdragon/README.md` around lines 19 - 22, Update the README
text under the "Windows On Snapdragon" heading to correct the Visual Studio
version typo: replace "MS Visual Studio 2026 (Community Edition or Pro)" with
the correct current version(s), e.g., "MS Visual Studio 2022 (Community Edition
or Pro)" or "MS Visual Studio 2022/2024 (Community Edition or Pro)"; ensure only
the display string is changed in the small bullet list so the docs accurately
reflect supported Visual Studio releases.
```markdown
### Windows

All artifacts are already installed in the `pkg-snapdragon` folder.
To run, adapt below instructions to use Powershell scrits in `scripts/snapdragon/windows`.
```
Fix typo: "scrits" → "scripts".

📝 Proposed fix

```diff
-To run, adapt below instructions to use Powershell scrits in `scripts/snapdragon/windows`.
+To run, adapt below instructions to use PowerShell scripts in `scripts/snapdragon/windows`.
```

📝 Committable suggestion
🤖 Prompt for AI Agents
In `@docs/backend/snapdragon/README.md` around lines 116 - 119, In the "Windows"
section of README.md (the heading "### Windows"), fix the typo by changing the
word "scrits" to "scripts" in the sentence referencing the Powershell folder so
it reads "To run, adapt below instructions to use Powershell scripts in
`scripts/snapdragon/windows`."
```
-a----          1/22/2026   6:01 PM           4139 libggml-htp.cat
```

The .cat file, the signature and proper certicate installation can be verified with
Fix typo: "certicate" → "certificate".

📝 Proposed fix

```diff
-The .cat file, the signature and proper certicate installation can be verified with
+The .cat file, the signature and proper certificate installation can be verified with
```
🧰 Tools
🪛 LanguageTool
[grammar] ~147-~147: Ensure spelling is correct
Context: ...The .cat file, the signature and proper certicate installation can be verified with ``` ...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🤖 Prompt for AI Agents
In `@docs/backend/snapdragon/windows.md` at line 147, Replace the misspelled word
"certicate" with "certificate" in the sentence that reads "The .cat file, the
signature and proper certicate installation can be verified with" in
docs/backend/snapdragon/windows.md so it reads "...proper certificate
installation..."; ensure only the single typo is corrected and surrounding
punctuation/spacing is preserved.
```c
typedef int (*remote_handle64_open_pfn_t)(const char* name, remote_handle64 *ph);
typedef int (*remote_handle64_invoke_pfn_t)(remote_handle64 h, uint32_t dwScalars, remote_arg *pra);
typedef int (*remote_handle64_close_pfn_t)(remote_handle h);
typedef int (*remote_handle_control_pfn_t)(uint32_t req, void* data, uint32_t datalen);
```
🧩 Analysis chain

🏁 Script executed:

```shell
# First, locate the file and verify its content at the specified lines
fd "htp-drv.cpp" -t f
```

Repository: crc-org/llama.cpp
Length of output: 94

🏁 Script executed:

```shell
# Read the specific lines mentioned in the review (63-66)
sed -n '60,70p' ggml/src/ggml-hexagon/htp-drv.cpp
```

Repository: crc-org/llama.cpp
Length of output: 803

🏁 Script executed:

```shell
# Search for remote_handle typedef definitions in headers
rg -n "typedef.*remote_handle" --type h --type hpp -A 1 -B 1
```

Repository: crc-org/llama.cpp
Length of output: 88

🏁 Script executed:

```shell
# Also search in any included files for the definitions
rg -n "typedef.*remote_handle|struct.*remote_handle" -g '*.{h,hpp}' -A 1
```

Repository: crc-org/llama.cpp
Length of output: 43

🏁 Script executed:

```shell
# Search for remote_handle type definitions in the entire codebase
rg -n "typedef|struct|using" | rg "remote_handle64|remote_handle\b" | head -20
```

Repository: crc-org/llama.cpp
Length of output: 561

🏁 Script executed:

```shell
# Look for any header files that might define these types
fd -t f -e h -e hpp -e hxx | head -20
```

Repository: crc-org/llama.cpp
Length of output: 469

🏁 Script executed:

```shell
# Search in the Hexagon-specific directory for type definitions
rg -n "remote_handle" ggml/src/ggml-hexagon/ -A 2 -B 2
```

Repository: crc-org/llama.cpp
Length of output: 9504

🏁 Script executed:

```shell
# Check if remote_handle and remote_handle64 are documented or defined anywhere in comments/code
rg -n "remote_handle64\|remote_handle" --type c --type cpp -C 3 | head -50
```

Repository: crc-org/llama.cpp
Length of output: 43

🏁 Script executed:

```shell
# Search for where remote_handle and remote_handle64 are defined or used in the codebase
rg -n "remote_handle64.*h\)" ggml/src/ggml-hexagon/ | grep -E "(remote_handle64_close|remote_handle64_invoke|remote_handle64_open|remote_handle64_control)" | head -20
```

Repository: crc-org/llama.cpp
Length of output: 654

🏁 Script executed:

```shell
# Check the dlsym loading at line 354 to see what's being loaded
sed -n '349,355p' ggml/src/ggml-hexagon/htp-drv.cpp
```

Repository: crc-org/llama.cpp
Length of output: 736

🏁 Script executed:

```shell
# Look for other typedefs to compare the pattern
rg -n "typedef int.*remote_handle" ggml/src/ggml-hexagon/htp-drv.cpp
```

Repository: crc-org/llama.cpp
Length of output: 516
Fix typedef parameter type on line 65 to use remote_handle64.
The typedef remote_handle64_close_pfn_t on line 65 uses remote_handle as its parameter type, while the wrapper function at line 179 and all other similar typedefs (remote_handle64_open_pfn_t, remote_handle64_invoke_pfn_t, remote_handle64_control_pfn_t) consistently use remote_handle64. This creates an ABI mismatch where a remote_handle64 value is passed to a function pointer expecting a different type, potentially causing memory corruption if the types differ in size.
🛠️ Proposed fix

-typedef int (*remote_handle64_close_pfn_t)(remote_handle h);
+typedef int (*remote_handle64_close_pfn_t)(remote_handle64 h);

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

typedef int (*remote_handle64_open_pfn_t)(const char* name, remote_handle64 *ph);
typedef int (*remote_handle64_invoke_pfn_t)(remote_handle64 h, uint32_t dwScalars, remote_arg *pra);
typedef int (*remote_handle64_close_pfn_t)(remote_handle64 h);
typedef int (*remote_handle_control_pfn_t)(uint32_t req, void* data, uint32_t datalen);
🤖 Prompt for AI Agents
In `@ggml/src/ggml-hexagon/htp-drv.cpp` around lines 63 - 66, The typedef
remote_handle64_close_pfn_t currently declares its parameter as remote_handle
but the rest of the API and the corresponding wrapper (the close wrapper that
calls through remote_handle64_close_pfn_t) use remote_handle64; update
remote_handle64_close_pfn_t to take remote_handle64 as its parameter type to
match remote_handle64_open_pfn_t/remote_handle64_invoke_pfn_t and the wrapper
function, ensuring ABI/size compatibility and preventing mismatched-call issues.
void * rpcmem_alloc(int heapid, uint32_t flags, int size) {
    return rpcmem_alloc_pfn(heapid, flags, size);
}

void * rpcmem_alloc2(int heapid, uint32_t flags, size_t size) {
    if (rpcmem_alloc2_pfn) {
        return rpcmem_alloc2_pfn(heapid, flags, size);
    } else {
        GGML_LOG_INFO("ggml-hex: rpcmem_alloc2 not found, falling back to rpcmem_alloc\n");
        return rpcmem_alloc_pfn(heapid, flags, size);
    }
}

void rpcmem_free(void * po) {
    return rpcmem_free_pfn(po);
}

int rpcmem_to_fd(void * po) {
    return rpcmem_to_fd_pfn(po);
}

HTPDRV_API int fastrpc_mmap(int domain, int fd, void * addr, int offset, size_t length, enum fastrpc_map_flags flags) {
    return fastrpc_mmap_pfn(domain, fd, addr, offset, length, flags);
}

HTPDRV_API int fastrpc_munmap(int domain, int fd, void * addr, size_t length) {
    return fastrpc_munmap_pfn(domain, fd, addr, length);
}

AEEResult dspqueue_create(int domain,
                          uint32_t flags,
                          uint32_t req_queue_size,
                          uint32_t resp_queue_size,
                          dspqueue_callback_t packet_callback,
                          dspqueue_callback_t error_callback,
                          void * callback_context,
                          dspqueue_t * queue) {
    return dspqueue_create_pfn(domain, flags, req_queue_size, resp_queue_size, packet_callback, error_callback,
                               callback_context, queue);
}

AEEResult dspqueue_close(dspqueue_t queue) {
    return dspqueue_close_pfn(queue);
}

AEEResult dspqueue_export(dspqueue_t queue, uint64_t * queue_id) {
    return dspqueue_export_pfn(queue, queue_id);
}

AEEResult dspqueue_write(dspqueue_t queue,
                         uint32_t flags,
                         uint32_t num_buffers,
                         struct dspqueue_buffer * buffers,
                         uint32_t message_length,
                         const uint8_t * message,
                         uint32_t timeout_us) {
    return dspqueue_write_pfn(queue, flags, num_buffers, buffers, message_length, message, timeout_us);
}

AEEResult dspqueue_read(dspqueue_t queue,
                        uint32_t * flags,
                        uint32_t max_buffers,
                        uint32_t * num_buffers,
                        struct dspqueue_buffer * buffers,
                        uint32_t max_message_length,
                        uint32_t * message_length,
                        uint8_t * message,
                        uint32_t timeout_us) {
    return dspqueue_read_pfn(queue, flags, max_buffers, num_buffers, buffers, max_message_length, message_length,
                             message, timeout_us);
}

HTPDRV_API int remote_handle64_open(const char * name, remote_handle64 * ph) {
    return remote_handle64_open_pfn(name, ph);
}

HTPDRV_API int remote_handle64_invoke(remote_handle64 h, uint32_t dwScalars, remote_arg * pra) {
    return remote_handle64_invoke_pfn(h, dwScalars, pra);
}

HTPDRV_API int remote_handle64_close(remote_handle64 h) {
    return remote_handle64_close_pfn(h);
}

HTPDRV_API int remote_handle_control(uint32_t req, void * data, uint32_t datalen) {
    return remote_handle_control_pfn(req, data, datalen);
}

HTPDRV_API int remote_handle64_control(remote_handle64 h, uint32_t req, void * data, uint32_t datalen) {
    return remote_handle64_control_pfn(h, req, data, datalen);
}

HTPDRV_API int remote_session_control(uint32_t req, void * data, uint32_t datalen) {
    return remote_session_control_pfn(req, data, datalen);
}
🧩 Analysis chain
🏁 Script executed:

# First, let's look at the actual file content around lines 99-193
cat -n ggml/src/ggml-hexagon/htp-drv.cpp | sed -n '90,200p'

Length of output: 5125

🏁 Script executed:

# Search for htpdrv_init function definition
rg -n "htpdrv_init\|rpcmem_alloc_pfn" ggml/src/ggml-hexagon/htp-drv.cpp -A 5 | head -100

Length of output: 43

🏁 Script executed:

# Check if pfn variables are initialized at module level
rg -n "rpcmem_alloc_pfn\s*=|rpcmem_free_pfn\s*=|fastrpc_mmap_pfn\s*=" ggml/src/ggml-hexagon/htp-drv.cpp -B 2

Length of output: 336

🏁 Script executed:

# Look for any guards or checks before calling pfn functions
rg -n "if.*_pfn\)|if.*_pfn\s*{" ggml/src/ggml-hexagon/htp-drv.cpp | head -20

Length of output: 170

🏁 Script executed:

# Search for htpdrv_init implementation
rg -n "htpdrv_init" ggml/src/ggml-hexagon/htp-drv.cpp -A 20 | head -100

Length of output: 837

🏁 Script executed:

# Check for any initialization patterns or startup code
rg -n "rpcmem_alloc_pfn\s*=" ggml/src/ggml-hexagon/htp-drv.cpp -B 3 -A 3

Length of output: 300

🏁 Script executed:

# Look for how this module is used - check header files
fd "\.h$" ggml/src/ggml-hexagon/ | head -5

Length of output: 247

🏁 Script executed:

# Get more of htpdrv_init to see where pfns are set
rg -n "htpdrv_init" ggml/src/ggml-hexagon/htp-drv.cpp -A 100 | grep -E "(rpcmem_alloc_pfn\s*=|rpcmem_free_pfn\s*=|initialized\s*=|return)" | head -30

Length of output: 566

🏁 Script executed:

# Check if there are any guards at call sites of these wrapper functions
rg -n "rpcmem_alloc\(|rpcmem_free\(|fastrpc_mmap\(" ggml/src/ggml-hexagon/ -B 2 | head -50

Length of output: 1435

🏁 Script executed:

# Check header file to understand API contracts
fd "htp-drv\.h" ggml/src/ggml-hexagon/ && cat ggml/src/ggml-hexagon/htp-drv.h | head -100

Length of output: 2545

🏁 Script executed:

# Check how htpdrv_init is called to understand the initialization order
rg -n "htpdrv_init\(\)" ggml/src/ggml-hexagon/ -B 3 -A 3

Length of output: 1076

🏁 Script executed:

# Look for the complete htpdrv_init implementation to see all pfn assignments
sed -n '308,400p' ggml/src/ggml-hexagon/htp-drv.cpp

Length of output: 4259

🏁 Script executed:

# Check if there's any documentation or contracts in comments about initialization order
rg -n "htpdrv_init\|must be called\|before calling" ggml/src/ggml-hexagon/htp-drv.h -B 2 -A 5

Length of output: 43

🏁 Script executed:

# Let me verify the complete dlsym setup and check if there's any defensive initialization pattern used elsewhere
sed -n '330,360p' ggml/src/ggml-hexagon/htp-drv.cpp

Length of output: 2287

🏁 Script executed:

# Check if there are any comments in the header about initialization requirements
sed -n '95,135p' ggml/src/ggml-hexagon/htp-drv.h

Length of output: 748

🏁 Script executed:

# Verify the coding style of the wrapper functions (indentation, pointer alignment, etc.)
sed -n '99,115p' ggml/src/ggml-hexagon/htp-drv.cpp | cat -A

Length of output: 571

🏁 Script executed:

# Check if there are any other callers of wrapper functions that might bypass htpdrv_init
rg -n "rpcmem_alloc\(|rpcmem_free\(|fastrpc_mmap\(" ggml/src/ggml-hexagon/ggml-hexagon.cpp -B 5 | head -40

Length of output: 928

🏁 Script executed:

# Verify if there are any static initializers or global initialization that might use these before htpdrv_init
rg -n "rpcmem_alloc\(|rpcmem_free\(|fastrpc_mmap\(" ggml/src/ggml-hexagon/ --type cpp | grep -v "ggml-hexagon.cpp" | head -10

Length of output: 383
Add null-pointer guards to wrapper functions.
These wrapper functions at lines 99-193 call function pointers unconditionally without checking if they've been resolved. While htpdrv_init() is the intended initialization entry point, the wrappers will crash if called before initialization succeeds or symbol resolution fails. The codebase already demonstrates this pattern—rpcmem_alloc2() (line 104) and get_hex_arch_ver() (line 380) include defensive null-checks.
Add guards similar to rpcmem_alloc2(): check the pfn pointer before calling, log an error, and return a safe value (nullptr or error code depending on return type).
Example guard

 void * rpcmem_alloc(int heapid, uint32_t flags, int size) {
-    return rpcmem_alloc_pfn(heapid, flags, size);
+    if (!rpcmem_alloc_pfn) {
+        GGML_LOG_ERROR("ggml-hex: rpcmem_alloc called before htpdrv_init\n");
+        return nullptr;
+    }
+    return rpcmem_alloc_pfn(heapid, flags, size);
 }

Apply similarly to rpcmem_free(), rpcmem_to_fd(), fastrpc_mmap(), fastrpc_munmap(), dspqueue_*(), and remote_*() functions.
🤖 Prompt for AI Agents
In `@ggml/src/ggml-hexagon/htp-drv.cpp` around lines 99 - 193, Wrap each wrapper
to null-check its resolved function pointer before calling (e.g., check
rpcmem_free_pfn, rpcmem_to_fd_pfn, fastrpc_mmap_pfn, fastrpc_munmap_pfn,
dspqueue_create_pfn, dspqueue_close_pfn, dspqueue_export_pfn,
dspqueue_write_pfn, dspqueue_read_pfn, remote_handle64_open_pfn,
remote_handle64_invoke_pfn, remote_handle64_close_pfn,
remote_handle_control_pfn, remote_handle64_control_pfn,
remote_session_control_pfn); if the pfn is null log an error with GGML_LOG_ERROR
(or existing logger) and return a safe value: nullptr for pointer returns, -1
for int returns, and a failure AEEResult (e.g., AEE_EFAILED) for AEEResult
functions; for void functions simply log and return early. Ensure each wrapper
mirrors the pattern used in rpcmem_alloc2() for consistent defensive behavior.
static void tri_f32_sycl(
    const float * src,
    float * dst,
    const int64_t ne0,
    const int64_t ne1,
    const int64_t ne2,
    const int64_t ne3,
    const ggml_tri_type ttype,
    dpct::queue_ptr main_stream
) {
    const size_t total = (size_t) ne0 * (size_t) ne1 * (size_t) ne2 * (size_t) ne3;

    main_stream->parallel_for(sycl::range<1>(total), [=](sycl::id<1> tid) {
        const int64_t idx = (int64_t) tid[0];

        const int64_t i0 = idx % ne0;
        const int64_t t1 = idx / ne0;
        const int64_t i1 = t1 % ne1;

        bool keep = false;
        switch (ttype) {
            case GGML_TRI_TYPE_LOWER:      keep = (i0 < i1);  break;
            case GGML_TRI_TYPE_LOWER_DIAG: keep = (i0 <= i1); break;
            case GGML_TRI_TYPE_UPPER:      keep = (i0 > i1);  break;
            case GGML_TRI_TYPE_UPPER_DIAG: keep = (i0 >= i1); break;
            default:                       keep = false;      break;
        }

        dst[idx] = keep ? src[idx] : 0.0f;
    });
}

static void ggml_sycl_op_tri(ggml_backend_sycl_context & ctx, ggml_tensor * dst) {
    const ggml_tensor * src0 = dst->src[0];
    GGML_ASSERT(src0);

    GGML_ASSERT(src0->type == GGML_TYPE_F32);
    GGML_ASSERT(dst->type == GGML_TYPE_F32);
    GGML_ASSERT(ggml_is_contiguous(src0));
    GGML_ASSERT(ggml_is_contiguous(dst));
    GGML_ASSERT(ggml_are_same_shape(src0, dst));

    dpct::queue_ptr main_stream = ctx.stream();
    SYCL_CHECK(ggml_sycl_set_device(ctx.device));

    const float * src0_dd = static_cast<const float *>(src0->data);
    float * dst_dd = static_cast<float *>(dst->data);

    const ggml_tri_type ttype = (ggml_tri_type) ggml_get_op_params_i32(dst, 0);

    const int64_t ne0 = src0->ne[0];
    const int64_t ne1 = src0->ne[1];
    const int64_t ne2 = src0->ne[2];
    const int64_t ne3 = src0->ne[3];

    tri_f32_sycl(src0_dd, dst_dd, ne0, ne1, ne2, ne3, ttype, main_stream);
}
🧩 Analysis chain
🏁 Script executed:

# Search for TRI implementations in CPU, CUDA and other backends
rg -A 15 "case GGML_OP_TRI:" ggml/src --type cpp --type c -B 2

Length of output: 13585

🏁 Script executed:

# Look for tri_f32 implementations in different files
fd . ggml/src -type f \( -name "*.cpp" -o -name "*.c" \) -exec grep -l "tri_f32" {} \;

Length of output: 231

🏁 Script executed:

# Find CPU TRI implementation
rg -B 5 -A 20 "void.*tri.*f32.*const float" ggml/src/ggml.c --type c

Length of output: 43

🏁 Script executed:

# Find CPU TRI implementation
rg -B 2 -A 25 "ggml_compute_forward_tri" ggml/src/ggml-cpu/ggml-cpu.c

Length of output: 1029

🏁 Script executed:

# Also search in main ggml.c file for tri definitions
rg -B 5 -A 25 "compute_forward_tri" ggml/src/ggml.c

Length of output: 43

🏁 Script executed:

# Search for the actual tri function definition
rg -B 2 -A 30 "static inline void ggml_compute_forward_tri|void ggml_compute_forward_tri" ggml/src

Length of output: 7720

🏁 Script executed:

# Search for tri mask logic patterns
rg -B 5 -A 15 "GGML_TRI_TYPE_LOWER|GGML_TRI_TYPE_UPPER" ggml/src --type c

Length of output: 43

🏁 Script executed:

# Verify the support check for TRI in SYCL
rg -B 5 -A 8 "case GGML_OP_TRI:" ggml/src/ggml-sycl/ggml-sycl.cpp | grep -A 8 "supports_op"

Length of output: 43

🏁 Script executed:

# Check runtime assertions in tri_f32_sycl against support check
sed -n '2266,2322p' ggml/src/ggml-sycl/ggml-sycl.cpp

Length of output: 1956

🏁 Script executed:

# View support check implementation
sed -n '4686,4691p' ggml/src/ggml-sycl/ggml-sycl.cpp

Length of output: 275
TRI mask orientation is correct and matches CPU backend.
The keep conditions properly implement the triangular mask logic with i0 as the column index and i1 as the row index. The LOWER/UPPER distinctions match the CPU implementation exactly.
However, align the supports_op check for GGML_OP_TRI with the runtime assertions in ggml_sycl_op_tri. Add checks for:
- ggml_is_contiguous(dst)
- ggml_are_same_shape(src0, dst)
The current support check at lines 4686-4691 only validates source contiguity and type, but the operation asserts that destination is also contiguous and shapes match.
🤖 Prompt for AI Agents
In `@ggml/src/ggml-sycl/ggml-sycl.cpp` around lines 2266 - 2322, The supports_op
check for GGML_OP_TRI must be updated to match the runtime assertions in
ggml_sycl_op_tri: add checks for ggml_is_contiguous(dst) and
ggml_are_same_shape(src0, dst) alongside the existing ggml_is_contiguous(src0)
and type checks; locate the GGML_OP_TRI branch (the function/method performing
supported-op validation) and include these two predicates so supports_op returns
false when the destination isn't contiguous or shapes differ, which aligns it
with the tri_f32_sycl/ggml_sycl_op_tri assumptions.
if ($null -ne $env:SCHED) {
    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts="$cli_opts -v"
}
Fix array concatenation for CLI options (same issue as run-tool.ps1).
Using $cli_opts="$cli_opts -v" converts the array to a string. Use += to append to the array properly.
🐛 Proposed fix

 if ($null -ne $env:SCHED) {
-    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts="$cli_opts -v"
+    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts += "-v"
 }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

if ($null -ne $env:SCHED) {
    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts += "-v"
}
🤖 Prompt for AI Agents
In `@scripts/snapdragon/windows/run-cli.ps1` around lines 27 - 29, The code sets
GGML_SCHED_DEBUG and then appends "-v" to $cli_opts using string interpolation
which converts the array to a string; change the append to use array
concatenation ($cli_opts += '-v') so $cli_opts remains an array. Update the
block that checks $env:SCHED (the lines that set $env:GGML_SCHED_DEBUG and
modify $cli_opts) to use += for $cli_opts and keep setting $env:GGML_SCHED_DEBUG
as-is, mirroring the fix applied in run-tool.ps1.
if ($null -ne $env:SCHED) {
    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts="$cli_opts -v"
}
Fix array concatenation for CLI options.
Using $cli_opts="$cli_opts -v" converts the array to a string. Use += to append to the array properly.
🐛 Proposed fix

 if ($null -ne $env:SCHED) {
-    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts="$cli_opts -v"
+    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts += "-v"
 }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

if ($null -ne $env:SCHED) {
    $env:GGML_SCHED_DEBUG=$env:SCHED; $cli_opts += "-v"
}
🤖 Prompt for AI Agents
In `@scripts/snapdragon/windows/run-tool.ps1` around lines 33 - 35, The CLI
options array $cli_opts is being converted to a string by using "$cli_opts -v";
change the append to use PowerShell array concatenation ($cli_opts += '-v') so
$cli_opts remains an array; update the block that sets $env:GGML_SCHED_DEBUG
when $env:SCHED is present to append '-v' to $cli_opts instead of assigning a
combined string.
try {
    # Download
    Write-Host "Downloading from: $Url"
    Invoke-WebRequest -Uri $Url -OutFile $TempFile

    # Untar
    # Note: We assume Windows includes tar.exe (Win 10 build 17063+)
    Write-Host "Extracting archive to $TargetDir..."

    # We use -C to extract contents INTO the target directory created above
    tar -xJvf $TempFile -C $TargetDir\..

    Write-Host "Extraction complete." -ForegroundColor Green
}
catch {
    Write-Error "Failed to download or extract $Name. Error: $_"
    # Cleanup target dir if failed so script tries again next time
    Remove-Item -Path $TargetDir -Recurse -Force -ErrorAction SilentlyContinue
}
Error handling allows script to continue after failure.
When Invoke-WebRequest or tar fails, Write-Error is called but the script continues to the environment variable setup section. With $ErrorActionPreference = "Stop", Write-Error actually throws a terminating error, but this is caught by the catch block and doesn't re-throw, so execution proceeds.
Consider re-throwing the exception or tracking failure state to skip environment variable setup for failed SDKs.
🛡️ Proposed fix to re-throw after cleanup

 catch {
     Write-Error "Failed to download or extract $Name. Error: $_"
     # Cleanup target dir if failed so script tries again next time
     Remove-Item -Path $TargetDir -Recurse -Force -ErrorAction SilentlyContinue
+    throw
 }

🤖 Prompt for AI Agents
In `@scripts/snapdragon/windows/setup-build.ps1` around lines 51 - 69, The
try/catch around the download/extract (Invoke-WebRequest / tar) swallows the
original failure after calling Write-Error and cleanup (Remove-Item), allowing
the script to continue into environment setup; modify the catch in
setup-build.ps1 so it either re-throws the caught exception (using "throw")
after cleanup or sets a failure flag (e.g., $DownloadFailed) and then use that
flag to skip the subsequent environment variable setup for that SDK; reference
the existing catch block, Write-Error, Remove-Item, $TempFile and $TargetDir to
implement the chosen approach so failed downloads do not proceed to env var
configuration.
# Set HEXAGON_SDK_ROOT
[System.Environment]::SetEnvironmentVariable('HEXAGON_TOOLS_ROOT', $HexagonToolsTarget, [System.EnvironmentVariableTarget]::User)
Fix misleading comment.
The comment says "Set HEXAGON_SDK_ROOT" but the code sets HEXAGON_TOOLS_ROOT.
📝 Proposed fix

-# Set HEXAGON_SDK_ROOT
+# Set HEXAGON_TOOLS_ROOT
 [System.Environment]::SetEnvironmentVariable('HEXAGON_TOOLS_ROOT', $HexagonToolsTarget, [System.EnvironmentVariableTarget]::User)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

# Set HEXAGON_TOOLS_ROOT
[System.Environment]::SetEnvironmentVariable('HEXAGON_TOOLS_ROOT', $HexagonToolsTarget, [System.EnvironmentVariableTarget]::User)
🤖 Prompt for AI Agents
In `@scripts/snapdragon/windows/setup-build.ps1` around lines 102 - 103, The
comment is wrong: it says "Set HEXAGON_SDK_ROOT" while the code is setting the
HEXAGON_TOOLS_ROOT environment variable; update the comment to accurately
reflect the action (e.g., change the comment to "Set HEXAGON_TOOLS_ROOT") so it
matches the SetEnvironmentVariable call that sets 'HEXAGON_TOOLS_ROOT' in the
script (reference the [System.Environment]::SetEnvironmentVariable call).
🟢 Test of 'mac_ai test test_ci' succeeded after 00 hours 25 minutes 52 seconds. 🟢 • Link to the test results. • Link to the reports index. Test configuration: |
Assisted-by-AI: Claude Code, CodeRabbit.ai
Summary by CodeRabbit
Refactor
New Features
Documentation
Tests