
Conversation

byko3y commented Dec 25, 2025

Split command-line parameters and runtime adapter info into different structs.
Bump the max graph size according to the LoRA count and tensor size.
Fixes #18050

Some rationale on the changes. A LoRA adapter can be as complex as the original model, hooking into every stage, i.e. Wk, Wq, Wv in attention and Wup, Wgate, Wdown in the FFN.
With multiple LoRAs the graph can grow to more than double the size of the original graph, leading to the #18050 crash; this means you cannot safely add arbitrary LoRAs unless llama_context was specifically initialized via common_init_from_params for those adapters.
I could have just passed the tensor count into the llama_context constructor without much other noise, but that would make the mess even harder to untangle later. And there are more issues here; for example, with LoRAs enabled I get:

ggml_gallocr_needs_realloc: graph has different number of nodes
ggml_gallocr_alloc_graph: reallocating buffers automatically

That's why the "build graph with all LoRAs active?" question was left in llama_context::graph_reserve.
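
For a sense of scale, here is a minimal sketch of the kind of bump the PR title refers to: growing the graph-node budget with the number of LoRA tensors, so that a worst-case graph (all adapters active) still fits the preallocated buffers. The function name and the per-tensor node estimate are illustrative assumptions, not the actual llama.cpp code:

```cpp
#include <algorithm>
#include <cstddef>

// Illustrative only: budget graph nodes for the base model plus every loaded
// LoRA tensor, since each one can add extra ops (mul_mat, scale, add, ...).
static size_t graph_max_nodes(size_t n_model_tensors, size_t n_lora_tensors) {
    const size_t nodes_per_lora_tensor = 8; // assumed worst-case overhead
    const size_t base = std::max<size_t>(1024, 8 * n_model_tensors);
    return base + n_lora_tensors * nodes_per_lora_tensor;
}
```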

So in the end I moved LoRA adapter initialization into common_init_from_params and explicitly bound the adapters' lifetime to the common_init_result. The original code implicitly stored the adapters in common_init_result, but most handling was done via raw pointers stored in common_params instead, which was a time bomb waiting to explode.
In particular, server_context makes unrestricted deep copies/assignments of common_params, so it was all messed up anyway (and that is also why ngxson left TODO comments in the code).
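
A rough sketch of the ownership split described above, assuming the llama-cpp.h smart-pointer typedefs (llama_model_ptr, llama_context_ptr, llama_adapter_lora_ptr); the struct layout is illustrative, not the exact patch:

```cpp
#include <string>
#include <vector>
#include "llama-cpp.h" // assumed to provide the *_ptr RAII typedefs

// Plain command-line data: trivially copyable, no raw adapter pointers.
struct lora_param_info {
    std::string path;  // from --lora / --lora-scaled
    float       scale; // may be negative (inverted adapter)
};

// Runtime state: owns the loaded adapters, so their lifetime is bound to the
// init result rather than to arbitrary copies of common_params.
struct common_init_result_sketch {
    llama_model_ptr                     model;
    llama_context_ptr                   context;
    std::vector<llama_adapter_lora_ptr> lora;
};
```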

Another small thing: an adapter scale of -1.0f is a valid scale, i.e. an inverted adapter:

llama-server --lora-scaled lora1.gguf:-1.0

but server_context treated negative scales as "adapter disabled", which is incorrect.
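
A sketch of the corrected check, assuming the public llama_set_adapter_lora API and a hypothetical loaded_adapter entry holding the runtime pointer and scale; only an exact zero should mean "disabled":

```cpp
#include <vector>
#include "llama.h" // llama_context, llama_adapter_lora, llama_set_adapter_lora

struct loaded_adapter {           // hypothetical runtime entry
    llama_adapter_lora * ptr;
    float                scale;
};

// Apply adapters, treating only scale == 0.0f as "disabled"; a negative
// scale is forwarded unchanged (inverted adapter).
static void apply_adapters(llama_context * ctx,
                           const std::vector<loaded_adapter> & adapters) {
    for (const auto & la : adapters) {
        if (la.scale == 0.0f) {
            continue; // explicitly disabled by the user
        }
        llama_set_adapter_lora(ctx, la.ptr, la.scale);
    }
}
```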

ngxson (Collaborator) commented Dec 29, 2025

I don't think this fix is valid as-is. There is no need to add a new dedicated API for it.

There must be an easier way; I'll see.

> Another small thing: an adapter scale of -1.0f is a valid scale, i.e. an inverted adapter

By design, negative is valid. Looking at the code, I suspect what you said about negative values only applies to A-LoRA, not to normal LoRA.

ngxson (Collaborator) commented Dec 29, 2025

Superseded by #18469
