RPC Internal Error when loading Gemma 3 (llama-cpp backends) #9414

@garayco

Description

LocalAI version:
Tested on:

  • localai/localai:latest-gpu-nvidia-cuda-13
  • localai/localai:master-gpu-nvidia-cuda-13
  • Backends tested: llama-cpp (stable), cuda13-llama-cpp (stable), llama-cpp-development, and cuda13-llama-cpp-development

Environment, CPU architecture, OS, and Version:
Linux garayco 6.6.87.2-microsoft-standard-WSL2 #1 SMP PREEMPT_DYNAMIC Thu Jun 5 18:30:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Environment: Docker running inside WSL2 (Ubuntu) on Windows.
Hardware: NVIDIA RTX 3060 (12GB VRAM).

Describe the bug
LocalAI fails to load the Gemma 3 models (specifically gemma3:12b-it-q4_K_M) when using the llama-cpp backend. The model loading aborts with an RPC Internal Error stating that the key gemma3.attention.layer_norm_rms_epsilon is not found in the model hyperparameters.

To Reproduce

  1. Start the LocalAI container with GPU support:
    docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13
  2. Download the model via Ollama
  3. Run the model from the chat UI
  4. The backend process crashes or times out during the model load phase.
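The steps above, roughly as commands. Note that step 2's import mechanism and the `/models/apply` payload are assumptions about LocalAI's API, not taken from the report; the chat-completions call is the standard OpenAI-compatible endpoint LocalAI exposes:

```
# 1. Start LocalAI with GPU support (as in the report)
docker run -ti --name local-ai -p 8080:8080 --gpus all \
  localai/localai:latest-gpu-nvidia-cuda-13

# 2. Import the Ollama-packaged model (assumed syntax for LocalAI's gallery API)
curl http://localhost:8080/models/apply \
  -H "Content-Type: application/json" \
  -d '{"id": "ollama://gemma3:12b-it-q4_K_M"}'

# 3. Trigger a chat completion, which forces the model load and surfaces the error
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:12b-it-q4_K_M",
       "messages": [{"role": "user", "content": "hi"}]}'
```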

Logs

BackendLoader starting modelID="gemma3:12b-it-q4_K_M" backend="llama-cpp-development" model="gemma3__12b-it-q4_K_M"
ERROR Failed to load model modelID="gemma3:12b-it-q4_K_M" error=failed to load model with internal loader: could not load model: rpc error: code = Internal desc = Failed to load model: /models/gemma3__12b-it-q4_K_M. Error: llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon; llama_model_load_from_file_impl: failed to load model; llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model; llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon; llama_model_load_from_file_impl: failed to load model backend="llama-cpp-development"
ERROR Stream ended with error error=failed to load model with internal loader: could not load model: rpc error: code = Internal desc = Failed to load model: /models/gemma3__12b-it-q4_K_M. Error: llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon; llama_model_load_from_file_impl: failed to load model; llama_params_fit: encountered an error while trying to fit params to free device memory: failed to load model; llama_model_load: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon; llama_model_load_from_file_impl: failed to load model
INFO  HTTP request method="POST" path="/v1/chat/completions" status=200

Additional context
I initially tested this using the standard stable backends (llama-cpp and cuda13-llama-cpp). Thinking it was a versioning issue, I then completely removed the stable versions via the WebUI and exclusively tested the -development versions to force the latest build. The exact same RPC error persists across all of them. It seems the -development builds need to be re-triggered upstream to pull the latest llama.cpp master branch.
