[Bug] Flux.2-Klein VAE Decode Artefacts #1275

@cholleme

Description

Git commit

git: 636d3cb

Operating System & Version

Ubuntu 24.04.3 LTS

GGML backends

CUDA

Command-line arguments used

[ "--diffusion-model", "/home/charles/coding/models/flux2/flux-2-klein-4b.safetensors", "--vae", "/home/charles/coding/models/flux2/flux2-vae.safetensors", "--llm", "/home/charles/coding/models/flux2/Qwen3-4B-Q6_K.gguf", "--cfg-scale", "1.0", "--steps", "4", "-v", "--diffusion-fa", "--width", "1024", "--height", "1024", "-p", "\"A cat holding a beachball on the river bank.\"" ]

Steps to reproduce

Get the latents exported by diffusers from latents_3.zip.

Add the following code to load the repro data:

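    LOG_INFO("decoding %zu latents", final_latents.size());

    // Repro hack: replace the first generated latent with the one exported by diffusers.
    FILE* f = fopen("/home/charles/coding/wishingwell/scripts/latents_3.raw", "rb");
    GGML_ASSERT(f != nullptr);
    fseek(f, 0, SEEK_END);
    long fsize = ftell(f);
    fseek(f, 0, SEEK_SET);

    // The raw file must match the latent tensor byte for byte (same type and shape).
    GGML_ASSERT(fsize > 0 && (size_t)fsize == ggml_nbytes(final_latents[0]));
    LOG_INFO("latent: %" PRId64 " %" PRId64 " %" PRId64 " %" PRId64,
             final_latents[0]->ne[0], final_latents[0]->ne[1], final_latents[0]->ne[2], final_latents[0]->ne[3]);

    // Read the file straight into a new tensor with the same layout (no temporary buffer to leak).
    auto hack_latent = ggml_dup_tensor(work_ctx, final_latents[0]);
    size_t n_read = fread(hack_latent->data, 1, (size_t)fsize, f);
    fclose(f);
    GGML_ASSERT(n_read == (size_t)fsize);
    final_latents[0] = hack_latent;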

After the line

    LOG_INFO("generating %" PRId64 " latent images completed, taking %.2fs", final_latents.size(), (t3 - t1) * 1.0f / 1000);

In stable-diffusion.cpp

(also attached the python code to generate this file)
simple_inference.py
latents_3.zip
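
Before patching stable-diffusion.cpp, it can help to sanity-check the raw dump on its own. The sketch below is not part of either project. It assumes the `.raw` file is a headerless dump of native-endian float32 values, which is what the repro code above implies. `LatentStats` and `summarize_raw_f32` are hypothetical names made up for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <limits>
#include <vector>

// Summary of a raw float32 dump (hypothetical helper type for this sketch).
struct LatentStats {
    size_t count = 0;       // number of finite values
    size_t non_finite = 0;  // number of NaN/Inf values
    float min = std::numeric_limits<float>::infinity();
    float max = -std::numeric_limits<float>::infinity();
    double mean = 0.0;      // mean of the finite values
};

// Assumption: the file is a headerless dump of native-endian float32 values.
// Returns false if the file cannot be read or its size is not a multiple of 4.
bool summarize_raw_f32(const char* path, LatentStats& out) {
    FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::fseek(f, 0, SEEK_END);
    long fsize = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    if (fsize <= 0 || static_cast<size_t>(fsize) % sizeof(float) != 0) {
        std::fclose(f);
        return false;
    }

    std::vector<float> data(static_cast<size_t>(fsize) / sizeof(float));
    size_t n = std::fread(data.data(), sizeof(float), data.size(), f);
    std::fclose(f);
    if (n != data.size()) return false;

    out = LatentStats{};
    double sum = 0.0;
    for (float v : data) {
        if (!std::isfinite(v)) {
            ++out.non_finite;
            continue;
        }
        out.min = std::min(out.min, v);
        out.max = std::max(out.max, v);
        sum += v;
        ++out.count;
    }
    if (out.count > 0) out.mean = sum / static_cast<double>(out.count);
    return true;
}
```

If the element count, range, or mean differs from what the Python side reports for the same tensor, the mismatch is in the export or load path rather than in the VAE decode.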

What you expected to happen

When working with Flux.2-Klein I noticed artifacts that I had not seen with the Python diffusers version. I suspected the VAE, so I made a quick repro in which diffusers exports the raw latents and stable-diffusion.cpp loads them for VAE decoding. The decoded result is worse; in particular, notice the gradients on the ball.

This seems to go beyond the expected difference between the two implementations and leads to unusable quality.

Image Image

What actually happened

Noticeable quality degradation (see images). The same style of artifacts appears when generating images directly from prompts. They seem mostly visible in smooth gradient areas.

This one is generated using stock stable-diffusion.cpp

Image
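
To put a number on the difference between the diffusers decode and the stable-diffusion.cpp decode, the two outputs could be compared pixel by pixel. The sketch below is only an illustration and is not code from either project. It assumes both images have already been loaded as 8-bit buffers of identical size (for example 1024x1024 RGB); `ImageDiff` and `compare_rgb8` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

// Per-image difference metrics (hypothetical helper type for this sketch).
struct ImageDiff {
    double mse = 0.0;       // mean squared error over all channel values
    double psnr_db = 0.0;   // peak signal-to-noise ratio in dB (infinity if identical)
    int max_abs_diff = 0;   // largest single-channel difference, 0..255
};

// Compares two 8-bit image buffers of equal size.
// Returns false if the buffers are empty or their sizes differ.
bool compare_rgb8(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b, ImageDiff& out) {
    if (a.empty() || a.size() != b.size()) return false;

    double sq_sum = 0.0;
    int max_abs = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        int d = static_cast<int>(a[i]) - static_cast<int>(b[i]);
        sq_sum += static_cast<double>(d) * d;
        max_abs = std::max(max_abs, std::abs(d));
    }

    out.mse = sq_sum / static_cast<double>(a.size());
    out.max_abs_diff = max_abs;
    out.psnr_db = (out.mse == 0.0)
                      ? std::numeric_limits<double>::infinity()
                      : 10.0 * std::log10(255.0 * 255.0 / out.mse);
    return true;
}
```

A global PSNR can hide banding that is confined to smooth regions, so running the same comparison on a crop around the ball may be more telling than the whole-image value.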

Logs / error messages / stack trace

None

Additional context / environment details

NVIDIA GB10, Driver Version: 580.95.05, CUDA Version: 13.0
