UPSTREAM PR #1276: fix: correct sdapi LoRA file handling #55

Open
loci-dev wants to merge 1 commit into main from loci/pr-1276-sdapi_fix_lora_paths

Conversation

@loci-dev

Note

Source pull request: leejet/stable-diffusion.cpp#1276

  • open LoRA files through their discovered absolute paths
  • do not resolve symlinks when building the LoRA list
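
The difference between the two bullets can be sketched with the standard library. This is a minimal illustration, not the PR's code: the helper name `lora_absolute_path` is hypothetical. It relies on the fact that `std::filesystem::absolute` only anchors a relative path at the current directory, while `std::filesystem::canonical` would also resolve symlinks, which the second bullet says to avoid.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper (not from the PR): turn a discovered LoRA path into an
// absolute path without following symlinks. fs::absolute() leaves symlink
// components untouched; fs::canonical() would replace them with their targets.
inline fs::path lora_absolute_path(const fs::path& discovered) {
    assert(!discovered.empty());  // an empty path cannot name a LoRA file
    return fs::absolute(discovered);
}
```

Storing the result once at discovery time means later opens can use it directly instead of rebuilding the path from a directory and a relative name.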

@loci-dev loci-dev temporarily deployed to stable-diffusion-cpp-prod February 14, 2026 04:58 — with GitHub Actions Inactive
@loci-review

loci-review bot commented Feb 14, 2026

Overview

Analysis of 48,305 functions across two binaries revealed 63 modified functions (0.13%) from a single commit fixing LoRA file handling. Power consumption increased minimally: build.bin.sd-server (+0.601%, +3,097 nJ) and build.bin.sd-cli (+0.703%, +3,374 nJ).

Function counts: 63 modified, 2 new, 3 removed, 48,237 unchanged.

Function Analysis

The commit expanded LoraEntry from 2 to 3 std::string members (adding fullpath), causing expected regressions in object lifecycle operations:

Memory Operations:

  • std::vector<LoraEntry>::_S_max_size: +227ns throughput (+184%), +227ns response (+142%)
  • std::__new_allocator<LoraEntry>::allocate: +70ns throughput (+86%), +70ns response (+67%)
  • std::__new_allocator<LoraEntry>::deallocate: +16ns throughput (+86%), +17ns response (+62%)

Lifecycle Operations:

  • LoraEntry::operator=: +6ns throughput (+27%), +2,738ns response (+50%)
  • LoraEntry::LoraEntry (move): +9ns throughput (+40%), +581ns response (+50%)
  • LoraEntry::~LoraEntry: +6ns throughput (+28%), +280ns response (+49%)
  • std::__copy_move_backward<LoraEntry>: +12ns throughput (+16%), +2,749ns response (+49%)
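
Why one extra member shows up in allocation, copy, move, and destructor timings can be sketched as follows. Only `fullpath` is named in the report; the other two member names (`name`, `path`) and the two struct names are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Assumed shape before the commit: two std::string members.
struct LoraEntryBefore {
    std::string name;  // assumed member name
    std::string path;  // assumed member name
};

// Assumed shape after the commit: a third std::string, `fullpath`.
struct LoraEntryAfter {
    std::string name;      // assumed member name
    std::string path;      // assumed member name
    std::string fullpath;  // member named in the report
};

// Each element is larger, so allocate/deallocate handle more bytes per entry.
static_assert(sizeof(LoraEntryAfter) > sizeof(LoraEntryBefore),
              "adding a member should grow the element size");

// The implicit move constructor stays noexcept, so std::vector can still
// move (rather than copy) elements when it reallocates.
static_assert(std::is_nothrow_move_constructible<LoraEntryAfter>::value,
              "entries should remain cheaply movable");

// Extra bytes every vector slot carries after the change.
constexpr std::size_t extra_bytes_per_entry() {
    return sizeof(LoraEntryAfter) - sizeof(LoraEntryBefore);
}
```

The implicitly generated copy, move, and destructor each gain one more `std::string` operation, which is consistent with the roughly proportional increases listed above.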

Application Logic:

  • get_lora_full_path lambda: +71ns throughput (+81%), +1,551ns response (+45%)

Iterator Operations:

  • __normal_iterator<LoraEntry*>::operator+: +44ns throughput (+53%), +44ns response (+42%)
  • __normal_iterator<shared_ptr<LoraModel>*>::operator+: -69ns throughput (-48%), -69ns response (-42%) — improvement from algorithmic optimization

Standard Library Anomalies:

  • std::_Hashtable::begin: +187ns throughput (+310%), +187ns response (+181%) — likely compiler optimization differences
  • std::_Sp_counted_ptr_inplace<FluxFlowDenoiser>::_M_destroy: +189ns throughput (+180%), +189ns response (+61%) — unrelated to code changes

Other analyzed functions showed consistent patterns aligned with the structural change.

Additional Findings

Justification: The regressions are expected consequences of storing an additional string field. The refactoring caches absolute paths, which eliminates repeated filesystem syscalls (each on the order of microseconds); replaces double-pass lookups (std::any_of followed by a fetch) with a single-pass std::find_if; and fixes path resolution bugs. All affected functions execute during initialization and cache updates, not in the inference pipeline. The sub-1% power increase and the absolute timing deltas (at most a few microseconds in the figures above) are negligible compared to image generation workloads, which take seconds. There is no impact on GPU operations, tensor computations, or ML inference performance.
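
The lookup change mentioned in the justification can be illustrated with a small sketch. It is not the project's code: the function names, the `LoraEntry` member names other than `fullpath`, and the `std::optional` return type are all assumptions. It shows an existence check followed by a second scan, next to a single `std::find_if` that yields the entry directly.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Assumed entry layout; only `fullpath` is named in the report.
struct LoraEntry {
    std::string name;
    std::string path;
    std::string fullpath;
};

// Double-pass style: check existence first, then scan again to fetch.
inline std::optional<std::string> find_two_pass(const std::vector<LoraEntry>& cache,
                                                const std::string& name) {
    const bool known = std::any_of(cache.begin(), cache.end(),
                                   [&](const LoraEntry& e) { return e.name == name; });
    if (!known) {
        return std::nullopt;
    }
    for (const LoraEntry& e : cache) {  // second traversal of the same data
        if (e.name == name) {
            return e.fullpath;
        }
    }
    return std::nullopt;
}

// Single-pass style: one traversal returns the iterator to the match.
inline std::optional<std::string> find_single_pass(const std::vector<LoraEntry>& cache,
                                                   const std::string& name) {
    const auto it = std::find_if(cache.begin(), cache.end(),
                                 [&](const LoraEntry& e) { return e.name == name; });
    if (it == cache.end()) {
        return std::nullopt;
    }
    return it->fullpath;  // cached absolute path, no filesystem call needed
}

// Debug-time check that both strategies agree for a given name.
inline void check_lookups_agree(const std::vector<LoraEntry>& cache,
                                const std::string& name) {
    assert(find_two_pass(cache, name) == find_single_pass(cache, name));
}
```

Both functions return the same result; the single-pass version simply avoids walking the list twice when the entry exists.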

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.
