UPSTREAM PR #1255: feat: add -H --height and -W --width options #1248 #53
Overview

Analysis of stable-diffusion.cpp compared 48,089 functions across two binaries following a single commit adding CLI options for image dimensions. Modified functions: 60 (0.12%), new: 2, removed: 1, unchanged: 48,026 (99.87%).

Power Consumption:
Function Analysis

SDSvrParams::get_options (directly modified): Throughput +82ns (+9.29%), response +8,959ns (+11.58%). Added two CLI options for default image height/width. The 9μs overhead occurs once at startup, not affecting inference performance. Change is justified by the feature addition.

apply_binary_op<op_div, ggml_bf16_t> (GGML tensor operation): Throughput +79ns (+6.64%), response +93ns (+3.59%). Division operations on bfloat16 tensors used in normalization and attention scaling. Potentially called thousands of times per inference, cumulative impact ~593μs per image. Source in GGML submodule (not accessible); regression warrants investigation.

apply_unary_op<op_hardsigmoid, ggml_bf16_t>: Throughput -71ns (-9.11%), response -71ns (-3.47%). Improvement in hard sigmoid activation partially offsets division regression.

Standard library functions (std::less, std::vector, std::unordered_map operations): Mixed results with throughput changes ranging from -74ns to +45ns. Most are compiler/toolchain artifacts affecting initialization code, not inference paths. Vector copy constructor improved (-33.91%), comparison operator regressed (+68.69%), but net impact is minimal as these operate during model loading.

Other analyzed functions showed negligible changes in non-critical paths.

Additional Findings

The commit modified only CLI parsing code, yet most performance variations stem from compiler/standard library differences between builds. ML inference impact is sub-millisecond (<1ms per image, <0.1% of total generation time). The division operation regression in GGML's bfloat16 handling is the only noteworthy concern for ML workloads, though absolute impact remains small. Overall system maintains excellent performance characteristics with appropriate trade-offs for added functionality.

🔎 Full breakdown: Loci Inspector.
3ad80c4 to 74d69ae
Note
Source pull request: leejet/stable-diffusion.cpp#1255
After falling on my face with the first PR, it seemed necessary to get up and try again with a different issue.
This sets the default -H (--height) and -W (--width) options from the sd-server command line. There is nothing special to this. Manually added .default_width and .default_height to struct SDSvrParams and initialized both endpoints with those values instead of the hard-coded 512.
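
For readers who want to picture the change, here is a minimal sketch of the idea, not the actual patch. Only the names SDSvrParams, default_width, default_height, the -W/--width and -H/--height flags, and the former 512 default come from the description above. The helpers parse_size_options, RequestSize and resolve_size are hypothetical, as is the rule that a request which omits a dimension falls back to the server default. The real get_options uses the project's own option tables.

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the server parameter struct; only the two fields named in
// the PR description are modeled. 512 stays as the fallback value.
struct SDSvrParams {
    int default_width  = 512;
    int default_height = 512;
};

// Hypothetical helper: scans args for -W/--width and -H/--height and stores
// the values in params. Returns false on a missing or invalid value.
inline bool parse_size_options(const std::vector<std::string>& args,
                               SDSvrParams& params) {
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        int* target = nullptr;
        if (arg == "-W" || arg == "--width") {
            target = &params.default_width;
        } else if (arg == "-H" || arg == "--height") {
            target = &params.default_height;
        } else {
            continue;  // not a size option, ignore it in this sketch
        }
        if (i + 1 >= args.size()) {
            return false;  // flag given without a value
        }
        try {
            std::size_t consumed = 0;
            const int value = std::stoi(args[i + 1], &consumed);
            if (consumed != args[i + 1].size() || value <= 0) {
                return false;  // trailing junk or non-positive size
            }
            *target = value;
        } catch (const std::exception&) {
            return false;  // not a number, or out of range for int
        }
        ++i;  // skip the value that was just consumed
    }
    return true;
}

// Hypothetical per-request size, as an endpoint might compute it.
struct RequestSize {
    int width;
    int height;
};

// Assumption: a request that omits a dimension (modeled here as <= 0) gets
// the server-wide default instead of a hard-coded 512.
inline RequestSize resolve_size(const SDSvrParams& params,
                                int requested_width,
                                int requested_height) {
    return RequestSize{
        requested_width > 0 ? requested_width : params.default_width,
        requested_height > 0 ? requested_height : params.default_height,
    };
}
```

Under these assumptions, starting the server with -W 768 -H 1024 would make a request that specifies only a height of 640 resolve to 768x640, while a server started without either flag would keep behaving as before with 512x512.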