support for customizing LoRA multipliers through the sdapi by wbruna · Pull Request #1982 · LostRuins/koboldcpp

wbruna · 2026-02-19T10:54:51Z

~~This is still just an idea!~~

Since we just got support for multiple LoRAs, we could include LoRA customization on the API side, by:

- internally allowing the weights to be changed at generation time
- showing the preloaded LoRAs under `/sdapi/v1/loras`
- accepting only weight changes to the preloaded LoRAs through the `lora` field at `/sdapi/v1/txt2img` and `/sdapi/v1/img2img`

I recently implemented support on my Python client script for the mainline sd-server implementation, so I have a reasonable idea about how complicated that would be. I'm also aware that the sd.cpp C API would have to be adapted to allow changing LoRA weights without reloading the models.

Do you think this would be worth implementing?

LostRuins · 2026-02-19T11:36:07Z

Does it have any implications on memory use or runtime file loading?

wbruna · 2026-02-19T13:50:40Z

For the `at_runtime` LoRA mode, I believe it wouldn't change at all.

For the `immediately` LoRA mode, it could mean higher memory usage: currently, the code could be unloading the weights right after applying them, since they wouldn't be needed anymore (I need to check the code to be sure). To change the weights, we need them back in memory, either by reloading them from disk or by keeping them around in RAM. Generation latency would also increase a bit, because we'd need to reapply the LoRAs (but only when the weight is changed).

henk717 · 2026-02-26T19:19:59Z

Personally I have seen this request a few times. There is demand for it. If it's a bit slower during a switch, that is better than not having it at all. Just make sure nothing changes if it's not used.

wbruna · 2026-02-27T02:14:58Z

Got a first somewhat-working version.

I've included code for the <lora:name:weight> syntax on the prompt, to make testing easier. The API code is implemented, but I didn't test it yet.

As suspected, the `immediately` LoRA mode discards the weights as soon as they are applied (`lora->free_params_buffer()` in `apply_loras_immediately`). So we need to either remove that call (as I've done for now), or restrict changing LoRA weights to the `at_runtime` mode. One way to do that without an extra command-line flag could be:

- allow `--sdloramult` to receive a list of multipliers
- LoRAs with multiplier != 0 would have fixed weights, as they are now
- LoRAs with multiplier 0 would be allowed runtime multiplier changes (through the sdapi and/or prompt). We could also add a parameter to gendefaults to still be able to set a default non-zero multiplier for them
- the presence of any customizable LoRA would force `at_runtime` mode. This way, we could keep the `free_params_buffer` call as-is, so setups with no customizable LoRAs would keep working as they are now, with no extra memory usage. It may even be possible to force `at_runtime` only for the customizable LoRAs.

What do you think?
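
As a rough illustration of the rule proposed above, the following sketch picks the apply mode from the launch-time multipliers. The function names and the string mode values are hypothetical, chosen for this example; they are not taken from koboldcpp.py or sd.cpp.

```python
from typing import List


def customizable_indices(multipliers: List[float]) -> List[int]:
    """Indices of LoRAs launched with multiplier 0, i.e. the adjustable ones."""
    return [i for i, m in enumerate(multipliers) if m == 0.0]


def choose_lora_apply_mode(multipliers: List[float], default_mode: str = "auto") -> str:
    """Force at_runtime only when at least one LoRA is customizable.

    With no zero multipliers the default mode is returned unchanged, so
    existing setups would behave exactly as before (assumption of this sketch).
    """
    if customizable_indices(multipliers):
        return "at_runtime"
    return default_mode


if __name__ == "__main__":
    print(choose_lora_apply_mode([0.8, 1.0]))
    print(choose_lora_apply_mode([0.8, 0.0, 1.0]))
```

Keeping the decision in one small function makes the "nothing changes if it's not used" requirement easy to check in isolation.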

wbruna · 2026-02-27T02:17:00Z

By the way, it's also possible to support the <lora:name:weight> syntax as a UI feature, parsing it and converting it to the sdapi parameter on the stable-ui side. I'm not exactly looking forward to implementing it that way, but it would make sense from a compatibility POV, since it'd allow LoRA loading from other sdapi servers too.

wbruna · 2026-03-01T03:22:41Z

Should be ready enough for reviewing.

As described before:

- `sdloramult` now receives a list of multipliers, one per LoRA
- by default, the first LoRA has multiplier 1.0, and extra LoRAs 0.0 (no strong opinions about this, it was simply the easiest behavior to implement)
- if all multipliers are non-zero, the LoRAs are loaded as before, with no changes to VRAM usage or inference time
- if any LoRA is specified with multiplier 0, all LoRAs will be loaded in `at_runtime` mode
- the LoRAs with multiplier 0 are advertised on the `sdapi/v1/loras` endpoint, and their multipliers can be changed both by the `lora` sdapi request field and the <lora:name:value> prompt syntax.
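
To make the prompt syntax concrete, here is a minimal sketch of how `<lora:name:value>` tags could be pulled out of a prompt. The helper name, the regular expression, and the choice to leave malformed tags untouched are assumptions for illustration, not the PR's actual parser.

```python
import re
from typing import Dict, Tuple

# Name: any run of characters without ':', '<' or '>' (spaces are allowed).
# Value: captured loosely here and validated with float() below.
_LORA_TAG = re.compile(r"<lora:([^:<>]+):([^:<>]+)>")


def extract_lora_tags(prompt: str) -> Tuple[str, Dict[str, float]]:
    """Return (prompt without LoRA tags, {lora name: multiplier})."""
    multipliers: Dict[str, float] = {}

    def _collect(match: "re.Match[str]") -> str:
        name = match.group(1).strip()
        try:
            multipliers[name] = float(match.group(2))
        except ValueError:
            return match.group(0)  # not a number: keep the text as-is
        return ""

    cleaned = _LORA_TAG.sub(_collect, prompt)
    # Collapse the whitespace left behind by removed tags.
    return " ".join(cleaned.split()), multipliers


if __name__ == "__main__":
    print(extract_lora_tags("a castle <lora:pixel_lora:0.6> at dusk"))
```

Note that this sketch normalizes all whitespace in the prompt, which a real implementation may not want.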

wbruna · 2026-03-01T12:42:56Z

Cleaned up the code, and reorganized the commits. Tested with Klein 9b and SDXL. Probably needs some polishing on the launcher and config side, once we decide the zero-multiplier approach is OK.

I'll leave this aside a bit, to focus on master-509-4cdfff5 🙂

Riztard · 2026-03-01T17:11:23Z

Is it intended that both LoRA weights are 0 here, even though the value is 1.0 in the launcher?

I thought you said the first LoRA is 1 by default.

It works, though, if I add the LoRAs and change their weights with these tags:
<lora:Yoo Ah-yeongilluLora:1><lora:S1 Dramatic Lighting Illustrious_V2:0.5>

I thought it would reload the LoRA and the model whenever the value changed, but it seems to be a truly dynamic LoRA (well, only for the weight).

Riztard · 2026-03-02T09:16:18Z

not showing absolute path?

LostRuins · 2026-03-02T13:27:33Z

the default behavior right now (before this PR), is when one multiplier is provided (which is the current status quo of the launcher), all loras are initialized at the same strength, which is what should be default i think. E.g. --loramult 0.6 --lora pixel_lora.gguf color_lora.gguf currently loads both loras at 0.6.

Then the API override should augment it to a new value temporarily for that request (only adjustable for those loras loaded at mult 0).

Also I think `inputs.lora_apply_mode = 0 #auto for now` currently in koboldcpp.py allows the loras to work automatically? I do not recall having to adjust the lora apply mode beyond this

wbruna · 2026-03-02T15:13:00Z

> not showing absolute path?

Intentionally omitted, since it could be considered sensitive information. Usually, we'd have a root directory for all the LoRA files, then we could show subpaths under it. But all LoRAs now are specified by full path, so we can't know which part could be shown.

(@LostRuins , a lora-model-dir would make this easy to do, and avoid lots of long paths both on the command line and on the UI, at the cost of a new UI field. I could implement it, if it's OK for you)
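
For illustration, this is one way a server could derive a non-sensitive display name from a full LoRA path, with and without the proposed root directory. The function name and the `lora_root` parameter are hypothetical, and `PurePosixPath` is used only to keep the example platform-independent.

```python
from pathlib import PurePosixPath
from typing import Optional


def public_lora_name(full_path: str, lora_root: Optional[str] = None) -> str:
    """Return a name that is safe to advertise to API clients.

    With a root directory, the subpath under it is shown (minus extension).
    Without one, or if the file is outside the root, only the bare file
    name without extension is shown, so no directory layout leaks.
    """
    path = PurePosixPath(full_path)
    if lora_root is not None:
        try:
            relative = path.relative_to(PurePosixPath(lora_root))
            return str(relative.with_suffix(""))
        except ValueError:
            pass  # not under the root: fall back to the bare name
    return path.stem


if __name__ == "__main__":
    print(public_lora_name("/home/user/loras/style/pixel_lora.gguf"))
    print(public_lora_name("/home/user/loras/style/pixel_lora.gguf", "/home/user/loras"))
```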

wbruna · 2026-03-02T15:13:23Z

> the default behavior right now (before this PR), is when one multiplier is provided (which is the current status quo of the launcher), all loras are initialized at the same strength, which is what should be default i think. E.g. --loramult 0.6 --lora pixel_lora.gguf color_lora.gguf currently loads both loras at 0.6.
>
> Then the API override should augment it to a new value temporarily for that request (only adjustable for those loras loaded at mult 0).

Alright, I'll adjust it later (and fix the bug @Riztard mentioned).

> Also I think `inputs.lora_apply_mode = 0 #auto for now` currently in koboldcpp.py allows the loras to work automatically? I do not recall having to adjust the lora apply mode beyond this

`auto` defaults to `immediately`, and switches to `at_runtime` only for quantized models, which is fine. The problem is that in `immediately` mode the LoRA weights are discarded from memory as soon as they are applied, so to change the multipliers for new generations we'd need to reload the weights from disk. I could change that behavior, but it's tricky, because just keeping the weights around would penalize the non-changeable-multiplier case. `immediately` can also be less accurate for multiplier changes, since precision errors would be cumulative.

`at_runtime` already keeps the LoRA objects around, so keeping a reference to them in the cache is enough to avoid the I/O. In principle, we could have a mix of fixed `immediately` and changeable `at_runtime` LoRAs, but sd.cpp currently doesn't track that property per LoRA, so we'd need a more extensive and delicate code change.
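
The point about cumulative precision errors can be illustrated with a toy model. This sketch assumes, purely for illustration, that merged weights are stored in half precision and that a multiplier change is applied in place by adding the difference times the LoRA delta; neither assumption is taken from the sd.cpp code, and all names are hypothetical.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def merged_weight(base: float, delta: float, multiplier: float) -> float:
    """Merge from the pristine base weight: a single rounding step."""
    return to_fp16(base + multiplier * delta)


def adjust_in_place(current: float, delta: float, old_mult: float, new_mult: float) -> float:
    """Change the multiplier on an already merged weight: one more rounding step."""
    return to_fp16(current + (new_mult - old_mult) * delta)


def drift_after_toggling(base: float, delta: float, mult_a: float, mult_b: float, rounds: int) -> float:
    """Toggle between two multipliers in place, then compare with a fresh merge."""
    weight = merged_weight(base, delta, mult_a)
    current = mult_a
    for _ in range(rounds):
        for target in (mult_b, mult_a):
            weight = adjust_in_place(weight, delta, current, target)
            current = target
    return abs(weight - merged_weight(base, delta, mult_a))


if __name__ == "__main__":
    # Values that are not exactly representable in fp16 may accumulate error.
    print(drift_after_toggling(0.1, 0.003, 1.0, 0.37, rounds=50))
```

Computing the LoRA contribution at inference time from the untouched base weights avoids this kind of accumulation, because no rounded intermediate result is ever reused.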

wbruna · 2026-03-04T02:41:34Z

Rebased on top of #2006 to get a fix for zero-multiplier LoRAs getting stuck, and to be able to test both PRs at the same time; but I'll keep the branches separate.

Also restored the behavior when a single multiplier is specified. Now:

- no multipliers: all LoRAs have multiplier 1
- single multiplier: all LoRAs have that same multiplier
- more than one multiplier: extend multiplier list with zeroes
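
The three rules listed in this comment can be sketched as a small normalization function. The name `expand_multipliers` is hypothetical (the PR's own helper may differ), and truncating a list that is longer than the number of LoRAs is an assumption of this sketch.

```python
from typing import List, Optional


def expand_multipliers(multipliers: Optional[List[float]], n_loras: int) -> List[float]:
    """Return exactly one multiplier per LoRA.

    - no multipliers: every LoRA gets 1.0
    - one multiplier: every LoRA gets that value
    - several multipliers: pad with 0.0 (which marks the LoRA as adjustable)
    """
    if not multipliers:
        return [1.0] * n_loras
    if len(multipliers) == 1:
        return [multipliers[0]] * n_loras
    padded = list(multipliers[:n_loras])  # assumption: extra values are ignored
    padded += [0.0] * (n_loras - len(padded))
    return padded


if __name__ == "__main__":
    print(expand_multipliers([0.6], 2))
    print(expand_multipliers([0.8, 0.5], 4))
```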

LostRuins · 2026-03-07T17:26:32Z

hmm i merged your other big PR, i dunno why the files are still shown here as modified

wbruna · 2026-03-07T23:22:22Z

Weird. Well, rebasing cleaned it up.

LostRuins · 2026-03-08T09:08:12Z

besides above,

The logic for zero multipliers seems slightly confusing. So I think the goal is for only loras with multiplier = 0.0 to be adjustable over the API. I.e. if you have on load loraA=0.8, loraB=0.0, loraC=1.0 then only loraB is adjustable over the api.

But in `prepare_lora_multipliers` you have `dynamic = 0. in orig_multipliers`, which doesn't seem to be correct

Riztard · 2026-03-08T10:39:03Z

> The logic for zero multipliers seems slightly confusing. So I think the goal is for only loras with multiplier = 0.0 to be adjustable over the API. I.e. if you have on load loraA=0.8, loraB=0.0, loraC=1.0 then only loraB is adjustable over the api.

why only some lora need to be adjustable. is less = more performance/hassle/something?
also user might not know without info tooltip/something, so need to add it if implemented like that

wbruna · 2026-03-08T11:29:18Z

> The logic for zero multipliers seems slightly confusing. So I think the goal is for only loras with multiplier = 0.0 to be adjustable over the API. I.e. if you have on load loraA=0.8, loraB=0.0, loraC=1.0 then only loraB is adjustable over the api.

That is what's happening.

> But in `prepare_lora_multipliers` you have `dynamic = 0. in orig_multipliers`, which doesn't seem to be correct

That boolean is not per-LoRA: the only thing it does is to short-circuit the LoRA list processing when no dynamic LoRAs were requested. I'll drop that part to simplify the code; the `if multiplier == 0.` below is enough.

wbruna · 2026-03-08T11:37:22Z

> why only some lora need to be adjustable. is less = more performance/hassle/something?

Yes; supporting any dynamic LoRAs means more memory usage and extra processing time for all LoRAs. It may be possible to optimize that, but the code would be too fragile without upstream changes.

> also user might not know without info tooltip/something, so need to add it if implemented like that

I do not disagree, but we can't document a feature before agreeing on what it should do 🙂

LostRuins · 2026-03-08T12:35:59Z

@Riztard it does make perfect sense I think. If the user has specified a multiplier (e.g. 0.75) in the launcher, then that multiplier should be obeyed for that LoRA. If the user chose not to specify a multiplier (e.g. 0.0) in the launcher, only then is the backend going to use what the API requests (defaulting to disabled since Wx0=0). But since multiple multipliers can be applied separately, this logic extends to each one individually, so LoRA A might be adjustable (because it was set to 0) while LoRA B is fixed at 0.65 as per launcher args.

Do you think that's not intuitive?
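
The per-LoRA rule described in this comment can be sketched as follows. The function and the dictionary-based interface are hypothetical, intended only to show which value wins for each LoRA on a given request.

```python
from typing import Dict


def effective_multipliers(launcher: Dict[str, float], requested: Dict[str, float]) -> Dict[str, float]:
    """Resolve the multiplier used for one request.

    A non-zero launcher value is fixed and ignores the request. A zero
    launcher value marks the LoRA as adjustable: the requested value is
    used, defaulting to 0.0 (disabled) when the request does not mention it.
    Requests for LoRAs that were not preloaded are ignored.
    """
    result: Dict[str, float] = {}
    for name, fixed in launcher.items():
        if fixed != 0.0:
            result[name] = fixed
        else:
            result[name] = requested.get(name, 0.0)
    return result


if __name__ == "__main__":
    launcher = {"loraA": 0.8, "loraB": 0.0, "loraC": 1.0}
    print(effective_multipliers(launcher, {"loraA": 0.1, "loraB": 0.5}))
```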

Riztard · 2026-03-08T14:38:43Z

> @Riztard it does make perfect sense I think. If the user has specified a multiplier (e.g. 0.75) in the launcher, then that multiplier should be obeyed for that LoRA. If the user chose not to specify a multiplier (e.g. 0.0) in the launcher, only then is the backend going to use what the API requests (defaulting to disabled since Wx0=0). But since multiple multipliers can be applied separately, this logic extends to each one individually, so LoRA A might be adjustable (because it was set to 0) while LoRA B is fixed at 0.65 as per launcher args.
>
> Do you think that's not intuitive?

Kinda, if the user knows about that.
Also, the common way dynamic LoRAs are used is that all of them can be adjusted, so users might expect that.

wbruna · 2026-03-08T14:54:00Z

Folded the fixes into each commit, and added the LoRA tags to the generated image metadata.

Also fix typo in the function name.

The `sdloramult` flag now accepts a list of multipliers, one for each LoRA. If all multipliers are non-zero, LoRAs load as before, with no extra VRAM usage or performance impact. If any LoRA has a multiplier of 0, we switch to `at_runtime` mode, and these LoRAs will be available for multiplier changes via the `lora` sdapi field and show up in the `sdapi/v1/loras` endpoint. All LoRAs are still preloaded on startup, and cached to avoid file reloads. If the list of multipliers is shorter than the list of LoRAs, the multiplier list is extended with the first multiplier (1.0 by default), to keep it compatible with the previous behavior.

wbruna force-pushed the kcpp_sdapi_loras branch from b0735b5 to 1ddd1a8 Compare February 27, 2026 00:37

wbruna force-pushed the kcpp_sdapi_loras branch from 8d4bc54 to f013f51 Compare February 28, 2026 16:11

wbruna changed the title ~~[WIP] support for customizing LoRA weights through the sdapi~~ support for customizing LoRA weights through the sdapi Mar 1, 2026

wbruna marked this pull request as ready for review March 1, 2026 02:43

wbruna force-pushed the kcpp_sdapi_loras branch from 730030d to b37b9dd Compare March 1, 2026 12:30

wbruna changed the title ~~support for customizing LoRA weights through the sdapi~~ support for customizing LoRA multipliers through the sdapi Mar 1, 2026

LostRuins added the enhancement label Mar 2, 2026

wbruna force-pushed the kcpp_sdapi_loras branch from b37b9dd to 2978a85 Compare March 4, 2026 02:30

wbruna mentioned this pull request Mar 4, 2026

sd: sync to master-520-d950627 #2006

Merged

LostRuins force-pushed the concedo_experimental branch from ca2cced to 54cf43a Compare March 4, 2026 03:00

wbruna force-pushed the kcpp_sdapi_loras branch 3 times, most recently from e59abca to 8115263 Compare March 6, 2026 23:17

wbruna mentioned this pull request Mar 7, 2026

add support for cache modes to accelerate image generation #2021

Draft

wbruna force-pushed the kcpp_sdapi_loras branch from 8115263 to 7a4559a Compare March 7, 2026 23:20

LostRuins reviewed Mar 8, 2026: three review comments on koboldcpp.py (two now outdated)

wbruna marked this pull request as draft March 8, 2026 11:38

wbruna force-pushed the kcpp_sdapi_loras branch from 7f5b0d6 to 12c80b4 Compare March 8, 2026 14:49

wbruna marked this pull request as ready for review March 8, 2026 14:54

wbruna added 2 commits March 9, 2026 08:41

fix corner case in sd_oai_transform_params

05f632d

Also fix typo in the function name.

wbruna force-pushed the kcpp_sdapi_loras branch from 12c80b4 to c12ae22 Compare March 9, 2026 11:44

wbruna added 2 commits March 9, 2026 14:10

support for <lora:name:multiplier> prompt syntax and metadata

f9cf66b

add a few tests for sanitize_lora_multipliers

eed83da

wbruna force-pushed the kcpp_sdapi_loras branch from c12ae22 to eed83da Compare March 9, 2026 17:21

4 participants
