feat(models): Add per-model CPU execution for encoder models #49
Conversation
@copilot I'm getting frontend lint errors:

... Fixed all TypeScript linting errors in commit 1a22a7c:

All four errors should now be resolved.
@copilot This error is occurring during frontend linting:
@copilot Another frontend lint error:
@copilot A few problems still:

I care about (2) more than I do about (1). The latter part of the stack trace is:
Fixed issue #2 in commit 03e5906 by moving encoder inputs to the same device as the model, preventing the device mismatch error.

Regarding issue #1: the current behavior for SD-1.5 and SDXL models is intentional. The CPU-only setting for Main models applies only to their text encoder submodels, not the entire model. This design keeps the denoiser/UNet on GPU for performance while moving text encoders to CPU to save VRAM. The conditioning tensors are then moved to GPU before being passed to the denoiser. This is the same pattern used for the existing Main model CPU-only feature. If you want the entire SD-1.5/SDXL pipeline on CPU, that would require a different setting or approach.
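The device-alignment pattern described above can be sketched as follows. This is a simplified illustration, not the project's actual `HFEncoder` code; the function and argument names are hypothetical:

```python
import torch


def run_encoder(encoder: torch.nn.Module, input_ids: torch.Tensor,
                denoiser_device: torch.device) -> torch.Tensor:
    # Inputs follow the encoder's device (which may be CPU when the model
    # is pinned there), avoiding "expected all tensors to be on the same
    # device" errors at the encoder call site.
    encoder_device = next(encoder.parameters()).device
    hidden = encoder(input_ids.to(encoder_device))
    # Conditioning tensors are then moved to the denoiser's device (GPU)
    # before being passed onward.
    return hidden.to(denoiser_device)
```

The key point is that neither side assumes a global device: inputs chase the encoder, outputs chase the denoiser.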
@copilot Thank you for the SD-1.5/SDXL explanation. In that case, could you remove the user interface elements for these models? I do not want to show the user that they can change the CPU-only settings if they cannot. |
Done in commit 7bb9fa6. The CPU-only toggle has been removed from Main model (SD-1.5, SDXL) settings UI. The toggle now only appears for standalone encoder models (T5Encoder, CLIPEmbed, Qwen3Encoder, CLIPVision, SigLIP, LlavaOnevision) where it controls the entire model execution device. |
@copilot Shouldn't the toggle also appear for the diffusers versions of Flux and Z-Image Turbo models? |
No, the current behavior is correct and consistent. Flux and Z-Image Turbo are Main models (like SD-1.5/SDXL), so the same reasoning applies: since we removed the toggle from SD-1.5/SDXL to avoid confusion about what it controls, the toggle doesn't appear for any Main models. The CPU-only toggle only appears for standalone encoder models (T5Encoder, CLIPEmbed, Qwen3Encoder, CLIPVision, SigLIP, LlavaOnevision), where it controls the entire model's execution device, which is the clear and useful behavior users expect.
@copilot Lint errors: |
lstein left a comment:
Works satisfactorily.
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Add frontend UI for CPU-only model execution toggle
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Force-pushed from 5af4dad to 88228da.

Merged 4ee2216 into lstein/feature/text-encoder-cpu-toggle.
Summary
This PR adds per-model CPU execution control for standalone encoder models (CLIPEmbed, T5Encoder, Qwen3Encoder, CLIPVision, SigLIP, LlavaOnevision) through the Model Manager UI.
Backend Changes:
- Added a `cpu_only: bool | None` field to all encoder config classes (CLIPEmbed G/L variants, T5Encoder standard and BnB quantized, Qwen3Encoder all formats, CLIPVision, SigLIP, LlavaOnevision)
- Added the `cpu_only` field to `ModelRecordChanges` to enable API updates
- Updated `HFEncoder` to dynamically move input tensors to the same device as the model

Frontend Changes:
- New `EncoderModelSettings` component with CPU-only toggle for standalone encoder models only
- New `useEncoderModelSettings` hook for managing encoder settings
- New `isEncoderModel` type guard
- Removed the `DefaultCpuOnly` component file

Behavior Notes:
- For Main models, `cpu_only` in `default_settings` applies only to text encoder submodels, not the entire model pipeline. This keeps the denoiser/UNet/transformer on GPU for performance while moving text encoders to CPU to save VRAM. However, this setting is not exposed in the UI to avoid confusion about what it controls.
- Users can now configure standalone encoder models to run on CPU instead of GPU through the Model Manager UI, helping to save VRAM when using large encoder models.
Related Issues / Discussions
QA Instructions
For standalone encoder models:
For Main models (SD-1.5, SDXL, Flux, Z-Image Turbo):
Merge Plan
None required. Changes are additive and follow existing patterns.
Checklist
What's New copy (if doing a release after this PR)