[Feature] CPU execution for text encoders with automatic device management #47
Conversation
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
... Fixed the runtime error by adding the missing popover translation. Added a translation with the heading "CPU Only" and two paragraph descriptions explaining the feature. Commit: 02766a0
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
@copilot Great! A couple of things:
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Completed both requests:
This ensures inputs are created on the same device as the model, preventing the RuntimeError you encountered. Commit: 918d21f
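As a rough illustration of that fix (a sketch with assumed helper and parameter names, not the PR's exact code), the tokenized inputs are moved to the encoder's own device rather than to a globally chosen GPU:

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer

def encode_prompt(prompt: str, tokenizer: CLIPTokenizer, text_encoder: CLIPTextModel) -> torch.Tensor:
    # Tokenize the prompt, then move the input IDs to whatever device the
    # encoder actually lives on (CPU when cpu_only is set). Creating inputs
    # on text_encoder.device keeps them colocated with the weights and
    # avoids the "expected all tensors on the same device" RuntimeError.
    tokens = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt")
    input_ids = tokens.input_ids.to(text_encoder.device)
    with torch.no_grad():
        return text_encoder(input_ids).last_hidden_state
```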
@copilot Thank you for your work on this. There are still tensor device mismatch errors, but I've also rethought the scope of the work. I think that the CPU-only option should be limited to standalone text encoders and to the text encoder sub-model of Main models. The core issue is that the conditioning tensors returned by a text encoder running on CPU need to be moved onto the GPU in order to be used as the input to the next phase of the pipeline, usually the denoiser. So this is a special case that needs to be specifically handled. So could you change the code such that when a Main model is marked CPU-only, only the text encoder component is executed on CPU and its conditioning outputs are moved to GPU before passing to the denoiser? Similarly, the standalone encoders should have a CPU-only option in the backend and frontend, and their outputs should be handled similarly. Let me know if you see any problems with this approach.
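A minimal sketch of the special case lstein describes (illustrative only; function and variable names are assumed, not taken from the repo): the encoder runs on whatever device it is pinned to, and the resulting conditioning tensor is moved to the denoiser's device before denoising.

```python
import torch

def encode_then_denoise(text_encoder: torch.nn.Module, denoiser: torch.nn.Module,
                        input_ids: torch.Tensor) -> torch.Tensor:
    # The text encoder may be pinned to CPU via its cpu_only setting.
    encoder_device = next(text_encoder.parameters()).device
    # Assumes the encoder returns a transformers-style output object.
    conditioning = text_encoder(input_ids.to(encoder_device)).last_hidden_state

    # Special case: conditioning produced on CPU must be moved onto the
    # denoiser's device (usually CUDA) before the denoising phase.
    denoiser_device = next(denoiser.parameters()).device
    return denoiser(conditioning.to(denoiser_device))
```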
…moved to CPU for storage
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Implemented the scope change to limit CPU-only execution to text encoders only.
Backend changes:
Frontend changes:
How it works:
This approach balances VRAM savings with performance. Commit: e850b78
@copilot I'm not sure this is working as expected. I selected the diffusers Z-Image Turbo Main model in the Model Manager tab, and selected "Run model on CPU only." I then started a generation using this model. But prior to denoising, the log shows the Qwen3Model being loaded into CUDA, which I don't expect to see.
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Fixed the issue. Changes made:
Text encoders with `cpu_only=True` will now correctly load on CPU, and log messages will show "onto cpu device" instead of "onto cuda device". This applies to both main model text encoders (SD1.5, SDXL, Z-Image, etc.) and standalone encoders. Commit: 18532b8
lstein left a comment
Working as expected. Will correct linting errors after merge with lstein/per-model-cpu-execution branch.
Merged commit 7ca2716 into lstein/per-model-cpu-execution
Summary
This PR implements per-model CPU execution, limited to text encoder submodels, providing VRAM savings while maintaining optimal performance.
Scope: When CPU-only is enabled for a main model, only the text encoder submodels (TextEncoder, TextEncoder2, TextEncoder3) run on CPU. The denoiser and other submodels always run on GPU for optimal performance. Conditioning tensors are automatically moved to CPU for storage and to GPU when needed by the denoiser.
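A sketch of the device-selection rule described above, with a local `SubModelType` stub standing in for the project's real enum (names and values are illustrative):

```python
from enum import Enum

import torch

class SubModelType(str, Enum):
    # Illustrative subset of submodel types; the real enum is larger.
    TextEncoder = "text_encoder"
    TextEncoder2 = "text_encoder_2"
    TextEncoder3 = "text_encoder_3"
    UNet = "unet"
    VAE = "vae"

TEXT_ENCODER_SUBMODELS = {
    SubModelType.TextEncoder,
    SubModelType.TextEncoder2,
    SubModelType.TextEncoder3,
}

def choose_execution_device(cpu_only: bool, submodel_type: SubModelType | None,
                            default_device: torch.device) -> torch.device:
    # CPU-only execution applies to text encoder submodels only; the
    # denoiser, VAE, and all other submodels keep the default GPU device.
    if cpu_only and submodel_type in TEXT_ENCODER_SUBMODELS:
        return torch.device("cpu")
    return default_device
```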
Frontend Changes:
- Added `cpu_only` field in `MainModelDefaultSettings`
- Added `'cpuOnly'` to the `Feature` type union
Backend Changes:
- Updated `_get_execution_device()` to check the `submodel_type` parameter - CPU execution is only applied to text encoder submodels
- Added `cpu_only` field to standalone text encoder configurations (T5Encoder, Qwen3Encoder variants) to support a future frontend UI (see the config sketch after this list)
- Respect `cpu_only` for both main model text encoder submodels and standalone encoders
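As a sketch of what that standalone-encoder config change might look like (the pydantic class name and fields are illustrative, not the repo's exact types):

```python
from pydantic import BaseModel

class StandaloneTextEncoderConfig(BaseModel):
    # Illustrative stand-in for the T5Encoder / Qwen3Encoder config variants.
    name: str
    path: str
    # When True, the encoder is loaded and executed on CPU; its conditioning
    # outputs are moved to the GPU before the denoiser consumes them.
    cpu_only: bool = False
```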
Critical Bug Fixes:
- The `lock()` method checked the cache's default `execution_device` instead of the model-specific `compute_device`. Fixed by adding a `compute_device` property to the cached model classes and updating `lock()` to check the model-specific device setting (sketched below).
- Invocations now read `text_encoder.device` instead of `TorchDevice.choose_torch_device()`:
  - `compel.py` (SD1.5 and SDXL variants)
  - `sd3_text_encoder.py` (CLIP and T5 encoders)
  - `flux_text_encoder.py` (CLIP and T5 encoders)
  - `cogview4_text_encoder.py` (GLM encoder)
  - `z_image_text_encoder.py` (Qwen3 encoder)
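Schematically, that first fix might look like this (greatly simplified; the real cached-model classes carry far more state and locking logic):

```python
import torch

class CachedModel:
    def __init__(self, model: torch.nn.Module,
                 cache_execution_device: torch.device, cpu_only: bool = False):
        self._model = model
        self._cache_execution_device = cache_execution_device
        self._cpu_only = cpu_only

    @property
    def compute_device(self) -> torch.device:
        # Model-specific device setting: honors cpu_only rather than
        # falling back to the cache's default execution device (the bug).
        return torch.device("cpu") if self._cpu_only else self._cache_execution_device

    def lock(self) -> torch.nn.Module:
        # Ensure the model sits on its own compute_device before use.
        self._model.to(self.compute_device)
        return self._model
```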
Related Issues / Discussions
Part of #44 - Per-model CPU execution feature
QA Instructions
- Run `pnpm lint:tsc` in `invokeai/frontend/web` - should pass without errors

Merge Plan
N/A
Checklist
- What's New copy (if doing a release after this PR)