[Feature] CPU execution for text encoders with automatic device management #47

Copilot · 2026-01-21T05:22:39Z

Summary

This PR implements per-model CPU execution capabilities limited to text encoder submodels only, providing VRAM savings while maintaining optimal performance.

Scope: When CPU-only is enabled for a main model, only the text encoder submodels (TextEncoder, TextEncoder2, TextEncoder3) run on CPU. The denoiser and other submodels always run on GPU for optimal performance. Conditioning tensors are automatically moved to CPU for storage and to GPU when needed by the denoiser.

Frontend Changes:

Simplified CPU Only toggle to a single switch control (removed dual toggle UI)
Fixed TypeScript compilation errors for cpu_only field in MainModelDefaultSettings
Added 'cpuOnly' to Feature type union
Updated popover translation to clarify that only text encoders run on CPU, not the entire model

Backend Changes:

Modified _get_execution_device() to check submodel_type parameter - CPU execution only applied to text encoder submodels
Added cpu_only field to standalone text encoder configurations (T5Encoder, Qwen3Encoder variants) to support future frontend UI
Updated model loader to respect cpu_only for both main model text encoder submodels and standalone encoders
All text encoder invocations now explicitly move conditioning tensors to CPU for storage (saves VRAM)

Critical Bug Fixes:

Fixed model cache device placement bug: Text encoders were loading onto CUDA even when CPU-only was enabled. Root cause: lock() method checked cache's default _execution_device instead of model-specific compute_device. Fixed by adding compute_device property to cached model classes and updating lock() to check model-specific device setting.
Fixed device mismatch error where text encoders running on CPU received inputs on CUDA by updating all text encoder invocations to use text_encoder.device instead of TorchDevice.choose_torch_device():
- compel.py (SD1.5 and SDXL variants)
- sd3_text_encoder.py (CLIP and T5 encoders)
- flux_text_encoder.py (CLIP and T5 encoders)
- cogview4_text_encoder.py (GLM encoder)
- z_image_text_encoder.py (Qwen3 encoder)
Updated log messages to display actual model device instead of cache default
Ensures input tensors are created on the same device as the model, preventing RuntimeError
Conditioning tensors moved to CPU for storage, automatically transferred to GPU when used by denoiser

Related Issues / Discussions

Part of #44 - Per-model CPU execution feature

QA Instructions

Frontend Linting: Run pnpm lint:tsc in invokeai/frontend/web - should pass without errors
Main Models Tab: Navigate to Model Manager → Main Models tab - should load without errors
CPU Only UI: Select a model → Default Settings tab → verify single "Run model on CPU only" toggle displays with updated help popover explaining text encoder scope
CPU Execution: Enable CPU-only for a model and generate an image - should work without device mismatch errors
VRAM Savings: Monitor VRAM usage - text encoders should run on CPU while denoiser runs on GPU
Log Verification: Check logs during generation - text encoder models should show "onto cpu device" not "onto cuda device" when CPU-only is enabled
Performance: Verify generation performance is maintained (only text encoding slightly slower, denoising at full GPU speed)
Model Type Coverage: Test with SD1.5, SDXL, Z-Image, FLUX, and SD3 models to verify CPU-only works across all model types
Standalone Encoders: Verify T5 and Qwen3 standalone encoders respect cpu_only configuration in their model configs (backend support ready for future frontend UI)

Merge Plan

N/A

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

lstein · 2026-01-21T05:36:20Z

@copilot When I try to go to the main models tab, I get this error:

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Copilot · 2026-01-21T05:40:47Z

@copilot When I try to go to the main models tab, I get this error:

...

Fixed the runtime error by adding the missing popover translation for the cpuOnly feature. The InformationalPopover component was trying to call .map() on undefined paragraphs because the translation entry was missing.

Added translation with heading "CPU Only" and two paragraph descriptions explaining the feature. Commit: 02766a0

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

lstein · 2026-01-21T05:50:48Z

@copilot Great! A couple of things:

Could you change the UI so that there is only a single slider button needed to turn on Run model on CPU only?
Add the CPU only option to the text encoders "T5 Encoder" and "Qwen3 Encoder" and make the appropriate backend changes.

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Copilot · 2026-01-21T05:57:39Z

@copilot Great! A couple of things:

Could you change the UI so that there is only a single slider button needed to turn on Run model on CPU only?

Add the CPU only option to the text encoders "T5...

Completed both requests:

Simplified UI: Changed to a single switch toggle (removed the separate enable/disable toggle). The switch now controls both enabling the setting and its value.
Added encoder support: Added cpu_only field to all T5Encoder and Qwen3Encoder config classes, and updated the model loader to respect it.
Fixed critical device mismatch bug: The main issue was that text encoders were using TorchDevice.choose_torch_device() which always returns CUDA, even when the model is on CPU. Fixed by using text_encoder.device instead in:
- compel.py (2 instances)
- sd3_text_encoder.py (2 instances)
- cogview4_text_encoder.py
- z_image_text_encoder.py

This ensures inputs are created on the same device as the model, preventing the RuntimeError you encountered. Commit: 918d21f

lstein · 2026-01-21T13:37:10Z

@copilot Thank you for your work on this. There are still tensor device mismatch errors, but I've also rethought the scope of the work. I think that the CPU-only option should be limited to standalone text encoders and to the text encoder sub-model of Main models. The core issue is that the conditioning tensors returned by a text encoder running on CPU needs to be moved onto the GPU in order to be used as the input to the next phase of the pipeline, usually the denoiser. So this is a special case that needs to be specifically handled. So could you change the code such that when a Main model is marked CPU-only, only the text encoder component is executed on CPU and its conditioning outputs are moved to GPU before passing to the denoiser? Similarly, the standalone encoders should have a CPU-only option in the backend and frontend, and their outputs should be handled similarly.

Let me know if you see any problems with this approach.

…moved to CPU for storage Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Copilot · 2026-01-21T13:42:18Z

@copilot Thank you for your work on this. There are still tensor device mismatch errors, but I've also rethought the scope of the work. I think that the CPU-only option should be limited to standalone...

Implemented the scope change to limit CPU-only execution to text encoders only:

Backend changes:

Modified _get_execution_device() to check submodel_type - CPU execution only applies to TextEncoder, TextEncoder2, and TextEncoder3 submodels
Denoisers, VAEs, and other submodels always use the default device (GPU)
Added explicit .detach().to("cpu") to all text encoder invocations (SD3, FLUX, CogView4, Z-Image) to move conditioning tensors to CPU for storage
Standalone encoders (T5, Qwen3) keep their direct cpu_only field support

Frontend changes:
Updated popover text to clarify that only text encoders run on CPU, not the entire model.

How it works:

Text encoders run on CPU when enabled (saves VRAM)
Conditioning tensors moved to CPU for storage
When the denoiser needs them, they're automatically moved to GPU
Denoiser always runs on GPU for optimal performance

This approach balances VRAM savings with performance. Commit: e850b78

lstein · 2026-01-21T15:33:53Z

@copilot I'm not sure this is working as expected. I selected the diffusers Z-Image Turbo Main model in the Model Manager tab, and selected "Run model on CPU only." I then started a generation using this model. But prior to denoising, the log shows:

[2026-01-21 10:32:54,838]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '9897a18f-56a8-4b83-a30c-573e8326799d:text_encoder' (Qwen3Model) onto cuda device in 1.06s. Total model size: 7672.25MB, VRAM: 7672.25MB (100.0%)
[2026-01-21 10:32:54,839]::[ModelManagerService]::INFO --> [MODEL CACHE] Loaded model '9897a18f-56a8-4b83-a30c-573e8326799d:tokenizer' (Qwen2Tokenizer) onto cuda device in 0.00s. Total model size: 0.00MB, VRAM: 0.00MB (0.0%)

I don't expect to see the message about the Qwen3Model being loaded into CUDA.

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Copilot · 2026-01-21T15:43:16Z

@copilot I'm not sure this is working as expected. I selected the diffusers Z-Image Turbo Main model in the Model Manager tab, and selected "Run model on CPU only." I then started a generation using t...

Fixed the issue. The problem was in the lock() method of the model cache - it was only checking the cache's default _execution_device (CUDA) instead of the model-specific compute_device that was set when adding the model to the cache.

Changes made:

Added compute_device property to both cached model wrapper classes
Modified lock() to check the model's specific compute_device before loading to VRAM
Models configured for CPU now skip VRAM loading entirely
Updated log message to display the model's actual device

Text encoders with cpu_only=True will now correctly load on CPU and log messages will show "onto cpu device" instead of "onto cuda device". This applies to both main model text encoders (SD1.5, SDXL, Z-Image, etc.) and standalone encoders. Commit: 18532b8

lstein

Working as expected. Will correct linting errors after merge with lstein/per-model-cpu-execution branch.

Initial plan

ddd1bd5

Copilot AI assigned Copilot and lstein Jan 21, 2026

Copilot AI mentioned this pull request Jan 21, 2026

[Feature] Per-model CPU execution #46

Open

Copilot started work on behalf of lstein January 21, 2026 05:23 View session

Fix TypeScript linting errors for cpu_only field

55a9cc3

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

lstein marked this pull request as ready for review January 21, 2026 05:28

lstein self-requested a review as a code owner January 21, 2026 05:28

github-actions bot added the frontend label Jan 21, 2026

chore(frontend) eslint

3ee876a

Copilot AI changed the title ~~[WIP] Add per-model CPU execution configuration~~ Fix TypeScript type definitions for cpu_only field Jan 21, 2026

Copilot finished work on behalf of lstein January 21, 2026 05:32

chore(frontend): prettier

9aa3ae9

Copilot started work on behalf of lstein January 21, 2026 05:36 View session

Add missing popover translation for cpuOnly feature

02766a0

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Improve cpuOnly popover help text based on code review

2dadf39

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Copilot AI changed the title ~~Fix TypeScript type definitions for cpu_only field~~ Fix frontend types and translations for cpu_only field Jan 21, 2026

Copilot finished work on behalf of lstein January 21, 2026 05:43

Copilot started work on behalf of lstein January 21, 2026 05:51 View session

Simplify CPU-only UI and add encoder support with device mismatch fix

918d21f

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Copilot AI changed the title ~~Fix frontend types and translations for cpu_only field~~ [Feature] Per-model CPU execution with encoder support and device fixes Jan 21, 2026

Copilot finished work on behalf of lstein January 21, 2026 06:02

Copilot started work on behalf of lstein January 21, 2026 13:37 View session

Limit CPU-only execution to text encoders and ensure conditioning is …

e850b78

…moved to CPU for storage Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Copilot AI changed the title ~~[Feature] Per-model CPU execution with encoder support and device fixes~~ [Feature] CPU execution for text encoders with automatic device management Jan 21, 2026

Copilot finished work on behalf of lstein January 21, 2026 13:45

Copilot started work on behalf of lstein January 21, 2026 15:34 View session

Fix CPU-only execution to properly check model-specific compute device

18532b8

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>

Copilot finished work on behalf of lstein January 21, 2026 15:47

Merge branch 'lstein/per-model-cpu-execution' into copilot/sub-pr-46

5f29566

github-actions bot added python invocations backend labels Jan 21, 2026

lstein approved these changes Jan 21, 2026

View reviewed changes

lstein merged commit 7ca2716 into lstein/per-model-cpu-execution Jan 21, 2026
10 of 13 checks passed

lstein deleted the copilot/sub-pr-46 branch January 21, 2026 16:30

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Feature] CPU execution for text encoders with automatic device management #47

[Feature] CPU execution for text encoders with automatic device management #47

Copilot AI commented Jan 21, 2026 •

edited

Loading

Uh oh!

lstein commented Jan 21, 2026

Uh oh!

Copilot AI commented Jan 21, 2026

Uh oh!

lstein commented Jan 21, 2026

Uh oh!

Copilot AI commented Jan 21, 2026

Uh oh!

lstein commented Jan 21, 2026

Uh oh!

Copilot AI commented Jan 21, 2026

Uh oh!

lstein commented Jan 21, 2026

Uh oh!

Copilot AI commented Jan 21, 2026

Uh oh!

lstein left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

[Feature] CPU execution for text encoders with automatic device management #47

[Feature] CPU execution for text encoders with automatic device management #47

Conversation

Copilot AI commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issues / Discussions

QA Instructions

Merge Plan

Checklist

Uh oh!

lstein commented Jan 21, 2026

Uh oh!

Copilot AI commented Jan 21, 2026

Uh oh!

lstein commented Jan 21, 2026

Uh oh!

Copilot AI commented Jan 21, 2026

Uh oh!

lstein commented Jan 21, 2026

Uh oh!

Copilot AI commented Jan 21, 2026

Uh oh!

lstein commented Jan 21, 2026

Uh oh!

Copilot AI commented Jan 21, 2026

Uh oh!

lstein left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Copilot AI commented Jan 21, 2026 •

edited

Loading