Support MIG UUID and respect CUDA_VISIBLE_DEVICES in nnUNetV2Runner #8716

ytl0623 · 2026-01-22T06:10:18Z

Description

This PR fixes two critical issues when running nnUNetV2Runner on NVIDIA MIG (Multi-Instance GPU) environments or when using a specific CUDA_VISIBLE_DEVICES configuration.

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).
Breaking change (fix or new feature that would cause existing functionality to change).
New tests added to cover the changes.
Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
In-line docstrings updated.
Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: ytl0623 <david89062388@gmail.com>

coderabbitai · 2026-01-22T06:10:42Z

Warning

Rate limit exceeded

@ytl0623 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 31 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📝 Walkthrough

Walkthrough

Replaces inline command construction with a typed builder train_single_model_command(self, config: str, fold: int, gpu_id: int | str | tuple | list, kwargs: dict[str, Any]) -> str that returns the full shell command. Adds explicit GPU handling: accept int/str for single GPU, tuple/list for multi-GPU (validated non-empty), construct CUDA_VISIBLE_DEVICES appropriately, compute num_gpus, and append CLI options including --npz when requested. train_single_model calls this builder. predict signature broadened to gpu_id: int | str and preserves an existing CUDA_VISIBLE_DEVICES when appropriate; docstrings updated.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly addresses the main change: supporting MIG UUID and respecting CUDA_VISIBLE_DEVICES in nnUNetV2Runner, matching the primary objectives.
Description check	✅ Passed	The description identifies the linked issue (`#7497`), explains the two critical issues being fixed, and follows the template structure with appropriate checkboxes.
Linked Issues check	✅ Passed	Code changes align with `#7497` requirements: gpu_id now accepts MIG UUIDs (int \| str \| tuple \| list typing), CUDA_VISIBLE_DEVICES is preserved when already set, and new validation handles GPU configuration properly.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to resolving `#7497`: only train_single_model_command and predict methods modified for GPU handling; no unrelated alterations present.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@monai/apps/nnunet/nnunetv2_runner.py`:
- Around line 827-830: The current check uses gpu_id == 0 (int) so a string "0"
won’t match and will overwrite CUDA_VISIBLE_DEVICES; update the logic around
gpu_id, os.environ and logger.info in nnunetv2_runner.py to compare string
forms: check if "CUDA_VISIBLE_DEVICES" in os.environ and str(gpu_id) ==
os.environ.get("CUDA_VISIBLE_DEVICES") then log and do not overwrite, otherwise
set os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id); refer to the gpu_id
variable, logger.info call, and the CUDA_VISIBLE_DEVICES env handling
(consistent with train_single_model_command behavior).

🧹 Nitpick comments (1)

monai/apps/nnunet/nnunetv2_runner.py (1)
791-791: Docstring should reflect new behavior.

The gpu_id parameter now accepts MIG UUIDs (strings) and respects existing CUDA_VISIBLE_DEVICES. Update the docstring at line 825 to document this.
Suggested docstring
-            gpu_id: which GPU to use for prediction.
+            gpu_id: GPU device index (int) or MIG UUID (str) for prediction.
+                If CUDA_VISIBLE_DEVICES is already set and gpu_id is 0, the existing
+                environment variable is preserved.
Also applies to: 825-825

monai/apps/nnunet/nnunetv2_runner.py

Signed-off-by: ytl0623 <david89062388@gmail.com>

for more information, see https://pre-commit.ci

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

monai/apps/nnunet/nnunetv2_runner.py (1)

531-565: Missing docstring and type hints.

Per coding guidelines, all definitions require Google-style docstrings documenting parameters and return values. This method also lacks type annotations.

Proposed signature and docstring

-    def train_single_model_command(self, config, fold, gpu_id, kwargs):
+    def train_single_model_command(
+        self, config: str, fold: int, gpu_id: int | str | tuple | list, kwargs: dict[str, Any]
+    ) -> str:
+        """
+        Build the shell command string for training a single nnU-Net model.
+
+        Args:
+            config: Configuration name (e.g., "3d_fullres").
+            fold: Cross-validation fold index (0-4).
+            gpu_id: Device selector—int, str (MIG UUID), or tuple/list for multi-GPU.
+            kwargs: Additional CLI arguments forwarded to nnUNetv2_train.
+
+        Returns:
+            Shell command string.
+        """

🤖 Fix all issues with AI agents

In `@monai/apps/nnunet/nnunetv2_runner.py`:
- Around line 537-544: The code path for gpu_id being a tuple/list doesn't
handle empty sequences, leaving device_setting and num_gpus in unintended
states; update the branch that checks isinstance(gpu_id, (tuple, list)) to
explicitly handle len(gpu_id) == 0 (e.g., raise a ValueError or TypeError with a
clear message including the received gpu_id), ensuring you reference and set
device_setting and num_gpus deterministically (or abort) instead of falling
through silently; keep the existing branches for len>1 and len==1 unchanged.

🧹 Nitpick comments (1)

monai/apps/nnunet/nnunetv2_runner.py (1)
553-557: Leading whitespace when device_setting is empty.

When the existing env var is reused, device_setting = "" produces " nnUNetv2_train ...".
Trim the prefix
-        cmd = (
-            f"{device_setting} nnUNetv2_train "
+        prefix = f"{device_setting} " if device_setting else ""
+        cmd = (
+            f"{prefix}nnUNetv2_train "

monai/apps/nnunet/nnunetv2_runner.py

1. added docstring and type hints 2. lead whitespace when device_setting is empty 3. empty tuple/list falls through silently Signed-off-by: ytl0623 <david89062388@gmail.com>

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@monai/apps/nnunet/nnunetv2_runner.py`:
- Around line 531-533: Add direct unit tests for the new
train_single_model_command method and the updated predict signature: write tests
that call train_single_model_command with representative gpu_id types (int, str,
tuple/list) and sample kwargs and assert the returned command string contains
the expected flags/arguments and correctly interpolates values; add unit tests
for predict that call the method with the new signature parameters (including
optional/edge cases) and assert it accepts the new parameters, invokes the
expected internal behavior (mocking subprocess/IO as needed) and returns/raises
the expected results; focus on exercising argument parsing, kwarg handling, and
error branches, and use mocking for external side effects so tests remain pure
unit tests.

🧹 Nitpick comments (1)

monai/apps/nnunet/nnunetv2_runner.py (1)
534-545: Docstring missing Raises section.

Per coding guidelines, document the ValueError raised when gpu_id is an empty tuple/list.
Suggested docstring addition
         Returns:
             Shell command string.
+
+        Raises:
+            ValueError: If gpu_id is an empty tuple or list.
         """

coderabbitai · 2026-01-22T06:39:58Z

monai/apps/nnunet/nnunetv2_runner.py

+    def train_single_model_command(
+        self, config: str, fold: int, gpu_id: int | str | tuple | list, kwargs: dict[str, Any]
+    ) -> str:


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Find test files related to nnunet fd -t f "test.*nnunet" --extension py fd -t f "nnunet.*test" --extension py

Repository: Project-MONAI/MONAI

Length of output: 168

🏁 Script executed:

# Search for test coverage of train_single_model_command rg -r "train_single_model_command" --glob "*.py" --glob "*test*"

Repository: Project-MONAI/MONAI

Length of output: 459

🏁 Script executed:

# Check the actual nnunet test directory structure find . -type d -name "*test*nnunet*" -o -type d -name "*nnunet*test*" 2>/dev/null | head -20

Repository: Project-MONAI/MONAI

Length of output: 45

🏁 Script executed:

# Look for nnunetv2_runner tests specifically fd -t f "nnunetv2_runner" --extension py

Repository: Project-MONAI/MONAI

Length of output: 153

🏁 Script executed:

# Check test file size and contents wc -l tests/integration/test_integration_nnunetv2_runner.py

Repository: Project-MONAI/MONAI

Length of output: 119

🏁 Script executed:

# Search for test functions and method names in the test file rg "def test_|train_single_model_command|predict" tests/integration/test_integration_nnunetv2_runner.py -n

Repository: Project-MONAI/MONAI

Length of output: 285

🏁 Script executed:

# Read the full test file to understand coverage cat -n tests/integration/test_integration_nnunetv2_runner.py

Repository: Project-MONAI/MONAI

Length of output: 4991

🏁 Script executed:

# Check the nnunetv2_runner.py file to see the new/modified methods wc -l monai/apps/nnunet/nnunetv2_runner.py

Repository: Project-MONAI/MONAI

Length of output: 103

🏁 Script executed:

# Find the train_single_model_command method rg -n "def train_single_model_command" monai/apps/nnunet/nnunetv2_runner.py -A 5

Repository: Project-MONAI/MONAI

Length of output: 320

🏁 Script executed:

# Find the predict method signature rg -n "def predict" monai/apps/nnunet/nnunetv2_runner.py -A 5

Repository: Project-MONAI/MONAI

Length of output: 555

🏁 Script executed:

# Check if train_single_model_command is used internally by other methods rg -n "train_single_model_command" monai/apps/nnunet/nnunetv2_runner.py

Repository: Project-MONAI/MONAI

Length of output: 281

🏁 Script executed:

# Check what the predict method does and if it's called by predict_ensemble_postprocessing rg -n "def predict_ensemble_postprocessing" monai/apps/nnunet/nnunetv2_runner.py -A 30

Repository: Project-MONAI/MONAI

Length of output: 1491

🏁 Script executed:

# Check run() method to see what it calls rg -n "def run" monai/apps/nnunet/nnunetv2_runner.py -A 15

Repository: Project-MONAI/MONAI

Length of output: 816

Add direct unit tests for new/modified methods.

Integration test exists but lacks direct unit test coverage for train_single_model_command (new) and modified predict signature. Add tests for these methods per coding guidelines.

🤖 Prompt for AI Agents

In `@monai/apps/nnunet/nnunetv2_runner.py` around lines 531 - 533, Add direct unit tests for the new train_single_model_command method and the updated predict signature: write tests that call train_single_model_command with representative gpu_id types (int, str, tuple/list) and sample kwargs and assert the returned command string contains the expected flags/arguments and correctly interpolates values; add unit tests for predict that call the method with the new signature parameters (including optional/edge cases) and assert it accepts the new parameters, invokes the expected internal behavior (mocking subprocess/IO as needed) and returns/raises the expected results; focus on exercising argument parsing, kwarg handling, and error branches, and use mocking for external side effects so tests remain pure unit tests.

Should we add direct unit tests??

@ytl0623 Yes. The coding guidelines require new or modified definitions to be covered by unit tests. Your changes introduce significant logic for GPU device selection, MIG UUID handling, and environment variable management—each with multiple edge cases.

Direct unit tests would verify:

train_single_model_command: different gpu_id types (int, str, tuple, list), empty tuple validation, CUDA_VISIBLE_DEVICES preservation logic, command string construction

predict: new gpu_id signature (int | str), environment variable preservation when gpu_id is 0

If you'd like, I can help plan the unit tests to add.

Signed-off-by: ytl0623 <david89062388@gmail.com>

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@monai/apps/nnunet/nnunetv2_runner.py`:
- Around line 528-529: train_single_model_command currently builds and returns a
shell string that is passed to run_cmd (which expects list[str]) and then
executed with shell=True risking injection; change train_single_model_command to
build and return a list[str] (argv-style) instead of an f-string, move any
dynamic settings (e.g. kwargs, fold, trainer_class_name, config) into either
explicit CLI args in that list or into an env dict, then call run_cmd(cmd_list,
env=env_dict) without shell=True; update the caller in nnunetv2_runner.py to
pass the returned list and env to run_cmd and remove shell=True, and add a unit
test for train_single_model_command that asserts the returned value is a list of
strings with correct argument ordering and that environment variables are
returned/handled as expected (validate env contains expected keys like trainer
class / kwargs).

♻️ Duplicate comments (1)

monai/apps/nnunet/nnunetv2_runner.py (1)

531-533: Add unit coverage for the new command builder and GPU handling.
Please add direct tests for train_single_model_command (int/str/tuple/list, MIG UUIDs, env preservation) and predict (gpu_id str/int paths). As per coding guidelines, ...

monai/apps/nnunet/nnunetv2_runner.py

Signed-off-by: ytl0623 <david89062388@gmail.com>

Support MIG UUID and respect CUDA_VISIBLE_DEVICES in nnUNetV2Runner

d8a65ba

Signed-off-by: ytl0623 <david89062388@gmail.com>

ytl0623 requested review from KumoLiu, Nic-Ma and ericspod as code owners January 22, 2026 06:10

coderabbitai bot reviewed Jan 22, 2026

View reviewed changes

monai/apps/nnunet/nnunetv2_runner.py Outdated Show resolved Hide resolved

ytl0623 and others added 2 commits January 22, 2026 14:22

fix bug: string 0 bypasses env var preservation

834c57a

Signed-off-by: ytl0623 <david89062388@gmail.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

fe71b7b

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Jan 22, 2026

View reviewed changes

monai/apps/nnunet/nnunetv2_runner.py Show resolved Hide resolved

minor fixes:

eddddca

1. added docstring and type hints 2. lead whitespace when device_setting is empty 3. empty tuple/list falls through silently Signed-off-by: ytl0623 <david89062388@gmail.com>

coderabbitai bot reviewed Jan 22, 2026

View reviewed changes

add docstring to the Raises section

d053eb4

Signed-off-by: ytl0623 <david89062388@gmail.com>

ytl0623 force-pushed the fix-issue-7497 branch from 8c79b9e to d053eb4 Compare January 22, 2026 07:16

coderabbitai bot reviewed Jan 22, 2026

View reviewed changes

monai/apps/nnunet/nnunetv2_runner.py Outdated Show resolved Hide resolved

refactor nnUNet runner for security, MIG support, and type safety

d038d2c

Signed-off-by: ytl0623 <david89062388@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support MIG UUID and respect CUDA_VISIBLE_DEVICES in nnUNetV2Runner #8716

Support MIG UUID and respect CUDA_VISIBLE_DEVICES in nnUNetV2Runner #8716

Uh oh!

ytl0623 commented Jan 22, 2026

Uh oh!

coderabbitai bot commented Jan 22, 2026 •

edited

Loading

Rate limit exceeded

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jan 22, 2026 •

edited

Loading

Uh oh!

ytl0623 Jan 22, 2026

Uh oh!

coderabbitai bot Jan 22, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Support MIG UUID and respect CUDA_VISIBLE_DEVICES in nnUNetV2Runner #8716

Are you sure you want to change the base?

Support MIG UUID and respect CUDA_VISIBLE_DEVICES in nnUNetV2Runner #8716

Uh oh!

Conversation

ytl0623 commented Jan 22, 2026

Description

Types of changes

Uh oh!

coderabbitai bot commented Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ytl0623 Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 22, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Jan 22, 2026 •

edited

Loading

coderabbitai bot Jan 22, 2026 •

edited

Loading