@luohua13 commented Jan 20, 2026

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive "Secure Accelerator Access Conformance Test" guide describing prerequisites, node labeling, setup steps, and two conformance tests: one verifies that a pod which does not request a GPU cannot access one, and the other verifies GPU isolation between pods.
    • Includes an automated test flow with pre-flight checks, test orchestration, PASS/FAIL reporting, a conformance summary, and guidance for running in air-gapped environments.
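The PASS/FAIL reporting and conformance summary mentioned above can be pictured with the following POSIX shell sketch. It is not taken from the PR's `test_secure_accelerator_access.sh`; the helper names `record_result` and `print_summary` and the output format are hypothetical.

```shell
# Hypothetical result bookkeeping for the conformance run.
NL='
'
RESULTS=""   # newline-separated "STATUS: NAME" entries
FAILED=0

# record_result NAME STATUS   (STATUS must be PASS or FAIL)
record_result() {
  case "$2" in
    PASS) ;;
    FAIL) FAILED=$((FAILED + 1)) ;;
    *) echo "record_result: invalid status '$2'" >&2; return 2 ;;
  esac
  RESULTS="${RESULTS}$2: $1${NL}"
  echo "[$2] $1"
}

# print_summary
# Prints every recorded result; returns 0 only if at least one test ran
# and none of them failed.
print_summary() {
  echo "=== Conformance summary ==="
  printf '%s' "$RESULTS"
  if [ -n "$RESULTS" ] && [ "$FAILED" -eq 0 ]; then
    echo "OVERALL: PASS"
    return 0
  fi
  echo "OVERALL: FAIL ($FAILED failed)"
  return 1
}
```

Using `FAILED=$((FAILED + 1))` rather than an arithmetic command keeps the counter update safe under `set -e`, and the non-zero return from `print_summary` lets the script's own exit status reflect the overall verdict.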


coderabbitai bot commented Jan 20, 2026

Walkthrough

Adds a new documentation article and an accompanying shell script that together implement a Kubernetes conformance test for secure GPU (accelerator) access. The article covers prerequisites, node labeling, two tests (denial of GPU access to a pod that requests none, and GPU isolation between pods on a multi-GPU node), test execution, and result reporting.
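Test 2 needs a node with at least two GPUs, so the pre-flight checks have to find one. A minimal sketch of that capacity check follows, assuming the NVIDIA device plugin advertises GPUs as the extended resource `nvidia.com/gpu` (an assumption; the guide's own resource name and node labels are not reproduced here). `pick_gpu_node` and `list_gpu_capacity` are hypothetical helpers.

```shell
# pick_gpu_node [MIN]
# Reads "NODE<TAB>COUNT" lines on stdin and prints the first node whose
# allocatable GPU count is at least MIN (default 2). Returns 1 if none qualifies.
pick_gpu_node() {
  local min node count tab
  min="${1:-2}"
  tab=$(printf '\t')
  # The extra test also handles a final line without a trailing newline.
  while IFS="$tab" read -r node count || [ -n "$node" ]; do
    case "$count" in
      ''|*[!0-9]*) count=0 ;;   # missing or non-numeric count: no usable GPUs
    esac
    if [ "$count" -ge "$min" ]; then
      echo "$node"
      return 0
    fi
  done
  return 1
}

# list_gpu_capacity (needs cluster access)
# The backslash escapes the dot so JSONPath treats "nvidia.com/gpu" as one key.
list_gpu_capacity() {
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}

# Example:
#   NODE=$(list_gpu_capacity | pick_gpu_node 2) || echo "no node with >=2 GPUs"
```

Keeping the parsing in `pick_gpu_node` separate from the `kubectl` call means the selection logic can be checked without a cluster.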

Changes

Cohort: Documentation & Test Script
File: docs/en/solutions/AI/Secure_Accelerator_Access_Conformance_Test.md
Summary: New comprehensive guide with an embedded test_secure_accelerator_access.sh script. It specifies prerequisites, GPU node labeling and capacity checks, Test 1 (a pod without GPU requests cannot access GPUs), Test 2 (isolation between two pods on a node with ≥2 GPUs, verified by comparing GPU UUIDs), pre-flight checks, namespace lifecycle and cleanup, guidance for air-gapped runs, an example CLI invocation, and sample PASS/FAIL output.
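Test 2 decides isolation by comparing GPU UUIDs. Assuming each pod prints the output of `nvidia-smi -L`, whose lines have the form `GPU 0: <model> (UUID: GPU-...)`, the comparison might be sketched as below. `extract_uuids` and `gpus_isolated` are hypothetical helper names, not functions from the PR's script.

```shell
# extract_uuids
# Reads `nvidia-smi -L` output on stdin and prints each full-GPU UUID once.
# (MIG device UUIDs, which start with "MIG-", are not matched.)
extract_uuids() {
  { grep -oE 'UUID: GPU-[0-9a-fA-F-]+' || true; } | sed 's/^UUID: //' | sort -u
}

# gpus_isolated LOG_A LOG_B
# Returns 0 when each log lists at least one GPU and no UUID appears in both.
gpus_isolated() {
  local a b uuid
  a=$(printf '%s\n' "$1" | extract_uuids)
  b=$(printf '%s\n' "$2" | extract_uuids)
  if [ -z "$a" ] || [ -z "$b" ]; then
    echo "FAIL: at least one pod reported no GPU" >&2
    return 1
  fi
  for uuid in $a; do
    # Exact, fixed-string line match against the other pod's UUID list.
    if printf '%s\n' "$b" | grep -xF "$uuid" >/dev/null; then
      echo "FAIL: $uuid is visible to both pods" >&2
      return 1
    fi
  done
  return 0
}

# Example (pod names gpu-pod-a/gpu-pod-b and NS are hypothetical):
#   gpus_isolated "$(kubectl logs gpu-pod-a -n "$NS")" "$(kubectl logs gpu-pod-b -n "$NS")"
```

A pod that reports no GPU at all is treated as a failure here, because both Test 2 pods request a GPU; an empty list would otherwise make the "no shared UUID" check pass trivially.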

Sequence Diagram(s)

sequenceDiagram
  participant Tester as Tester (runs script)
  participant K8sAPI as kubectl / Kubernetes API
  participant Node as Worker Node (kubelet)
  participant GPU as GPU driver / nvidia-smi

  Tester->>K8sAPI: create namespace & deploy Test1 pod (no GPU)
  K8sAPI->>Node: schedule Test1 pod on labeled GPU node
  Node->>GPU: start container (no GPU requested)
  GPU-->>Node: device list / no devices exposed
  Node-->>K8sAPI: pod logs / status
  K8sAPI-->>Tester: return logs (nvidia-smi output)

  Tester->>K8sAPI: deploy Test2 pods (two pods requesting GPUs)
  K8sAPI->>Node: schedule both pods on same node with ≥2 GPUs
  Node->>GPU: start containers with device plugin allocation
  GPU-->>Node: report device UUIDs for each container
  Node-->>K8sAPI: pod logs / nvidia-smi outputs
  K8sAPI-->>Tester: return logs (GPU UUIDs)
  Tester->>Tester: compare UUIDs → determine PASS/FAIL for isolation
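The first half of the diagram ends with the tester reading the `nvidia-smi` output of the pod that requested no GPU. A sketch of how that log might be turned into a verdict follows. It assumes the pod runs `nvidia-smi -L`, which prints one `GPU <index>: ...` line per visible device; any such line counts as a leak, while an error (for example, `nvidia-smi` not being present in the container) or `No devices were found` counts as access denied. Both the rule and the name `judge_no_gpu_pod` are assumptions, not the PR's exact logic.

```shell
# judge_no_gpu_pod LOG
# Prints PASS (return 0) when the log of the pod that requested no GPU lists
# no devices, and FAIL (return 1) when `nvidia-smi -L` reported at least one GPU.
judge_no_gpu_pod() {
  if printf '%s\n' "$1" | grep -E '^GPU [0-9]+:' >/dev/null; then
    echo "FAIL"   # a GPU was visible without any resource request
    return 1
  fi
  # Error output (e.g. "nvidia-smi: not found") or "No devices were found"
  # both mean no device was exposed to the container.
  echo "PASS"
  return 0
}
```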

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped in with a script and a test,
Pods and GPUs put to quest,
One denied, two kept apart,
UUIDs reveal each tiny heart,
Conformance met — I nibble a zest! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title 'Secure Accelerator Access Conformance Test' directly and clearly describes the main change: a new documentation article about a Kubernetes-based conformance test for secure GPU access.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; the docstring coverage check was skipped.


@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/en/solutions/AI/Secure_Accelerator_Access_Conformance_Test.md`:
- Around line 159-171: `kubectl wait` exits non-zero when it times out, so under `set -e` the script aborts before any diagnostics are collected. Instead of relying on `kubectl wait` succeeding, query the pod phase with `kubectl get pod -o jsonpath='{.status.phase}'` for `no-gpu-pod`, check whether the phase equals "Succeeded", and only then parse the terminated container's exit code into `TEST1_EXIT_CODE`; for any other phase, fetch and print the pod logs and mark the test as failed. Apply the same pattern to the Test 2 wait and log-parsing logic so that both waits capture the phase explicitly and diagnostics always run, even when a wait would have timed out.
- Line 26: Replace the bare URL "https://docs.alauda.io/pgpu/0.17/install/install.html" with a Markdown link to satisfy markdownlint MD034. Locate the plain URL in Secure_Accelerator_Access_Conformance_Test.md and change it to the inline link format, for example [Alauda PGPU install guide](https://docs.alauda.io/pgpu/0.17/install/install.html), so the document remains lint-clean and readable.
