Secure Accelerator Access Conformance Test #114
Walkthrough
Adds a new documentation article and an accompanying shell script that implement a Kubernetes conformance test for secure GPU (accelerator) access. Together they cover prerequisites, node labeling, two tests (no-GPU access denial and multi-GPU isolation), test execution, and result reporting.
Sequence Diagram(s)
sequenceDiagram
participant Tester as Tester (runs script)
participant K8sAPI as kubectl / Kubernetes API
participant Node as Worker Node (kubelet)
participant GPU as GPU driver / nvidia-smi
Tester->>K8sAPI: create namespace & deploy Test1 pod (no GPU)
K8sAPI->>Node: schedule Test1 pod on labeled GPU node
Node->>GPU: start container (no GPU requested)
GPU-->>Node: device list / no devices exposed
Node-->>K8sAPI: pod logs / status
K8sAPI-->>Tester: return logs (nvidia-smi output)
Tester->>K8sAPI: deploy Test2 pods (two pods requesting GPUs)
K8sAPI->>Node: schedule both pods on same node with ≥2 GPUs
Node->>GPU: start containers with device plugin allocation
GPU-->>Node: report device UUIDs for each container
Node-->>K8sAPI: pod logs / nvidia-smi outputs
K8sAPI-->>Tester: return logs (GPU UUIDs)
Tester->>Tester: compare UUIDs → determine PASS/FAIL for isolation
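The final step of the diagram, comparing GPU UUIDs from the two Test2 pods, could look roughly like the sketch below. It assumes the pod logs contain `nvidia-smi -L` style lines with `UUID: GPU-...` tokens; the function names `extract_uuids` and `gpus_isolated` are hypothetical and are not taken from the PR's script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Extract GPU UUID tokens (e.g. GPU-1a2b...) from nvidia-smi style text on stdin.
# Prints one UUID per line, sorted and de-duplicated; prints nothing if none found.
extract_uuids() {
  grep -oE 'GPU-[0-9a-fA-F-]+' | sort -u || true
}

# Succeeds (exit 0) only when both logs list at least one UUID and share none.
# $1 and $2 are the captured log texts of the two pods (assumed inputs).
gpus_isolated() {
  local a b uuid
  a="$(printf '%s\n' "$1" | extract_uuids)"
  b="$(printf '%s\n' "$2" | extract_uuids)"
  # A pod that reports no device at all cannot demonstrate isolation.
  [ -n "$a" ] && [ -n "$b" ] || return 1
  for uuid in $a; do
    # An exact, whole-line match means the same device is visible in both pods.
    if printf '%s\n' "$b" | grep -qxF -- "$uuid"; then
      return 1
    fi
  done
  return 0
}
```

A caller would capture each pod's logs into a variable and report PASS when `gpus_isolated "$LOGS_A" "$LOGS_B"` succeeds, FAIL otherwise. Treating an empty UUID list as a failure is a design choice of this sketch, so that a pod with no visible GPU does not count as isolated.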
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 3 passed
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/en/solutions/AI/Secure_Accelerator_Access_Conformance_Test.md`:
- Around line 159-171: Replace the brittle kubectl wait call so the script does
not abort under set -e: instead of relying on `kubectl wait` to succeed, query
the pod phase with `kubectl get pod -o jsonpath='{.status.phase}'` for
`no-gpu-pod`, check if the phase equals "Succeeded", and only then parse the
terminated container exit code into `TEST1_EXIT_CODE`; on non-Succeeded phases
fetch and print pod logs and mark the test failed. Apply the same pattern for
the Test 2 wait/log parsing logic so both waits capture phase explicitly and
always run diagnostics even if a wait would have timed out.
- Line 26: Replace the bare URL
"https://docs.alauda.io/pgpu/0.17/install/install.html" with a Markdown link to
satisfy markdownlint MD034; locate the plain URL in
Secure_Accelerator_Access_Conformance_Test.md and change it to the inline link
format like [Alauda PGPU install
guide](https://docs.alauda.io/pgpu/0.17/install/install.html) so the document
remains lint-clean and readable.
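The phase-check pattern requested in the first comment could be sketched as follows. This is a minimal illustration, not the PR's actual fix: the `KUBECTL` and `NAMESPACE` variables and the functions `wait_for_pod_phase` and `check_pod` are hypothetical names, and the timeout and interval defaults are arbitrary. Only the `kubectl get pod -o jsonpath=...` and `kubectl logs` calls are standard.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative knobs (assumptions, not names from the PR's script).
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-default}"

# Poll the pod phase until it is terminal or the timeout (seconds) expires.
# Always prints the last observed phase and never fails, so `set -e` cannot
# abort the script before diagnostics run.
wait_for_pod_phase() {
  local pod="$1" timeout="${2:-120}" interval="${3:-5}" elapsed=0 phase=""
  while :; do
    phase="$("$KUBECTL" get pod "$pod" -n "$NAMESPACE" \
      -o jsonpath='{.status.phase}' 2>/dev/null || true)"
    case "$phase" in
      Succeeded|Failed) break ;;
    esac
    if [ "$elapsed" -ge "$timeout" ]; then
      break
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  printf '%s\n' "${phase:-Unknown}"
}

# Report the container exit code only when the pod Succeeded; otherwise
# print the phase and the pod logs, and return non-zero.
check_pod() {
  local pod="$1" phase exit_code
  phase="$(wait_for_pod_phase "$pod" "${2:-120}" "${3:-5}")"
  if [ "$phase" = "Succeeded" ]; then
    exit_code="$("$KUBECTL" get pod "$pod" -n "$NAMESPACE" \
      -o jsonpath='{.status.containerStatuses[0].state.terminated.exitCode}' \
      2>/dev/null || true)"
    echo "PASS: $pod exit code ${exit_code:-unknown}"
    return 0
  fi
  echo "FAIL: $pod phase is $phase"
  "$KUBECTL" logs "$pod" -n "$NAMESPACE" || true
  return 1
}
```

Because `check_pod` returns non-zero on failure, a script running under `set -e` would call it as `if check_pod no-gpu-pod; then ...; else ...; fi` so that a failed test is recorded rather than terminating the run. The same helper could serve both Test 1 and the Test 2 pods.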