Add trainerv2 with mindspeed #201
Conversation
Warning: Rate limit exceeded

To keep reviews running without waiting, you can enable the usage-based add-on for your organization. This allows additional reviews beyond the hourly cap; account admins can enable it under billing.

⌛ How to resolve this issue? Once the wait time has elapsed, a review can be triggered again. We recommend spacing out your commits to avoid hitting the rate limit.

🚦 How do rate limits work? CodeRabbit enforces hourly rate limits for each developer per organization. Paid plans have higher rate limits than the trial, open-source, and free plans; in all cases, further reviews are re-allowed after a brief timeout. See the FAQ for further information.

ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
Walkthrough

This PR introduces documentation for fine-tuning Qwen3 on Huawei Ascend NPUs using Kubeflow Trainer v2 and MindSpeed-LLM. It adds a comprehensive Jupyter notebook with Kubeflow manifests (TrainingRuntime and TrainJob), environment setup, dataset preparation, checkpoint conversion, and training commands, plus a reference section in the existing tutorial guide.

Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Review rate limit: 0/1 reviews remaining; refill in 56 minutes and 51 seconds.
Actionable comments posted: 2
🧹 Nitpick comments (1)
docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb (1)
292-300: Container `securityContext` duplicates pod-level fields.

`runAsNonRoot`, `runAsUser`, and `runAsGroup` are already set on the pod spec at lines 109-113 with the same values, so the container-level copies are redundant; only the truly container-scoped settings (`allowPrivilegeEscalation`, `capabilities`, `seccompProfile`) need to live here. This reduces the chance of the two blocks drifting apart later.

🧹 Proposed cleanup

```diff
 securityContext:
   allowPrivilegeEscalation: true
   capabilities:
     add: ["IPC_LOCK", "SYS_PTRACE"]
-  runAsNonRoot: true
-  runAsUser: 1001
-  runAsGroup: 0
   seccompProfile:
     type: RuntimeDefault
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb` around lines 292 - 300, The container-level securityContext duplicates pod-level fields runAsNonRoot, runAsUser, and runAsGroup (same values set earlier); remove those three keys from the container's securityContext block and leave only container-scoped keys (allowPrivilegeEscalation, capabilities, seccompProfile) so the pod-level runAs* settings remain authoritative; locate the container securityContext in the YAML snippet under the container spec (the block containing allowPrivilegeEscalation/capabilities/seccompProfile) and delete runAsNonRoot, runAsUser, and runAsGroup entries there.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb`:
- Around line 158-163: The YAML block scalar fails because the JSONL and PYCHECK
heredocs are not indented to the required column; keep the printf change for
JSONL and apply the same approach to PYCHECK: either replace the PYCHECK heredoc
with a printf that emits the content (like the JSONL fix) or indent every
non-empty line inside the PYCHECK heredoc by at least 20 spaces to match the
surrounding block scalar (ensure symbols RAW_DATA_FILE, JSONL, and PYCHECK are
updated accordingly and that set -o pipefail block indentation is preserved).
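The printf-based fix the comment recommends can be sketched as follows. This is an illustration only, not the notebook's actual content: the variable name `RAW_DATA_FILE` comes from the review comment, while the output path and the JSON records are invented placeholders.

```shell
# Sketch of the suggested printf approach (assumed content; the real
# notebook's records differ). Unlike a heredoc body, printf arguments sit
# at the script's own indentation level, so the snippet survives being
# embedded in a YAML "script: |" block scalar without re-indenting every
# payload line to the scalar's required column.
set -o pipefail

RAW_DATA_FILE=/tmp/alpaca_sample.jsonl   # placeholder path

# Emit one JSON object per line (JSONL) without a heredoc.
printf '%s\n' \
  '{"instruction": "Say hello.", "input": "", "output": "Hello!"}' \
  '{"instruction": "Add 2 and 2.", "input": "", "output": "4"}' \
  > "${RAW_DATA_FILE}"

wc -l < "${RAW_DATA_FILE}"
```

The same pattern would apply to the PYCHECK content: pass each line of the Python snippet as a separate printf argument instead of a heredoc body.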
In `@docs/en/kubeflow/how_to/fine-tune-with-trainer-v2.mdx`:
- Line 84: The GitHub URL
"https://github.com/alauda/aml-docs/docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb"
is malformed and 404s; update that link (and the other two occurrences using the
same pattern) to either a proper GitHub blob URL including the branch (e.g., add
"/blob/main/" between "alauda/aml-docs" and the path) or convert it to a
site-relative link to the notebook within the docs (so it resolves on the
rendered site); search for the same broken pattern on the page (lines near where
the current link appears) and apply the same fix to each occurrence.
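One mechanical way to apply the suggested fix is a sed substitution that inserts `/blob/main/` between the repo slug and the file path. The branch name `main` is an assumption here; substitute the repository's actual default branch.

```shell
# Repair the malformed GitHub link by inserting "/blob/main/" between the
# "alauda/aml-docs" repo slug and the "docs/..." file path. The branch
# name "main" is assumed, not confirmed by the review.
url="https://github.com/alauda/aml-docs/docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb"
fixed=$(printf '%s' "$url" | sed 's#alauda/aml-docs/docs#alauda/aml-docs/blob/main/docs#')
echo "$fixed"
```

Running the same substitution over the whole page would catch the other two occurrences of the broken pattern in one pass.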
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 92b4c5fb-fd8a-43c6-b5d3-b92c9aa4f9be
📒 Files selected for processing (2)
docs/en/kubeflow/how_to/fine-tune-with-trainer-v2-mindspeed-npu.ipynb
docs/en/kubeflow/how_to/fine-tune-with-trainer-v2.mdx
Deploying alauda-ai with Cloudflare Pages

- Latest commit: 45b2683
- Status: ✅ Deploy successful!
- Preview URL: https://06552d9f.alauda-ai.pages.dev
- Branch Preview URL: https://add-trainerv2-mindspeed.alauda-ai.pages.dev
/test-pass