Conversation
Signed-off-by: Kinjal Patel <kinjalpravin@nvidia.com>
📝 Walkthrough

This change modifies the quantization plugin to implement a resilient quantization path for GptOssExperts by routing matmul and bmm operations through explicit ATen implementations instead of the standard torch dispatchers. This avoids potential Python dispatch recursion and adds an explicit override for the `torch.Tensor.matmul` function.
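The routing described above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the names `quantized_matmul` and `quantized_bmm` are hypothetical, and the fake-quantization step is elided. The key point is that the wrappers call `torch.ops.aten.matmul` / `torch.ops.aten.bmm`, which dispatch straight to the ATen kernels rather than back through the (possibly patched) Python-level entry points.

```python
import torch

def quantized_matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # ...fake-quantization of a and b would happen here...
    # torch.ops.aten.matmul reaches the ATen kernel directly, so it
    # cannot re-enter a patched torch.matmul.
    return torch.ops.aten.matmul(a, b)

def quantized_bmm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # ...fake-quantization of a and b would happen here...
    return torch.ops.aten.bmm(a, b)

# Explicit override of the Tensor.matmul method, so `x.matmul(y)`
# takes the same path as `torch.matmul(x, y)`.
torch.Tensor.matmul = quantized_matmul

x, w = torch.randn(2, 3), torch.randn(3, 4)
print(x.matmul(w).shape)  # torch.Size([2, 4])
```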
Codecov Report

✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #999 +/- ##
==========================================
- Coverage 72.12% 72.11% -0.02%
==========================================
Files 209 209
Lines 23628 23638 +10
==========================================
+ Hits 17042 17046 +4
- Misses 6586 6592 +6
What does this PR do?
This PR fixes a maximum-recursion bug for GPT-OSS. It replaces `torch.bmm` and `torch.matmul` with `torch.ops.aten.bmm` and `torch.ops.aten.matmul` to avoid the recursion.

Usage
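A minimal reproduction of this class of bug (a sketch under assumed conditions, not the actual GptOssExperts code): a plugin that monkey-patches `torch.matmul` must not call `torch.matmul` inside the patch, because the module-level name now points back at the patch itself. Routing through `torch.ops.aten.matmul` breaks the cycle.

```python
import torch

orig_matmul = torch.matmul  # saved so we can restore it below

def buggy_matmul(a, b):
    # torch.matmul is looked up at call time and now refers to this
    # very function -> infinite recursion -> RecursionError.
    return torch.matmul(a, b)

def fixed_matmul(a, b):
    # Dispatches straight to the ATen kernel, bypassing the patch.
    return torch.ops.aten.matmul(a, b)

torch.matmul = buggy_matmul
try:
    torch.matmul(torch.randn(2, 2), torch.randn(2, 2))
except RecursionError:
    print("buggy patch recursed")

torch.matmul = fixed_matmul
out = torch.matmul(torch.randn(2, 3), torch.randn(3, 4))  # works
torch.matmul = orig_matmul  # restore the original
print(out.shape)  # torch.Size([2, 4])
```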
Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- CONTRIBUTING.md: N/A

Additional Information