Arm backend: Run adaptive_avg_pool2d before quantization #17494
gggekov merged 4 commits into pytorch:main
Conversation
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17494
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures, 1 Unrelated Failure as of commit 62fc849 with merge base f06a1f6.
NEW FAILURES - The following jobs have failed.
BROKEN TRUNK - The following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Pull request overview
This PR optimizes mobilenet_v2 performance on Ethos-U55/U85 hardware by decomposing adaptive_avg_pool2d before quantization and using channels_last memory format to avoid costly transpose operations. The changes address performance issues identified in issue #17157 where ExecuTorch showed higher latency compared to TFLite on MCU hardware.
Changes:
- Add DecomposeAdaptiveAvgPool2dPass to transform_for_annotation pipeline to decompose adaptive_avg_pool2d before quantization
- Update Ethos-U55/U85 mobilenet_v2 tests to use channels_last memory format
- Update test expectations to reflect operator decomposition in quantized pipelines
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Summary per file:
| File | Description |
|---|---|
| backends/arm/_passes/decompose_adaptive_avg_pool2d_pass.py | Add allowed_to_transform check to respect TFA metadata flags |
| backends/arm/_passes/arm_pass_manager.py | Add DecomposeAdaptiveAvgPool2dPass to transform_for_annotation pipeline |
| backends/arm/test/models/test_mobilenet_v2_arm.py | Convert input tensors to channels_last format for Ethos-U55/U85 tests |
| backends/arm/test/ops/test_mean_dim.py | Update INT pipeline tests to expect decomposed operators (empty op lists) |
| backends/arm/test/ops/test_avg_pool2d.py | Update test cases and add channels_last adaptive_avg_pool test |
| backends/arm/test/quantizer/test_selective_quantization.py | Update quantization annotations to reflect operator changes |
| backends/arm/ethosu/backend.py | Update copyright year to 2025-2026 |
I see 4 errors. I'll try to rerun the tests to see if it's just some flakiness.
To run mobilenet_v2 with good performance on Ethos-U55, we need to export the model in channels_last. If we export in channels_first (the default behaviour), we pay a hefty performance penalty because the Ethos-U55 hardware is not efficient at doing Transpose (see details in pytorch#17157). The adaptive_avg_pool2d operator, part of mv2, is traced differently by ExecuTorch depending on whether the model was exported in channels_first (operator not decomposed) or channels_last (operator is decomposed by ExecuTorch in to_edge). To work around that, we add adaptive_avg_pool2d to the transform_for_annotation pipeline in order to decompose the operator before quantization.

Signed-off-by: George Gekov <george.gekov@arm.com>
Change-Id: I3e98a2d52f6d0e7c79f82188a5e6c4eb6a63448b
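As a hedged illustration of the two ideas in this description — exporting in channels_last and decomposing adaptive_avg_pool2d before quantization — the following standalone sketch shows the memory-format conversion and the numerical equivalence the decomposition relies on for the (1, 1) output case used by mobilenet_v2. It is not the actual pass implementation, just plain PyTorch.

```python
import torch
import torch.nn.functional as F

# Toy input in the default (channels_first / contiguous) layout.
x = torch.randn(1, 3, 8, 8)

# Exporting in channels_last means converting the example inputs
# (and the model weights) to the channels_last memory format first.
x_cl = x.to(memory_format=torch.channels_last)
assert x_cl.is_contiguous(memory_format=torch.channels_last)

# adaptive_avg_pool2d with output_size (1, 1) -- the variant that appears
# in mobilenet_v2 -- is numerically just a mean over the spatial dims,
# which is one way a pre-quantization pass can decompose it.
out_adaptive = F.adaptive_avg_pool2d(x_cl, (1, 1))
out_mean = x_cl.mean(dim=(2, 3), keepdim=True)
assert torch.allclose(out_adaptive, out_mean)
```

The same equivalence is why decomposing the operator before quantization is numerically safe for this output size.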
Force-pushed from a37a757 to 11a79fc.
I think these are flaky tests, as they pass now. However, I am getting an error on
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
```diff
 quantization_annotations = {
-    "aten.adaptive_avg_pool2d.default": {
-        None: 1,
+    "aten.conv2d.default": {
+        None: 14,
     },
 }
```
The test is being changed from verifying selective quantization of adaptive_avg_pool2d to verifying selective quantization of conv2d. This fundamentally changes what the test validates. Since adaptive_avg_pool2d is now decomposed before quantization (due to the new DecomposeAdaptiveAvgPool2dPass in the transform_for_annotation pipeline), it no longer exists as an operation that can be selectively excluded from quantization.
If the goal is still to test selective quantization of pooling operations, consider testing the decomposed operations (avg_pool2d, slice, cat) instead. If the goal has shifted to testing conv2d selective quantization, consider renaming the test to reflect its new purpose, or add a separate test specifically for conv2d while keeping a test for the decomposed pooling operations.
The test name is test_mv3_selective_quant_float32_tosa_INT. Originally, the test was doing selective quantization of adaptive_avg_pool2d.
The aim is to test selective quantization of any operation, not specifically pooling, hence I just use another operation from the mv3 model. I don't think I need to modify the test_mv3_selective_quant_float32_tosa_INT test name.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
```diff
 AdaptiveAveragePool2d(),
 test_data(),
-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],
 symmetric_io_quantization=True,
```
These INT pipeline tests now pass empty aten_op/exir_op lists, so the test no longer asserts anything about how adaptive_avg_pool2d is represented after quantization (it becomes a pure numerical correctness check). Since the PR’s goal is to force a pre-quantization decomposition, it would be better to assert on the expected decomposed ops (e.g., aten.slice_copy.Tensor, aten.avg_pool2d.default, aten.cat.default) and/or assert that aten.adaptive_avg_pool2d.default is no longer present in the quantized/exported graph.
```diff
 AdaptiveAveragePool2d(),
 test_data(),
-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],
 symmetric_io_quantization=True,
```
These INT Ethos-U tests now skip the exported-graph operator checks by passing empty aten_ops/exir_ops. That makes it easy for regressions in the new pre-quantization decomposition (slice/avg_pool2d/cat) to slip through. Consider updating the expected operator list(s) to match the new decomposed form, or add an explicit check that aten.adaptive_avg_pool2d.default does not appear post-quantization.
```diff
 AdaptiveAveragePool2d(),
 test_data(),
-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],
 symmetric_io_quantization=True,
```
Same as above for the U85 INT test: passing empty aten_ops/exir_ops removes validation of the intended decomposition behavior introduced by this PR. Prefer asserting the presence of the decomposed ops (slice_copy/avg_pool2d/cat) or the absence of aten.adaptive_avg_pool2d.default in the quantized/exported graph.
```diff
-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],
```
For the VGF quantized test, switching to empty aten_op/exir_op lists means the test no longer validates that adaptive_avg_pool2d is decomposed before quantization. To keep this test meaningful, assert on the expected decomposed ops (or assert aten.adaptive_avg_pool2d.default is absent) in the exported graph after quantization.
```diff
-[],
+AdaptiveAveragePool2d.exir_op,
```
This reverts commit bc469e7.
Hi @SS-JIA,
cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai