
Arm backend: Run adaptive_avg_pool2d before quantization#17494

Merged
gggekov merged 4 commits into pytorch:main from gggekov:fix_mv2_channels_last_export
Feb 20, 2026

Conversation

@gggekov
Collaborator

@gggekov gggekov commented Feb 17, 2026

To run mobilenet_v2 with good performance on Ethos-U55, we need to export the model in channels_last. If we export in channels_first (the default behaviour), we pay a hefty performance penalty because the Ethos-U55 hardware is not efficient at Transpose operations (see details in #17157). The adaptive_avg_pool2d operator, part of mv2, is traced differently by ExecuTorch depending on the export memory format: in channels_first the operator is not decomposed, while in channels_last it is decomposed by ExecuTorch in to_edge. To work around that, we add adaptive_avg_pool2d to the transform_for_annotation pipeline so that the operator is decomposed before quantization.
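For context, the decomposition being forced here rests on the standard adaptive-pooling window rule. A pure-Python sketch (illustrative only, not the ExecuTorch pass itself) of that rule, shown in 1-D for brevity:

```python
import math

def adaptive_avg_pool1d(xs, out_size):
    """Pure-Python sketch of adaptive average pooling along one axis.

    Each output cell i averages the input window
    [floor(i*n/out), ceil((i+1)*n/out)) -- the standard adaptive rule.
    """
    n = len(xs)
    out = []
    for i in range(out_size):
        start = (i * n) // out_size
        end = math.ceil((i + 1) * n / out_size)
        window = xs[start:end]
        out.append(sum(window) / len(window))
    return out

# With out_size=1 the pool collapses to a plain global mean, which is why a
# fixed-output adaptive_avg_pool2d (as in mv2's classifier head) can be
# rewritten in terms of an ordinary avg_pool2d covering the spatial extent.
print(adaptive_avg_pool1d([1.0, 2.0, 3.0, 4.0], 1))  # [2.5]
print(adaptive_avg_pool1d([1.0, 2.0, 3.0, 4.0], 2))  # [1.5, 3.5]
```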

cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai

Copilot AI review requested due to automatic review settings February 17, 2026 18:33
@gggekov gggekov requested a review from digantdesai as a code owner February 17, 2026 18:33
@pytorch-bot

pytorch-bot bot commented Feb 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17494

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure

As of commit 62fc849 with merge base f06a1f6:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 17, 2026
@gggekov gggekov added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk and removed CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. labels Feb 17, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes mobilenet_v2 performance on Ethos-U55/U85 hardware by decomposing adaptive_avg_pool2d before quantization and using channels_last memory format to avoid costly transpose operations. The changes address performance issues identified in issue #17157 where ExecuTorch showed higher latency compared to TFLite on MCU hardware.

Changes:

  • Add DecomposeAdaptiveAvgPool2dPass to transform_for_annotation pipeline to decompose adaptive_avg_pool2d before quantization
  • Update Ethos-U55/U85 mobilenet_v2 tests to use channels_last memory format
  • Update test expectations to reflect operator decomposition in quantized pipelines

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

File summary:
  • backends/arm/_passes/decompose_adaptive_avg_pool2d_pass.py: Add allowed_to_transform check to respect TFA metadata flags
  • backends/arm/_passes/arm_pass_manager.py: Add DecomposeAdaptiveAvgPool2dPass to the transform_for_annotation pipeline
  • backends/arm/test/models/test_mobilenet_v2_arm.py: Convert input tensors to channels_last format for Ethos-U55/U85 tests
  • backends/arm/test/ops/test_mean_dim.py: Update INT pipeline tests to expect decomposed operators (empty op lists)
  • backends/arm/test/ops/test_avg_pool2d.py: Update test cases and add a channels_last adaptive_avg_pool test
  • backends/arm/test/quantizer/test_selective_quantization.py: Update quantization annotations to reflect operator changes
  • backends/arm/ethosu/backend.py: Update copyright year to 2025-2026
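The channels_last change in the tests above matters because the two memory formats put channels at different stride positions. A pure-Python sketch of the index math (illustrative only, not the ExecuTorch implementation):

```python
def nchw_offset(n, c, h, w, C, H, W):
    # Flat-buffer offset for a contiguous NCHW (channels_first) tensor.
    return ((n * C + c) * H + h) * W + w

def nhwc_offset(n, c, h, w, C, H, W):
    # Flat-buffer offset for NHWC (channels_last): channels are innermost,
    # so each pixel's channel vector is contiguous -- the layout that lets
    # NPU-style convolution hardware avoid runtime Transpose operations.
    return ((n * H + h) * W + w) * C + c

# Same logical element, different physical positions in the flat buffer:
C, H, W = 3, 2, 2
print(nchw_offset(0, 1, 0, 1, C, H, W))  # 5
print(nhwc_offset(0, 1, 0, 1, C, H, W))  # 4
```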


@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 17, 2026
@zingo
Collaborator

zingo commented Feb 18, 2026

I see 4 errors:
FAILED backends/arm/test/ops/test_ceil.py::test_ceil_tosa_INT[ceil_rand] - AssertionError: Tensor-wise comparison failed: Relative frobenius norm error 0.15430410749477402 exceeds threshold 0.15. (Cosine similarity: 0.9880235195159912, threshold 0.9).
FAILED backends/arm/test/ops/test_conv3d.py::test_convolution_3d_tosa_INT_a8w4[5x5_1x3x9x9_st5_pd0_dl1_needs_adjust_pass,per_channel_quant=True] - AssertionError: Tensor-wise comparison failed: Relative frobenius norm error 0.3268995316163856 exceeds threshold 0.2. (Cosine similarity: 0.9451799988746643, threshold 0.9).
FAILED backends/arm/test/ops/test_conv3d.py::test_convolution_3d_tosa_INT_a8w4[5x5_1x3x9x9_st5_pd0_dl1_needs_adjust_pass,per_channel_quant=False] - AssertionError: Tensor-wise comparison failed: Relative frobenius norm error 0.3040226437176025 exceeds threshold 0.2. (Cosine similarity: 0.9544626474380493, threshold 0.9).
FAILED backends/arm/test/ops/test_le.py::test_le_scalar_tosa_INT[le_scalar_rank2_rand] - AssertionError: Tensor-wise comparison failed: Relative frobenius norm error 0.9999999900000002 exceeds threshold 0.5. (Cosine similarity: 0.7071067690849304, threshold 0.9).
= 4 failed, 4019 passed, 16 skipped, 130 xfailed, 8 xpassed, 28119 warnings in 2455.94s (0:40:55) =

I'll try to rerun the tests to see if it's just some flakiness.
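The assertion messages above combine two standard tensor-comparison metrics. A minimal pure-Python sketch of how they are defined (the actual Arm test-harness implementation may differ):

```python
import math

def rel_frobenius_error(actual, expected):
    # ||actual - expected||_F / ||expected||_F over flat lists of floats.
    diff = math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, expected)))
    ref = math.sqrt(sum(e ** 2 for e in expected))
    return diff / ref

def cosine_similarity(actual, expected):
    # Angle-based agreement: 1.0 means the outputs point the same way.
    dot = sum(a * e for a, e in zip(actual, expected))
    na = math.sqrt(sum(a * a for a in actual))
    ne = math.sqrt(sum(e * e for e in expected))
    return dot / (na * ne)

# A small quantization-style perturbation stays well under the 0.15-0.5
# relative-error thresholds quoted in the failures above.
expected = [1.0, 2.0, 3.0, 4.0]
actual = [1.1, 1.9, 3.1, 3.9]
print(rel_frobenius_error(actual, expected))  # ~0.0365
print(cosine_similarity(actual, expected))    # ~0.9993
```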

Signed-off-by: George Gekov <george.gekov@arm.com>
Change-Id: I3e98a2d52f6d0e7c79f82188a5e6c4eb6a63448b
@gggekov gggekov force-pushed the fix_mv2_channels_last_export branch from a37a757 to 11a79fc Compare February 18, 2026 12:45
@gggekov
Collaborator Author

gggekov commented Feb 18, 2026

I think these are flaky tests, as they pass now. However, I am getting an error on test-arm-backend-zephyr which I believe is also unrelated to my PR; the error message is

ERROR: Could not find a version that satisfies the requirement torchvision==0.25.0.dev20251222 (
....
ERROR: No matching distribution found for torchvision==0.25.0.dev20251222

Copilot AI review requested due to automatic review settings February 18, 2026 15:44
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.



Comment on lines 174 to 178
 quantization_annotations = {
-    "aten.adaptive_avg_pool2d.default": {
-        None: 1,
+    "aten.conv2d.default": {
+        None: 14,
     },
 }

Copilot AI Feb 18, 2026


The test is being changed from verifying selective quantization of adaptive_avg_pool2d to verifying selective quantization of conv2d. This fundamentally changes what the test validates. Since adaptive_avg_pool2d is now decomposed before quantization (due to the new DecomposeAdaptiveAvgPool2dPass in the transform_for_annotation pipeline), it no longer exists as an operation that can be selectively excluded from quantization.

If the goal is still to test selective quantization of pooling operations, consider testing the decomposed operations (avg_pool2d, slice, cat) instead. If the goal has shifted to testing conv2d selective quantization, consider renaming the test to reflect its new purpose, or add a separate test specifically for conv2d while keeping a test for the decomposed pooling operations.

Collaborator Author


The test name is test_mv3_selective_quant_float32_tosa_INT. Originally, the test was doing selective quantization of adaptive_avg_pool2d.
The aim is to test selective quantization of any operation, not specifically pooling, hence I just use another operation from the mv3 model. I don't think I need to modify the test_mv3_selective_quant_float32_tosa_INT test name ...

Copilot AI review requested due to automatic review settings February 19, 2026 15:52
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.



Comment on lines 54 to 58
 AdaptiveAveragePool2d(),
 test_data(),
-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],
 symmetric_io_quantization=True,

Copilot AI Feb 19, 2026


These INT pipeline tests now pass empty aten_op/exir_op lists, so the test no longer asserts anything about how adaptive_avg_pool2d is represented after quantization (it becomes a pure numerical correctness check). Since the PR’s goal is to force a pre-quantization decomposition, it would be better to assert on the expected decomposed ops (e.g., aten.slice_copy.Tensor, aten.avg_pool2d.default, aten.cat.default) and/or assert that aten.adaptive_avg_pool2d.default is no longer present in the quantized/exported graph.
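A check along the lines Copilot suggests could be sketched as follows (pure Python over a list of op-name strings; the helper name and input format are hypothetical, not the actual Arm test-pipeline API):

```python
def check_decomposed(graph_ops):
    """Assert adaptive_avg_pool2d was decomposed before quantization.

    `graph_ops` is a hypothetical list of operator-name strings harvested
    from the exported graph; the expected decomposition targets
    (avg_pool2d / slice / cat) are taken from the review comment above.
    """
    assert "aten.adaptive_avg_pool2d.default" not in graph_ops, (
        "adaptive_avg_pool2d should have been decomposed before quantization"
    )
    # Report whether the main decomposed op is actually present.
    return "aten.avg_pool2d.default" in graph_ops

# Example: a graph where the decomposition ran as intended.
ops = ["aten.avg_pool2d.default", "aten.cat.default"]
print(check_decomposed(ops))  # True
```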

Comment on lines 66 to 70
 AdaptiveAveragePool2d(),
 test_data(),
-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],
 symmetric_io_quantization=True,

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These INT Ethos-U tests now skip the exported-graph operator checks by passing empty aten_ops/exir_ops. That makes it easy for regressions in the new pre-quantization decomposition (slice/avg_pool2d/cat) to slip through. Consider updating the expected operator list(s) to match the new decomposed form, or add an explicit check that aten.adaptive_avg_pool2d.default does not appear post-quantization.

Comment on lines 78 to 82
 AdaptiveAveragePool2d(),
 test_data(),
-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],
 symmetric_io_quantization=True,

Copilot AI Feb 19, 2026


Same as above for the U85 INT test: passing empty aten_ops/exir_ops removes validation of the intended decomposition behavior introduced by this PR. Prefer asserting the presence of the decomposed ops (slice_copy/avg_pool2d/cat) or the absence of aten.adaptive_avg_pool2d.default in the quantized/exported graph.

-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the VGF quantized test, switching to empty aten_op/exir_op lists means the test no longer validates that adaptive_avg_pool2d is decomposed before quantization. To keep this test meaningful, assert on the expected decomposed ops (or assert aten.adaptive_avg_pool2d.default is absent) in the exported graph after quantization.

Suggested change:
-[],
+AdaptiveAveragePool2d.exir_op,

@gggekov gggekov merged commit bc469e7 into pytorch:main Feb 20, 2026
312 of 317 checks passed
SS-JIA added a commit that referenced this pull request Feb 20, 2026
@SS-JIA
Contributor

SS-JIA commented Feb 20, 2026

@gggekov unfortunately, had to revert this PR (#17595) due to breaking correctness tests for the avg_pool2d op internally. Happy to help debugging and relanding with a fix!

SS-JIA added a commit that referenced this pull request Feb 20, 2026
@gggekov
Collaborator Author

gggekov commented Mar 2, 2026

Hi @SS-JIA,
Are you sure this was not a flaky test? The backends/arm/test/ops/test_avg_pool2d.py test_avg_pool2d_16a8w_u85_INT test was passing both in our internal CI and in the public CI. How can I reproduce the error?

@gggekov
Collaborator Author

gggekov commented Mar 3, 2026

Hi @SS-JIA,
Is it possible to make the failing test public? It feels like a useful test.

@rascani
Contributor

rascani commented Mar 3, 2026

@gggekov I created a revert (#17831) of @SS-JIA's revert to reland it. I'll import it internally and see if I can get some better signal.


Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm

5 participants