
Arm backend: Run adaptive_avg_pool2d before quantization#17494

Merged
gggekov merged 4 commits into pytorch:main from gggekov:fix_mv2_channels_last_export
Feb 20, 2026

Conversation

@gggekov
Collaborator

@gggekov gggekov commented Feb 17, 2026

To run mobilenet_v2 with good performance on Ethos-U55, we need to export the model in channels_last. If we export in channels_first (the default behaviour), we pay a hefty performance penalty because the Ethos-U55 hardware is not efficient at Transpose operations (see details in #17157). The adaptive_avg_pool2d operator, part of mv2, is traced differently by ExecuTorch depending on the export memory format: in channels_first the operator is not decomposed, while in channels_last it is decomposed by ExecuTorch in to_edge. To work around that, we add adaptive_avg_pool2d to the transform_for_annotation pipeline so that the operator is decomposed before quantization.
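For context, the decomposition being forced here rests on the standard adaptive-pooling window rule. A pure-Python sketch (illustrative only, not the ExecuTorch pass itself) of that rule, shown in 1-D for brevity:

```python
import math

def adaptive_avg_pool1d(xs, out_size):
    """Pure-Python sketch of adaptive average pooling along one axis.

    Each output cell i averages the input window
    [floor(i*n/out), ceil((i+1)*n/out)) -- the standard adaptive rule.
    """
    n = len(xs)
    out = []
    for i in range(out_size):
        start = (i * n) // out_size
        end = math.ceil((i + 1) * n / out_size)
        window = xs[start:end]
        out.append(sum(window) / len(window))
    return out

# With out_size=1 the pool collapses to a plain global mean, which is why a
# fixed-output adaptive_avg_pool2d (as in mv2's classifier head) can be
# rewritten in terms of an ordinary avg_pool2d covering the spatial extent.
print(adaptive_avg_pool1d([1.0, 2.0, 3.0, 4.0], 1))  # [2.5]
print(adaptive_avg_pool1d([1.0, 2.0, 3.0, 4.0], 2))  # [1.5, 3.5]
```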

cc @freddan80 @per @zingo @oscarandersson8218 @digantdesai

Copilot AI review requested due to automatic review settings February 17, 2026 18:33
@gggekov gggekov requested a review from digantdesai as a code owner February 17, 2026 18:33
@pytorch-bot

pytorch-bot bot commented Feb 17, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17494

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unrelated Failure

As of commit 62fc849 with merge base f06a1f6:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 17, 2026
@gggekov gggekov added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk and removed CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. labels Feb 17, 2026
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes mobilenet_v2 performance on Ethos-U55/U85 hardware by decomposing adaptive_avg_pool2d before quantization and using channels_last memory format to avoid costly transpose operations. The changes address performance issues identified in issue #17157 where ExecuTorch showed higher latency compared to TFLite on MCU hardware.

Changes:

  • Add DecomposeAdaptiveAvgPool2dPass to transform_for_annotation pipeline to decompose adaptive_avg_pool2d before quantization
  • Update Ethos-U55/U85 mobilenet_v2 tests to use channels_last memory format
  • Update test expectations to reflect operator decomposition in quantized pipelines

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

File summary:
  • backends/arm/_passes/decompose_adaptive_avg_pool2d_pass.py: Add allowed_to_transform check to respect TFA metadata flags
  • backends/arm/_passes/arm_pass_manager.py: Add DecomposeAdaptiveAvgPool2dPass to the transform_for_annotation pipeline
  • backends/arm/test/models/test_mobilenet_v2_arm.py: Convert input tensors to channels_last format for Ethos-U55/U85 tests
  • backends/arm/test/ops/test_mean_dim.py: Update INT pipeline tests to expect decomposed operators (empty op lists)
  • backends/arm/test/ops/test_avg_pool2d.py: Update test cases and add a channels_last adaptive_avg_pool test
  • backends/arm/test/quantizer/test_selective_quantization.py: Update quantization annotations to reflect operator changes
  • backends/arm/ethosu/backend.py: Update copyright year to 2025-2026
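The channels_last change in the tests above matters because the two memory formats put channels at different stride positions. A pure-Python sketch of the index math (illustrative only, not the ExecuTorch implementation):

```python
def nchw_offset(n, c, h, w, C, H, W):
    # Flat-buffer offset for a contiguous NCHW (channels_first) tensor.
    return ((n * C + c) * H + h) * W + w

def nhwc_offset(n, c, h, w, C, H, W):
    # Flat-buffer offset for NHWC (channels_last): channels are innermost,
    # so each pixel's channel vector is contiguous -- the layout that lets
    # NPU-style convolution hardware avoid runtime Transpose operations.
    return ((n * H + h) * W + w) * C + c

# Same logical element, different physical positions in the flat buffer:
C, H, W = 3, 2, 2
print(nchw_offset(0, 1, 0, 1, C, H, W))  # 5
print(nhwc_offset(0, 1, 0, 1, C, H, W))  # 4
```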


@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 17, 2026
@zingo
Collaborator

zingo commented Feb 18, 2026

I see 4 errors:
FAILED backends/arm/test/ops/test_ceil.py::test_ceil_tosa_INT[ceil_rand] - AssertionError: Tensor-wise comparison failed: Relative frobenius norm error 0.15430410749477402 exceeds threshold 0.15. (Cosine similarity: 0.9880235195159912, threshold 0.9).
FAILED backends/arm/test/ops/test_conv3d.py::test_convolution_3d_tosa_INT_a8w4[5x5_1x3x9x9_st5_pd0_dl1_needs_adjust_pass,per_channel_quant=True] - AssertionError: Tensor-wise comparison failed: Relative frobenius norm error 0.3268995316163856 exceeds threshold 0.2. (Cosine similarity: 0.9451799988746643, threshold 0.9).
FAILED backends/arm/test/ops/test_conv3d.py::test_convolution_3d_tosa_INT_a8w4[5x5_1x3x9x9_st5_pd0_dl1_needs_adjust_pass,per_channel_quant=False] - AssertionError: Tensor-wise comparison failed: Relative frobenius norm error 0.3040226437176025 exceeds threshold 0.2. (Cosine similarity: 0.9544626474380493, threshold 0.9).
FAILED backends/arm/test/ops/test_le.py::test_le_scalar_tosa_INT[le_scalar_rank2_rand] - AssertionError: Tensor-wise comparison failed: Relative frobenius norm error 0.9999999900000002 exceeds threshold 0.5. (Cosine similarity: 0.7071067690849304, threshold 0.9).
= 4 failed, 4019 passed, 16 skipped, 130 xfailed, 8 xpassed, 28119 warnings in 2455.94s (0:40:55) =

I'll try to rerun the tests to see if it's just some flakiness.
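The assertion messages above combine two standard tensor-comparison metrics. A minimal pure-Python sketch of how they are defined (the actual Arm test-harness implementation may differ):

```python
import math

def rel_frobenius_error(actual, expected):
    # ||actual - expected||_F / ||expected||_F over flat lists of floats.
    diff = math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, expected)))
    ref = math.sqrt(sum(e ** 2 for e in expected))
    return diff / ref

def cosine_similarity(actual, expected):
    # Angle-based agreement: 1.0 means the outputs point the same way.
    dot = sum(a * e for a, e in zip(actual, expected))
    na = math.sqrt(sum(a * a for a in actual))
    ne = math.sqrt(sum(e * e for e in expected))
    return dot / (na * ne)

# A small quantization-style perturbation stays well under the 0.15-0.5
# relative-error thresholds quoted in the failures above.
expected = [1.0, 2.0, 3.0, 4.0]
actual = [1.1, 1.9, 3.1, 3.9]
print(rel_frobenius_error(actual, expected))  # ~0.0365
print(cosine_similarity(actual, expected))    # ~0.9993
```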

Signed-off-by: George Gekov <george.gekov@arm.com>
Change-Id: I3e98a2d52f6d0e7c79f82188a5e6c4eb6a63448b
@gggekov gggekov force-pushed the fix_mv2_channels_last_export branch from a37a757 to 11a79fc Compare February 18, 2026 12:45
@gggekov
Collaborator Author

gggekov commented Feb 18, 2026

I think these are flaky tests, as they pass now. However, I am getting an error on test-arm-backend-zephyr which I believe is also unrelated to my PR; the error message is

ERROR: Could not find a version that satisfies the requirement torchvision==0.25.0.dev20251222 (
....
ERROR: No matching distribution found for torchvision==0.25.0.dev20251222

Copilot AI review requested due to automatic review settings February 18, 2026 15:44
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.



Comment on lines 174 to 178
 quantization_annotations = {
-    "aten.adaptive_avg_pool2d.default": {
-        None: 1,
+    "aten.conv2d.default": {
+        None: 14,
     },
 }

Copilot AI Feb 18, 2026


The test is being changed from verifying selective quantization of adaptive_avg_pool2d to verifying selective quantization of conv2d. This fundamentally changes what the test validates. Since adaptive_avg_pool2d is now decomposed before quantization (due to the new DecomposeAdaptiveAvgPool2dPass in the transform_for_annotation pipeline), it no longer exists as an operation that can be selectively excluded from quantization.

If the goal is still to test selective quantization of pooling operations, consider testing the decomposed operations (avg_pool2d, slice, cat) instead. If the goal has shifted to testing conv2d selective quantization, consider renaming the test to reflect its new purpose, or add a separate test specifically for conv2d while keeping a test for the decomposed pooling operations.

Collaborator Author


The test name is test_mv3_selective_quant_float32_tosa_INT. Originally, the test was doing selective quantization of adaptive_avg_pool2d.
The aim is to test selective quantization of any operation, not specifically pooling, hence I just use another operation from the mv3 model. I don't think I need to modify the test_mv3_selective_quant_float32_tosa_INT test name ...

Copilot AI review requested due to automatic review settings February 19, 2026 15:52
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.



Comment on lines 54 to 58
 AdaptiveAveragePool2d(),
 test_data(),
-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],
 symmetric_io_quantization=True,

Copilot AI Feb 19, 2026


These INT pipeline tests now pass empty aten_op/exir_op lists, so the test no longer asserts anything about how adaptive_avg_pool2d is represented after quantization (it becomes a pure numerical correctness check). Since the PR’s goal is to force a pre-quantization decomposition, it would be better to assert on the expected decomposed ops (e.g., aten.slice_copy.Tensor, aten.avg_pool2d.default, aten.cat.default) and/or assert that aten.adaptive_avg_pool2d.default is no longer present in the quantized/exported graph.
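A check along the lines Copilot suggests could be sketched as follows (pure Python over a list of op-name strings; the helper name and input format are hypothetical, not the actual Arm test-pipeline API):

```python
def check_decomposed(graph_ops):
    """Assert adaptive_avg_pool2d was decomposed before quantization.

    `graph_ops` is a hypothetical list of operator-name strings harvested
    from the exported graph; the expected decomposition targets
    (avg_pool2d / slice / cat) are taken from the review comment above.
    """
    assert "aten.adaptive_avg_pool2d.default" not in graph_ops, (
        "adaptive_avg_pool2d should have been decomposed before quantization"
    )
    # Report whether the main decomposed op is actually present.
    return "aten.avg_pool2d.default" in graph_ops

# Example: a graph where the decomposition ran as intended.
ops = ["aten.avg_pool2d.default", "aten.cat.default"]
print(check_decomposed(ops))  # True
```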

Comment on lines 66 to 70
 AdaptiveAveragePool2d(),
 test_data(),
-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],
 symmetric_io_quantization=True,

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These INT Ethos-U tests now skip the exported-graph operator checks by passing empty aten_ops/exir_ops. That makes it easy for regressions in the new pre-quantization decomposition (slice/avg_pool2d/cat) to slip through. Consider updating the expected operator list(s) to match the new decomposed form, or add an explicit check that aten.adaptive_avg_pool2d.default does not appear post-quantization.

Comment on lines 78 to 82
 AdaptiveAveragePool2d(),
 test_data(),
-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],
 symmetric_io_quantization=True,

Copilot AI Feb 19, 2026


Same as above for the U85 INT test: passing empty aten_ops/exir_ops removes validation of the intended decomposition behavior introduced by this PR. Prefer asserting the presence of the decomposed ops (slice_copy/avg_pool2d/cat) or the absence of aten.adaptive_avg_pool2d.default in the quantized/exported graph.

-AdaptiveAveragePool2d.aten_op,
-AdaptiveAveragePool2d.exir_op,
+[],
+[],

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the VGF quantized test, switching to empty aten_op/exir_op lists means the test no longer validates that adaptive_avg_pool2d is decomposed before quantization. To keep this test meaningful, assert on the expected decomposed ops (or assert aten.adaptive_avg_pool2d.default is absent) in the exported graph after quantization.

Suggested change:
-[],
+AdaptiveAveragePool2d.exir_op,

@gggekov gggekov merged commit bc469e7 into pytorch:main Feb 20, 2026
312 of 317 checks passed
SS-JIA added a commit that referenced this pull request Feb 20, 2026
@SS-JIA
Contributor

SS-JIA commented Feb 20, 2026

@gggekov unfortunately, had to revert this PR (#17595) due to breaking correctness tests for the avg_pool2d op internally. Happy to help debugging and relanding with a fix!

SS-JIA added a commit that referenced this pull request Feb 20, 2026
@gggekov
Collaborator Author

gggekov commented Mar 2, 2026

Hi @SS-JIA,
Are you sure this was not a flaky test? The backends/arm/test/ops/test_avg_pool2d.py test_avg_pool2d_16a8w_u85_INT test was passing both in our internal CI and in the public CI. How can I reproduce the error?

@gggekov
Collaborator Author

gggekov commented Mar 3, 2026

Hi @SS-JIA,
Is it possible to make the failing test public? It feels like a useful test.

@rascani
Contributor

rascani commented Mar 3, 2026

@gggekov I created a revert (#17831) of @SS-JIA's revert to reland it. I'll import it internally and see if I can get some better signal.


Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm

5 participants