
Revert "Arm backend: Run adaptive_avg_pool2d before quantization" #17595

Merged
SS-JIA merged 1 commit into main from revert-17494-fix_mv2_channels_last_export
Feb 20, 2026
Conversation

SS-JIA (Contributor) commented Feb 20, 2026

Reverts #17494

The reverted PR broke some Meta-internal tests (correctness failure). Reproduce with:

buck test @fbcode//mode/dev fbcode//executorch/backends/arm/test:avg_pool2d -- --exact 'fbcode//executorch/backends/arm/test:avg_pool2d - test_avg_pool2d.py::test_avg_pool2d_16a8w_u85_INT[channels_last_adaptive_avg_pool]'
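For context, the failing parameterization pools a channels_last feature map down to `[1, 1280, 1, 1]` (per the assertion message below). With an output size of (1, 1), `adaptive_avg_pool2d` reduces to a per-channel mean over H and W. A minimal pure-Python sketch of that reduction, as a stand-in for the torch op (illustrative only; the test exercises the Ethos-U85 backend's lowered kernel, not this reference formula, and the shapes here are shrunk for readability):

```python
# Pure-Python stand-in for adaptive_avg_pool2d with output_size=(1, 1):
# in that case the op is simply the mean over H and W for each channel.
# (Sketch only -- not the Arm backend's implementation.)

def global_avg_pool(x):
    """x: nested list with shape [N][C][H][W] -> [N][C][1][1]."""
    return [
        [[[sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))]] for ch in n]
        for n in x
    ]

# Tiny [1, 2, 2, 2] example (the real test output is [1, 1280, 1, 1]):
x = [[[[1.0, 2.0], [3.0, 4.0]],   # channel 0: mean 2.5
      [[0.0, 0.0], [0.0, 8.0]]]]  # channel 1: mean 2.0
print(global_avg_pool(x))  # [[[[2.5]], [[2.0]]]]
```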

Failure logs:

Test was added in target revision (1c60f84b2c8e8e306480b8ca0417d783951163e2) hence is not present in base rev (437693276acb10f1839317ae91b3f43e7e4cceaf)
============================= test session starts ==============================
platform linux -- Python 3.12.12+meta, pytest-7.2.2, pluggy-1.6.0
rootdir: /data/sandcastle/boxes/trunk-hg-full-fbsource/fbcode/executorch/backends/arm/test, configfile: pytest.ini
plugins: hypothesis-6.151.6
collected 218 items

executorch/backends/arm/test  ARM_TEST_SEED=0  
Network summary for out
Accelerator configuration               Ethos_U85_128
System configuration             Ethos_U85_SYS_DRAM_Mid
Memory mode                               Shared_Sram
Accelerator clock                                1000 MHz
Design peak SRAM bandwidth                      29.80 GB/s

Total SRAM used                                125.00 KiB

CPU operators = 0 (0.0%)
NPU operators = 3 (100.0%)

Average SRAM bandwidth                           4.62 GB/s
Input   SRAM bandwidth                           0.06 MB/batch
Weight  SRAM bandwidth                           0.00 MB/batch
Output  SRAM bandwidth                           0.00 MB/batch
Total   SRAM bandwidth                           0.07 MB/batch
Total   SRAM bandwidth            per input      0.07 MB/inference (batch size 1)

Neural network macs                             65280 MACs/batch

F

=================================== FAILURES ===================================
________ test_avg_pool2d_16a8w_u85_INT[channels_last_adaptive_avg_pool] ________

test_module = <function <lambda> at 0x7fa621a154e0>

    @common.parametrize("test_module", test_modules)
    @common.XfailIfNoCorstone320
    def test_avg_pool2d_16a8w_u85_INT(test_module):
        """Test avg_pool2d with 16A8W quantization on U85 (16-bit activations, 8-bit weights)"""
        model, input_tensor = test_module()
        pipeline = EthosU85PipelineINT[input_t](
            model,
            input_tensor,
            aten_op,
            exir_op,
            per_channel_quantization=False,
            a16w8_quantization=True,
            use_to_edge_transform_and_lower=True,
        )
>       pipeline.run()

../buck-out/v2/art/fbcode/1b2358bdbaa340e3/executorch/backends/arm/test/__avg_pool2d__/avg_pool2d#link-tree/executorch/backends/arm/test/ops/test_avg_pool2d.py:254: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../buck-out/v2/art/fbcode/1b2358bdbaa340e3/executorch/backends/arm/test/__avg_pool2d__/avg_pool2d#link-tree/executorch/backends/arm/test/tester/test_pipeline.py:342: in run
    raise e
../buck-out/v2/art/fbcode/1b2358bdbaa340e3/executorch/backends/arm/test/__avg_pool2d__/avg_pool2d#link-tree/executorch/backends/arm/test/tester/test_pipeline.py:339: in run
    stage()
../buck-out/v2/art/fbcode/1b2358bdbaa340e3/executorch/backends/arm/test/__avg_pool2d__/avg_pool2d#link-tree/executorch/backends/arm/test/tester/test_pipeline.py:76: in __call__
    self.func(*self.args, **self.kwargs)
../buck-out/v2/art/fbcode/1b2358bdbaa340e3/executorch/backends/arm/test/__avg_pool2d__/avg_pool2d#link-tree/executorch/backends/arm/test/tester/arm_tester.py:544: in run_method_and_compare_outputs
    self._compare_outputs(
../buck-out/v2/art/fbcode/1b2358bdbaa340e3/executorch/backends/arm/test/__avg_pool2d__/avg_pool2d#link-tree/executorch/backends/arm/test/tester/arm_tester.py:988: in _compare_outputs
    raise e
../buck-out/v2/art/fbcode/1b2358bdbaa340e3/executorch/backends/arm/test/__avg_pool2d__/avg_pool2d#link-tree/executorch/backends/arm/test/tester/arm_tester.py:963: in _compare_outputs
    super()._compare_outputs(
../buck-out/v2/art/fbcode/1b2358bdbaa340e3/executorch/backends/arm/test/__avg_pool2d__/avg_pool2d#link-tree/executorch/backends/test/harness/tester.py:435: in _compare_outputs
    Tester._assert_outputs_equal(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

model_output = (tensor([[[[-0.0547]],

         [[ 0.0417]],

         [[ 0.1699]],

         ...,

         [[ 0.2009]],

         [[ 0.1804]],

         [[-0.1309]]]]),)
ref_output = (tensor([[[[ 0.0117]],

         [[ 0.1504]],

         [[-0.0754]],

         ...,

         [[-0.0723]],

         [[-0.1296]],

         [[ 0.2195]]]]),)
atol = 0.001244140625, rtol = 0.001, statistics_callback = None

    @staticmethod
    def _assert_outputs_equal(
        model_output,
        ref_output,
        atol=1e-03,
        rtol=1e-03,
        statistics_callback: Callable[[ErrorStatistics], None] | None = None,
    ):
        """
        Helper testing function that asserts that the model output and the reference output
        are equal with some tolerance. Due to numerical differences between eager mode and
        the XNNPACK's backend, we relax the detal such that absolute tolerance is 1e-3. and
        relative tolerance is 1e-3. In the event that the computation was quantized, we
        further relax the tolerance to one quantized step (equal to the quantization scale).
        This allows the quantized value to differ by 1 between the reference and model output.
        """
    
        assert len(model_output) == len(ref_output)
    
        for i in range(len(model_output)):
            model = model_output[i]
            ref = ref_output[i]
    
            error_stats = ErrorStatistics.from_tensors(model, ref)
            if statistics_callback is not None:
                statistics_callback(error_stats)
    
            assert (
                ref.shape == model.shape
            ), f"Output {i} shape {model.shape} does not match reference output shape {ref.shape}"
            if model.dtype == torch.bool:
                assert torch.equal(model, ref), (
                    f"Output {i} (bool tensor) does not match reference output.\n"
                    f"\tShape: {model.shape}\n"
                    f"\tMismatched count: {(model != ref).sum().item()} / {model.numel()}\n"
                )
            else:
>               assert torch.allclose(
                    model,
                    ref,
                    atol=atol,
                    rtol=rtol,
                    equal_nan=True,
                ), (
                    f"Output {i} does not match reference output.\n"
                    f"\tGiven atol: {atol}, rtol: {rtol}.\n"
                    f"\tOutput tensor shape: {model.shape}, dtype: {model.dtype}\n"
                    f"\tDifference: max: {torch.max(model-ref)}, abs: {torch.max(torch.abs(model-ref))}, mean abs error: {torch.mean(torch.abs(model-ref).to(torch.double))}.\n"
                    f"\t-- Model vs. Reference --\n"
                    f"\t Numel: {model.numel()}, {ref.numel()}\n"
                    f"\tMedian: {model.median()}, {ref.median()}\n"
                    f"\t  Mean: {model.to(torch.double).mean()}, {ref.to(torch.double).mean()}\n"
                    f"\t   Max: {model.max()}, {ref.max()}\n"
                    f"\t   Min: {model.min()}, {ref.min()}\n"
                )
E               AssertionError: Output 0 does not match reference output.
E               	Given atol: 0.001244140625, rtol: 0.001.
E               	Output tensor shape: torch.Size([1, 1280, 1, 1]), dtype: torch.float32
E               	Difference: max: 0.772216796875, abs: 0.772216796875, mean abs error: 0.16089305877685547.
E               	-- Model vs. Reference --
E               	 Numel: 1280, 1280
E               	Median: -0.003173828125, -0.001708984375
E               	  Mean: -0.0039030075073242187, -0.003900909423828125
E               	   Max: 0.4833984375, 0.436767578125
E               	   Min: -0.4248046875, -0.43994140625

../buck-out/v2/art/fbcode/1b2358bdbaa340e3/executorch/backends/arm/test/__avg_pool2d__/avg_pool2d#link-tree/executorch/backends/test/harness/tester.py:388: AssertionError
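A note on the tolerance in the assertion above: the logged atol of 0.001244140625 is the base 1e-3 widened by 0.000244140625 (2^-12), consistent with the docstring's rule of relaxing by one quantized step. The quantization scale below is inferred from the logged numbers, not read from the test source; a quick arithmetic sketch:

```python
# Sketch of the relaxed-tolerance rule described in the harness docstring:
# atol = base tolerance + one quantized step (the quantization scale),
# so a one-LSB disagreement between backend and reference still passes.
BASE_ATOL = 1e-3
quant_scale = 2 ** -12                   # 0.000244140625, inferred assumption

relaxed_atol = BASE_ATOL + quant_scale   # ~0.001244140625, the logged atol

# The failure is far outside tolerance, i.e. a genuine correctness bug,
# not a borderline rounding difference:
max_abs_err = 0.772216796875             # from the assertion message
ratio = max_abs_err / relaxed_atol       # roughly 620x the allowed deviation
print(relaxed_atol, ratio)
```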
------------------------------ Captured log call -------------------------------
WARNING  root:_program.py:1089 Op aten.silu_.default was requested for preservation by partitioner.  This request is ignored because it is mutable.
WARNING  root:_program.py:1089 Op aten.silu_.default was requested for preservation by partitioner.  This request is ignored because it is mutable.
WARNING  root:_program.py:1089 Op aten.silu_.default was requested for preservation by partitioner.  This request is ignored because it is mutable.
ERROR    executorch.backends.arm.test.tester.analyze_output_utils:analyze_output_utils.py:292 
############################ ERROR DIFFERENCE #############################
BATCH 0
channel 0 (e-01)
[ 0.66 ]
channel 1 (e-01)
[ 1.09 ]
channel 2 (e-01)
[-2.45 ]
channel 3 (e-01)
[-0.27 ]
channel 4 (e-01)
[-0.61 ]
channel 5 (e-01)
[-0.54 ]
channel 6 (e-01)
[-2.95 ]
channel 7 (e-01)
[ 3.65 ]
channel 8 (e-01)
[ 0.95 ]
channel 9 (e-01)
[ 1.16 ]
channel 10 (e-01)
[-0.44 ]
channel 11 (e-01)
[-1.31 ]
channel 12 (e-01)
[ 1.78 ]
channel 13 (e-01)
[ 1.89 ]
channel 14 (e-01)
[-1.42 ]
channel 15 (e-01)
[ 2.88 ]
channel 16 (e-01)
[-1.76 ]
channel 17 (e-01)
[-0.25 ]
channel 18 (e-01)
[ 0.62 ]
channel 19 (e-01)
[-1.69 ]
channel 20 (e-01)
[ 0.44 ]
channel 21 (e-01)
[-1.11 ]
channel 22 (e-01)
[-2.67 ]
channel 23 (e-01)
[-1.34 ]
channel 24 (e-01)
[ 6.27 ]
channel 25 (e-01)
[ 2.39 ]
channel 26 (e-01)
[ 1.96 ]
channel 27 (e-01)
[ 1.24 ]
channel 28 (e-01)
[ 0.65 ]
channel 29 (e-01)
[ 1.92 ]
channel 30 (e-01)
[ 0.04 ]
channel 31 (e-01)
[-0.86 ]
channel 32 (e-01)
[-0.51 ]
channel 33 (e-01)
[ 0.27 ]
channel 34 (e-01)
[ 1.10 ]
channel 35 (e-01)
[-1.36 ]
channel 36 (e-01)
[ 0.95 ]
channel 37 (e-01)
[-1.36 ]
chan
...


cc: @gggekov

pytorch-bot bot commented Feb 20, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17595

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 14 Pending, 1 Unrelated Failure

As of commit 67d9036 with merge base 9591a67:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the "CLA Signed" label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Feb 20, 2026
github-actions bot commented:

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

SS-JIA merged commit 2e35799 into main Feb 20, 2026 (168 of 176 checks passed)
SS-JIA deleted the revert-17494-fix_mv2_channels_last_export branch February 20, 2026 19:13

Labels

ci-no-td
CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
