Skip to content

fix(ci): Fix Sample Application E2E test flakiness#5755

Draft
antonis wants to merge 7 commits intomainfrom
antonis/sample-e2e-flakiness-fix
Draft

fix(ci): Fix Sample Application E2E test flakiness#5755
antonis wants to merge 7 commits intomainfrom
antonis/sample-e2e-flakiness-fix

Conversation

@antonis
Copy link
Contributor

@antonis antonis commented Mar 4, 2026

Summary

  • Increases MAESTRO_DRIVER_STARTUP_TIMEOUT from 90s to 180s for slow Cirrus Labs Tart VMs
  • Adds wait_for_boot: true and erase_before_boot: false to simulator-action to ensure the simulator is fully ready before tests
  • Adds a simulator warm-up step (launch/terminate Settings app) before running iOS tests
  • Fixes captureSpaceflightNewsScreenTransaction test by sorting envelopes by timestamp instead of relying on arrival order (which is non-deterministic on slow VMs)
  • Relaxes HTTP spans assertion from exactly 2 to at least 1, since not all HTTP tracing layers may complete within the transaction on slow VMs
  • Fixes captureErrorsScreenTransaction Android test by searching all envelopes for the app start transaction instead of only the first matching envelope (on slow emulators, it may arrive separately)

#skip-changelog

Test plan

  • Verify Sample Application / Test ios production REV2 passes on CI
  • Verify Sample Application / Test android production REV2 passes on CI

🤖 Generated with Claude Code

…ners

- Increase MAESTRO_DRIVER_STARTUP_TIMEOUT to 180s for slow Tart VMs
- Add wait_for_boot and erase_before_boot: false to simulator-action
- Add simulator warm-up step before running iOS tests
- Sort spaceflight news envelopes by timestamp instead of arrival order
- Relax HTTP spans assertion to >= 1 (not all layers complete on slow VMs)
- Search all envelopes for app start transaction (may arrive separately)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

Semver Impact of This PR

None (no version bump detected)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


This PR will not appear in the changelog.


🤖 This preview updates automatically when you update the PR.

@antonis antonis added the ready-to-merge Triggers the full CI test suite label Mar 4, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

iOS (legacy) Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1214.93 ms 1215.85 ms 0.92 ms
Size 3.38 MiB 4.79 MiB 1.41 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
ea3e26e+dirty 1229.13 ms 1228.46 ms -0.67 ms
80e4616+dirty 1221.32 ms 1225.64 ms 4.32 ms
818a608+dirty 1205.76 ms 1208.00 ms 2.24 ms
77061ed+dirty 1233.16 ms 1234.88 ms 1.71 ms
bef3709+dirty 1222.07 ms 1220.24 ms -1.83 ms
a206511+dirty 1185.00 ms 1186.35 ms 1.35 ms
74979ac+dirty 1210.49 ms 1213.31 ms 2.82 ms
a2bb688+dirty 1223.53 ms 1232.90 ms 9.37 ms
8a868fe+dirty 1221.50 ms 1230.78 ms 9.28 ms
d590428+dirty 1211.77 ms 1220.51 ms 8.75 ms

App size

Revision Plain With Sentry Diff
ea3e26e+dirty 3.41 MiB 4.58 MiB 1.17 MiB
80e4616+dirty 3.38 MiB 4.60 MiB 1.22 MiB
818a608+dirty 2.63 MiB 3.91 MiB 1.28 MiB
77061ed+dirty 2.63 MiB 3.98 MiB 1.34 MiB
bef3709+dirty 3.38 MiB 4.78 MiB 1.40 MiB
a206511+dirty 3.41 MiB 4.67 MiB 1.25 MiB
74979ac+dirty 3.38 MiB 4.60 MiB 1.22 MiB
a2bb688+dirty 2.63 MiB 3.99 MiB 1.36 MiB
8a868fe+dirty 3.38 MiB 4.60 MiB 1.22 MiB
d590428+dirty 3.38 MiB 4.78 MiB 1.39 MiB

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

Android (legacy) Performance metrics 🚀

  Plain With Sentry Diff
Startup time 450.63 ms 459.24 ms 8.61 ms
Size 43.75 MiB 48.48 MiB 4.73 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
86584b7+dirty 463.83 ms 500.31 ms 36.48 ms
9a81842+dirty 412.23 ms 416.56 ms 4.33 ms
c637fc7+dirty 433.70 ms 467.76 ms 34.06 ms
d73150f+dirty 411.21 ms 465.86 ms 54.65 ms
fa7bb7e+dirty 350.37 ms 377.02 ms 26.65 ms
3bd3f0d+dirty 447.21 ms 472.31 ms 25.10 ms
88890fe+dirty 350.94 ms 365.74 ms 14.80 ms
95aaf8a 437.89 ms 419.45 ms -18.44 ms
c0842e7+dirty 527.76 ms 566.69 ms 38.93 ms
1e7a472+dirty 348.80 ms 362.55 ms 13.75 ms

App size

Revision Plain With Sentry Diff
86584b7+dirty 43.75 MiB 48.08 MiB 4.33 MiB
9a81842+dirty 43.75 MiB 48.08 MiB 4.33 MiB
c637fc7+dirty 43.75 MiB 48.40 MiB 4.64 MiB
d73150f+dirty 43.75 MiB 48.55 MiB 4.80 MiB
fa7bb7e+dirty 17.75 MiB 19.75 MiB 2.00 MiB
3bd3f0d+dirty 17.75 MiB 19.70 MiB 1.95 MiB
88890fe+dirty 17.75 MiB 19.71 MiB 1.96 MiB
95aaf8a 17.75 MiB 19.68 MiB 1.93 MiB
c0842e7+dirty 43.75 MiB 48.41 MiB 4.66 MiB
1e7a472+dirty 17.75 MiB 19.70 MiB 1.96 MiB

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

Android (new) Performance metrics 🚀

  Plain With Sentry Diff
Startup time 357.52 ms 408.18 ms 50.66 ms
Size 43.94 MiB 49.35 MiB 5.41 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
7480abe+dirty 363.80 ms 431.34 ms 67.54 ms
2b89ce9+dirty 372.22 ms 417.06 ms 44.84 ms
170d5ea+dirty 348.79 ms 406.94 ms 58.15 ms
b1579bc+dirty 391.87 ms 456.26 ms 64.39 ms
73f2455+dirty 369.33 ms 398.90 ms 29.57 ms
0b64753+dirty 358.55 ms 429.16 ms 70.61 ms
6a70a7e+dirty 382.45 ms 424.54 ms 42.09 ms
2adbd1e+dirty 366.13 ms 419.49 ms 53.36 ms
f8d19f8+dirty 374.17 ms 383.40 ms 9.23 ms
7be1f99+dirty 369.02 ms 399.60 ms 30.58 ms

App size

Revision Plain With Sentry Diff
7480abe+dirty 7.15 MiB 8.41 MiB 1.26 MiB
2b89ce9+dirty 7.15 MiB 8.41 MiB 1.26 MiB
170d5ea+dirty 7.15 MiB 8.42 MiB 1.27 MiB
b1579bc+dirty 43.94 MiB 49.27 MiB 5.33 MiB
73f2455+dirty 43.94 MiB 48.82 MiB 4.88 MiB
0b64753+dirty 7.15 MiB 8.42 MiB 1.27 MiB
6a70a7e+dirty 7.15 MiB 8.42 MiB 1.26 MiB
2adbd1e+dirty 7.15 MiB 8.43 MiB 1.28 MiB
f8d19f8+dirty 43.94 MiB 48.91 MiB 4.97 MiB
7be1f99+dirty 7.15 MiB 8.42 MiB 1.27 MiB

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

iOS (new) Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1205.79 ms 1207.96 ms 2.17 ms
Size 3.38 MiB 4.79 MiB 1.41 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
ea3e26e+dirty 1216.61 ms 1214.15 ms -2.47 ms
80e4616+dirty 1206.90 ms 1205.94 ms -0.96 ms
818a608+dirty 1218.84 ms 1223.18 ms 4.34 ms
77061ed+dirty 1210.77 ms 1218.45 ms 7.68 ms
bef3709+dirty 1217.79 ms 1225.33 ms 7.54 ms
a206511+dirty 1225.02 ms 1223.74 ms -1.28 ms
74979ac+dirty 1212.33 ms 1212.54 ms 0.21 ms
a2bb688+dirty 1244.82 ms 1238.60 ms -6.22 ms
8a868fe+dirty 1206.85 ms 1215.04 ms 8.19 ms
d590428+dirty 1221.23 ms 1225.27 ms 4.03 ms

App size

Revision Plain With Sentry Diff
ea3e26e+dirty 3.41 MiB 4.58 MiB 1.17 MiB
80e4616+dirty 3.38 MiB 4.60 MiB 1.22 MiB
818a608+dirty 3.19 MiB 4.48 MiB 1.29 MiB
77061ed+dirty 3.19 MiB 4.54 MiB 1.36 MiB
bef3709+dirty 3.38 MiB 4.78 MiB 1.40 MiB
a206511+dirty 3.41 MiB 4.67 MiB 1.25 MiB
74979ac+dirty 3.38 MiB 4.60 MiB 1.22 MiB
a2bb688+dirty 3.19 MiB 4.56 MiB 1.37 MiB
8a868fe+dirty 3.38 MiB 4.60 MiB 1.22 MiB
d590428+dirty 3.38 MiB 4.78 MiB 1.39 MiB

antonis and others added 3 commits March 4, 2026 09:45
On slow Cirrus Labs Tart VMs, the app may crash during Maestro flow
execution. Add up to 3 retries to handle transient app crashes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
App start transactions (origin: auto.app.start) have app_start_cold
measurements but not time_to_initial_display/time_to_full_display.
The filter already excluded ui.action.touch but not app start transactions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@antonis antonis marked this pull request as ready for review March 4, 2026 10:35
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

}
}
}
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retry mechanism causes envelope contamination across attempts

Medium Severity

The new maestro retry mechanism runs multiple attempts while the sentry mock server keeps accumulating envelopes from all attempts. If a failed attempt sends partial envelopes before crashing, getAllEnvelopes returns envelopes from both failed and successful runs. This is particularly problematic for takeSecond whose closure counter spans all attempts — it can resolve waitForEnvelope prematurely. After retry, newsEnvelopes may contain extra stale envelopes, causing index-based access (newsEnvelopes[0]) to reference data from a crashed run.

Additional Locations (1)

Fix in Cursor Fix in Web

antonis and others added 3 commits March 4, 2026 11:43
- Use nullish coalescing for httpSpans length check to avoid TypeError
  when spans is undefined
- Document maestro retry envelope contamination limitation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use consistent comment and sleep 5 across both workflows, as suggested
in PR review.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@lucas-zimerman
Copy link
Collaborator

I see no issues with the PR, LGTM! once CI pass.

@antonis antonis added Blocked and removed ready-to-merge Triggers the full CI test suite labels Mar 4, 2026
@antonis antonis marked this pull request as draft March 4, 2026 11:24
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

Fails
🚫 Pull request is not ready for merge, please add the "ready-to-merge" label to the pull request

Generated by 🚫 dangerJS against 9112cb8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants