fix(ci): Fix Sample Application E2E test flakiness#5755
fix(ci): Fix Sample Application E2E test flakiness#5755
Conversation
…ners - Increase MAESTRO_DRIVER_STARTUP_TIMEOUT to 180s for slow Tart VMs - Add wait_for_boot and erase_before_boot: false to simulator-action - Add simulator warm-up step before running iOS tests - Sort spaceflight news envelopes by timestamp instead of arrival order - Relax HTTP spans assertion to >= 1 (not all layers complete on slow VMs) - Search all envelopes for app start transaction (may arrive separately) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Semver Impact of This PR⚪ None (no version bump detected) 📋 Changelog PreviewThis is how your changes will appear in the changelog. This PR will not appear in the changelog. 🤖 This preview updates automatically when you update the PR. |
iOS (legacy) Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 1229.13 ms | 1228.46 ms | -0.67 ms |
| 80e4616+dirty | 1221.32 ms | 1225.64 ms | 4.32 ms |
| 818a608+dirty | 1205.76 ms | 1208.00 ms | 2.24 ms |
| 77061ed+dirty | 1233.16 ms | 1234.88 ms | 1.71 ms |
| bef3709+dirty | 1222.07 ms | 1220.24 ms | -1.83 ms |
| a206511+dirty | 1185.00 ms | 1186.35 ms | 1.35 ms |
| 74979ac+dirty | 1210.49 ms | 1213.31 ms | 2.82 ms |
| a2bb688+dirty | 1223.53 ms | 1232.90 ms | 9.37 ms |
| 8a868fe+dirty | 1221.50 ms | 1230.78 ms | 9.28 ms |
| d590428+dirty | 1211.77 ms | 1220.51 ms | 8.75 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 3.41 MiB | 4.58 MiB | 1.17 MiB |
| 80e4616+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| 818a608+dirty | 2.63 MiB | 3.91 MiB | 1.28 MiB |
| 77061ed+dirty | 2.63 MiB | 3.98 MiB | 1.34 MiB |
| bef3709+dirty | 3.38 MiB | 4.78 MiB | 1.40 MiB |
| a206511+dirty | 3.41 MiB | 4.67 MiB | 1.25 MiB |
| 74979ac+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| a2bb688+dirty | 2.63 MiB | 3.99 MiB | 1.36 MiB |
| 8a868fe+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| d590428+dirty | 3.38 MiB | 4.78 MiB | 1.39 MiB |
Android (legacy) Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 86584b7+dirty | 463.83 ms | 500.31 ms | 36.48 ms |
| 9a81842+dirty | 412.23 ms | 416.56 ms | 4.33 ms |
| c637fc7+dirty | 433.70 ms | 467.76 ms | 34.06 ms |
| d73150f+dirty | 411.21 ms | 465.86 ms | 54.65 ms |
| fa7bb7e+dirty | 350.37 ms | 377.02 ms | 26.65 ms |
| 3bd3f0d+dirty | 447.21 ms | 472.31 ms | 25.10 ms |
| 88890fe+dirty | 350.94 ms | 365.74 ms | 14.80 ms |
| 95aaf8a | 437.89 ms | 419.45 ms | -18.44 ms |
| c0842e7+dirty | 527.76 ms | 566.69 ms | 38.93 ms |
| 1e7a472+dirty | 348.80 ms | 362.55 ms | 13.75 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 86584b7+dirty | 43.75 MiB | 48.08 MiB | 4.33 MiB |
| 9a81842+dirty | 43.75 MiB | 48.08 MiB | 4.33 MiB |
| c637fc7+dirty | 43.75 MiB | 48.40 MiB | 4.64 MiB |
| d73150f+dirty | 43.75 MiB | 48.55 MiB | 4.80 MiB |
| fa7bb7e+dirty | 17.75 MiB | 19.75 MiB | 2.00 MiB |
| 3bd3f0d+dirty | 17.75 MiB | 19.70 MiB | 1.95 MiB |
| 88890fe+dirty | 17.75 MiB | 19.71 MiB | 1.96 MiB |
| 95aaf8a | 17.75 MiB | 19.68 MiB | 1.93 MiB |
| c0842e7+dirty | 43.75 MiB | 48.41 MiB | 4.66 MiB |
| 1e7a472+dirty | 17.75 MiB | 19.70 MiB | 1.96 MiB |
Android (new) Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 7480abe+dirty | 363.80 ms | 431.34 ms | 67.54 ms |
| 2b89ce9+dirty | 372.22 ms | 417.06 ms | 44.84 ms |
| 170d5ea+dirty | 348.79 ms | 406.94 ms | 58.15 ms |
| b1579bc+dirty | 391.87 ms | 456.26 ms | 64.39 ms |
| 73f2455+dirty | 369.33 ms | 398.90 ms | 29.57 ms |
| 0b64753+dirty | 358.55 ms | 429.16 ms | 70.61 ms |
| 6a70a7e+dirty | 382.45 ms | 424.54 ms | 42.09 ms |
| 2adbd1e+dirty | 366.13 ms | 419.49 ms | 53.36 ms |
| f8d19f8+dirty | 374.17 ms | 383.40 ms | 9.23 ms |
| 7be1f99+dirty | 369.02 ms | 399.60 ms | 30.58 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 7480abe+dirty | 7.15 MiB | 8.41 MiB | 1.26 MiB |
| 2b89ce9+dirty | 7.15 MiB | 8.41 MiB | 1.26 MiB |
| 170d5ea+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |
| b1579bc+dirty | 43.94 MiB | 49.27 MiB | 5.33 MiB |
| 73f2455+dirty | 43.94 MiB | 48.82 MiB | 4.88 MiB |
| 0b64753+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |
| 6a70a7e+dirty | 7.15 MiB | 8.42 MiB | 1.26 MiB |
| 2adbd1e+dirty | 7.15 MiB | 8.43 MiB | 1.28 MiB |
| f8d19f8+dirty | 43.94 MiB | 48.91 MiB | 4.97 MiB |
| 7be1f99+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |
iOS (new) Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 1216.61 ms | 1214.15 ms | -2.47 ms |
| 80e4616+dirty | 1206.90 ms | 1205.94 ms | -0.96 ms |
| 818a608+dirty | 1218.84 ms | 1223.18 ms | 4.34 ms |
| 77061ed+dirty | 1210.77 ms | 1218.45 ms | 7.68 ms |
| bef3709+dirty | 1217.79 ms | 1225.33 ms | 7.54 ms |
| a206511+dirty | 1225.02 ms | 1223.74 ms | -1.28 ms |
| 74979ac+dirty | 1212.33 ms | 1212.54 ms | 0.21 ms |
| a2bb688+dirty | 1244.82 ms | 1238.60 ms | -6.22 ms |
| 8a868fe+dirty | 1206.85 ms | 1215.04 ms | 8.19 ms |
| d590428+dirty | 1221.23 ms | 1225.27 ms | 4.03 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 3.41 MiB | 4.58 MiB | 1.17 MiB |
| 80e4616+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| 818a608+dirty | 3.19 MiB | 4.48 MiB | 1.29 MiB |
| 77061ed+dirty | 3.19 MiB | 4.54 MiB | 1.36 MiB |
| bef3709+dirty | 3.38 MiB | 4.78 MiB | 1.40 MiB |
| a206511+dirty | 3.41 MiB | 4.67 MiB | 1.25 MiB |
| 74979ac+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| a2bb688+dirty | 3.19 MiB | 4.56 MiB | 1.37 MiB |
| 8a868fe+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| d590428+dirty | 3.38 MiB | 4.78 MiB | 1.39 MiB |
On slow Cirrus Labs Tart VMs, the app may crash during Maestro flow execution. Add up to 3 retries to handle transient app crashes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
App start transactions (origin: auto.app.start) have app_start_cold measurements but not time_to_initial_display/time_to_full_display. The filter already excluded ui.action.touch but not app start transactions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
...ests/captureSpaceflightNewsScreenTransaction/captureSpaceflightNewsScreenTransaction.test.ts
Show resolved
Hide resolved
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| } | ||
| } | ||
| } | ||
| }; |
There was a problem hiding this comment.
Retry mechanism causes envelope contamination across attempts
Medium Severity
The new maestro retry mechanism runs multiple attempts while the sentry mock server keeps accumulating envelopes from all attempts. If a failed attempt sends partial envelopes before crashing, getAllEnvelopes returns envelopes from both failed and successful runs. This is particularly problematic for takeSecond whose closure counter spans all attempts — it can resolve waitForEnvelope prematurely. After retry, newsEnvelopes may contain extra stale envelopes, causing index-based access (newsEnvelopes[0]) to reference data from a crashed run.
Additional Locations (1)
- Use nullish coalescing for httpSpans length check to avoid TypeError when spans is undefined - Document maestro retry envelope contamination limitation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use consistent comment and sleep 5 across both workflows, as suggested in PR review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
I see no issues with the PR, LGTM! once CI pass. |


Summary
MAESTRO_DRIVER_STARTUP_TIMEOUTfrom 90s to 180s for slow Cirrus Labs Tart VMswait_for_boot: trueanderase_before_boot: falsetosimulator-actionto ensure the simulator is fully ready before testscaptureSpaceflightNewsScreenTransactiontest by sorting envelopes by timestamp instead of relying on arrival order (which is non-deterministic on slow VMs)captureErrorsScreenTransactionAndroid test by searching all envelopes for the app start transaction instead of only the first matching envelope (on slow emulators, it may arrive separately)#skip-changelog
Test plan
Sample Application / Test ios production REV2passes on CISample Application / Test android production REV2passes on CI🤖 Generated with Claude Code