Removing Wrong Spark Spans for Inactive Databricks Clusters #10651
larakulkarni1 wants to merge 18 commits into master
Benchmarks

All charts compare candidate=1.60.0-SNAPSHOT~a82739e0ad against baseline=1.61.0-SNAPSHOT~4c3b6f3aa2.

Startup
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 66 metrics, 5 unstable metrics.
[Gantt charts: petclinic and insecure-bank, global startup overhead and breakdown per module, for the tracing, appsec, iast, and profiling sections]

Load
Summary: Found 2 performance improvements and 3 performance regressions! Performance is the same for 14 metrics, 17 unstable metrics.
[Gantt charts: insecure-bank and petclinic, request duration [CI 0.99], baseline vs. candidate]

Dacapo
Summary: Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metric.
[Gantt charts: biojava and tomcat, execution time [CI 0.99], baseline vs. candidate]
Branch updated from 8bf15a0 to fb1450e.
  // Flush any remaining traces and return.
  tracer.flush();
  return;
}

Looking more closely at this code, it seems we already try to handle Databricks jobs up above by checking for `applicationSpan == null && jobCount > 0` and returning before we ever initialize the application span. Would it make sense to fold our updated logic into that check?
This might also explain why we weren't seeing this issue previously: most customers don't start all-purpose clusters and then run nothing on them before they spin down.

I was thinking about adding the check inside `initApplicationSpanIfNotInitialized()`, since it's called from multiple methods. Those callers all have guards today, but if a future caller omits its guard, a check inside `initApplicationSpanIfNotInitialized()` would still prevent a DBX cluster from getting a Spark span.

For posterity: we discussed this in person and agreed that, if we move the check into `initApplicationSpanIfNotInitialized`, we would also want to update all callers to reflect that. For now, though, we follow the existing pattern and update the check where it is called in `finishApplication`.
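
To make the outcome of this thread concrete, here is a minimal sketch of the call-site guard pattern described above. It is not the tracer's actual code: the class `SparkListenerSketch`, the `isRunningOnDatabricks` flag, the `emittedSpans` list, and the `flushCount` counter (standing in for `tracer.flush()`) are all hypothetical names introduced for illustration. Only `applicationSpan`, `jobCount`, `initApplicationSpanIfNotInitialized`, and `finishApplication` come from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the Spark listener discussed in this thread. */
final class SparkListenerSketch {
  private final boolean isRunningOnDatabricks; // assumed flag, not the tracer's real field
  private final List<String> emittedSpans = new ArrayList<>(); // stands in for reported spans
  private String applicationSpan; // null until initialized
  private int jobCount;
  private int flushCount; // counts calls that would be tracer.flush() in the real code

  SparkListenerSketch(boolean isRunningOnDatabricks) {
    this.isRunningOnDatabricks = isRunningOnDatabricks;
  }

  /** Caller with its own guard: Databricks jobs never initialize the application span. */
  void onJobStart() {
    jobCount++;
    if (!isRunningOnDatabricks) {
      initApplicationSpanIfNotInitialized();
    }
  }

  /** Unguarded helper: every caller is responsible for checking before calling it. */
  private void initApplicationSpanIfNotInitialized() {
    if (applicationSpan == null) {
      applicationSpan = "spark.application";
    }
  }

  void finishApplication() {
    // Existing guard from the review: jobs ran, but none of them created an application span.
    if (applicationSpan == null && jobCount > 0) {
      return;
    }
    // Call-site guard: a Databricks cluster that never ran anything gets no application span.
    if (isRunningOnDatabricks && applicationSpan == null) {
      // Flush any remaining traces and return.
      flushCount++;
      return;
    }
    initApplicationSpanIfNotInitialized();
    emittedSpans.add(applicationSpan);
  }

  List<String> emittedSpans() {
    return emittedSpans;
  }

  int flushCount() {
    return flushCount;
  }
}
```

In this sketch, an idle Databricks listener finishes without emitting anything, while a non-Databricks listener still emits one `spark.application` span. The alternative raised in the thread, moving the Databricks check into `initApplicationSpanIfNotInitialized`, would protect future callers automatically, but every caller would then have to cope with the span still being null after the call.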
What Does This Do
This PR removes phantom spans for Databricks clusters. Currently, DBX clusters can show up as Spark jobs in the Batch Jobs table because the SparkListener incorrectly emits a `spark.application` span for the cluster. These clusters should not appear in the table. More information is available in this doc.
Motivation
Incorrect information on the Batch Jobs Table (customers affected).
Additional Notes
Contributor Checklist
- Assign the `type:` and (`comp:` or `inst:`) labels in addition to any other useful labels
- Avoid using `close`, `fix`, or any linking keywords when referencing an issue. Use `solves` instead, and assign the PR milestone to the issue

Jira ticket: DJM-1120

Note: Once your PR is ready to merge, add it to the merge queue by commenting `/merge`. `/merge -c` cancels the queue request. `/merge -f --reason "reason"` skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.