Add distributed tracing (OpenTelemetry) support #266
Add W3C Trace Context propagation throughout the SDK, enabling end-to-end distributed tracing from the client to orchestrations, activities, and sub-orchestrations.

Core changes:
- TracingHelper.java: utility class for trace context capture, extraction, and span management
- DurableTaskGrpcClient: refactored to use TracingHelper
- TaskOrchestrationExecutor: reads parentTraceContext from ExecutionStartedEvent and propagates it to ScheduleTaskAction and CreateSubOrchestrationAction
- DurableTaskGrpcWorker: wraps activity and orchestration execution in OTel spans with proper scope management
- OrchestrationRunner: adds an orchestration span for the Azure Functions execution path

Tests:
- TracingHelperTest: 12 tests covering all utility methods
- TaskOrchestrationExecutorTest: 3 new tests verifying trace context propagation to activities and sub-orchestrations

Samples:
- TracingPattern.java: standalone SDK sample with DTS emulator and Jaeger OTLP exporter
- TracingChain.java: Azure Functions sample with chained activities and sub-orchestration
- README.md with screenshots showing Jaeger traces and DTS dashboard

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
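The propagation described for TaskOrchestrationExecutor can be pictured with the dependency-free sketch below. It assumes simplified stand-ins for the proto types: `TraceContext`, the two action records, and `PropagatingExecutor` are hypothetical, not SDK classes. The orchestrator remembers the context carried by `ExecutionStartedEvent` and stamps it on every action it schedules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the proto messages; the real SDK uses generated classes.
record TraceContext(String traceParent, String traceState) {}

record ScheduleTaskAction(int id, String name, TraceContext parentTraceContext) {}

record CreateSubOrchestrationAction(int id, String name, String instanceId,
                                    TraceContext parentTraceContext) {}

// Sketch: remember the context from ExecutionStartedEvent and attach it to
// every activity or sub-orchestration the orchestrator schedules.
final class PropagatingExecutor {
    private TraceContext parentTraceContext; // stays null if the client sent no trace context
    private int nextTaskId = 0;
    private final List<Object> actions = new ArrayList<>();

    void onExecutionStarted(TraceContext fromEvent) {
        this.parentTraceContext = fromEvent;
    }

    ScheduleTaskAction callActivity(String name) {
        var action = new ScheduleTaskAction(nextTaskId++, name, parentTraceContext);
        actions.add(action);
        return action;
    }

    CreateSubOrchestrationAction callSubOrchestrator(String name, String instanceId) {
        var action = new CreateSubOrchestrationAction(nextTaskId++, name, instanceId, parentTraceContext);
        actions.add(action);
        return action;
    }

    List<Object> pendingActions() {
        return List.copyOf(actions);
    }
}
```

Because the same context object is copied onto each action, activities and sub-orchestrations all join the trace started by the client.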
Pull request overview
Adds OpenTelemetry-based distributed tracing to the Durable Task Java SDK, including W3C Trace Context propagation through orchestration/activity execution and runnable samples/tests to validate and demonstrate the behavior.
Changes:
- Introduces `TracingHelper` utilities for W3C Trace Context capture/extraction and span lifecycle.
- Propagates `parentTraceContext` from `ExecutionStartedEvent` into activity and sub-orchestration scheduling actions.
- Wraps orchestration/activity execution paths in OpenTelemetry spans and adds samples and unit tests.
Reviewed changes
Copilot reviewed 13 out of 16 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| client/src/main/java/com/microsoft/durabletask/TracingHelper.java | Adds helper methods for capturing/extracting W3C context and starting/ending spans. |
| client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcClient.java | Refactors orchestration scheduling trace capture to use TracingHelper. |
| client/src/main/java/com/microsoft/durabletask/TaskOrchestrationExecutor.java | Stores parentTraceContext from history and propagates it to activity/sub-orchestration actions. |
| client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcWorker.java | Adds worker-side spans around orchestration and activity execution. |
| client/src/main/java/com/microsoft/durabletask/OrchestrationRunner.java | Adds orchestration span for the Azure Functions execution path. |
| client/src/test/java/com/microsoft/durabletask/TracingHelperTest.java | Adds unit coverage for trace context round-tripping and span error recording. |
| client/src/test/java/com/microsoft/durabletask/TaskOrchestrationExecutorTest.java | Adds tests to verify trace context propagation into scheduled actions. |
| client/build.gradle | Adds OpenTelemetry SDK/testing deps for unit tests. |
| samples/src/main/java/io/durabletask/samples/TracingPattern.java | Adds a runnable DTS+Jaeger tracing sample. |
| samples/build.gradle | Adds a Gradle run task and OpenTelemetry deps for the tracing sample. |
| samples/README.md | Documents how to run and view the tracing sample. |
| samples-azure-functions/src/main/java/com/functions/TracingChain.java | Adds an Azure Functions sample demonstrating trace propagation. |
| samples/images/dts-dashboard-completed.png | Adds documentation screenshot asset. |
| CHANGELOG.md | Notes the new tracing feature in Unreleased. |
client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcWorker.java
Outdated
client/src/main/java/com/microsoft/durabletask/TaskOrchestrationExecutor.java
client/src/main/java/com/microsoft/durabletask/OrchestrationRunner.java
Outdated
Add isValid() check after creating remote SpanContext to prevent malformed trace contexts from propagating silently. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
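A minimal, dependency-free sketch of the validity rule behind that `isValid()` check, assuming the W3C `traceparent` layout `version-traceid-parentid-flags`. `RemoteContextValidator` and `RemoteParent` are hypothetical names; in the SDK the check runs on an OpenTelemetry `SpanContext` created with `SpanContext.createFromRemoteParent(...)`.

```java
import java.util.Optional;

// Hypothetical result type for a successfully extracted remote parent.
record RemoteParent(String traceId, String spanId) {}

// Sketch of the rules OpenTelemetry applies in SpanContext.isValid(): a 32-char
// lowercase-hex trace ID and a 16-char lowercase-hex span ID, neither all zeros.
final class RemoteContextValidator {
    private static boolean isLowerHex(String s, int length) {
        if (s == null || s.length() != length) return false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f'))) return false;
        }
        return true;
    }

    private static boolean isAllZeros(String s) {
        return s.chars().allMatch(c -> c == '0');
    }

    static boolean isValid(String traceId, String spanId) {
        return isLowerHex(traceId, 32) && !isAllZeros(traceId)
            && isLowerHex(spanId, 16) && !isAllZeros(spanId);
    }

    /** Returns a parent only when the traceparent is well-formed and valid; otherwise empty. */
    static Optional<RemoteParent> extract(String traceparent) {
        if (traceparent == null) return Optional.empty();
        String[] parts = traceparent.split("-");
        if (parts.length != 4 || !isValid(parts[1], parts[2])) return Optional.empty();
        return Optional.of(new RemoteParent(parts[1], parts[2]));
    }
}
```

Returning `Optional.empty()` lets a caller fall back to a span with no remote parent instead of silently attaching a malformed one.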
- Tracer name: 'Microsoft.DurableTask' (matching .NET ActivitySource)
- Span kinds: Server for worker execution (matching .NET)
- Span naming: 'orchestration:<name>' and 'activity:<name>' (not instanceId)
- Add 'durabletask.type' attribute on all spans (matching .NET Schema.cs)
- Use shared constants for attribute keys and type values
- Extract orchestration name from ExecutionStartedEvent for span names

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
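A sketch of the "shared constants" idea, with the string values taken from the list above. `TracingConventions` and `executionAttributes` are hypothetical; with the OpenTelemetry API these strings would typically become `AttributeKey.stringKey(...)` keys on a span created from a tracer named `Microsoft.DurableTask` with `SpanKind.SERVER`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical constants holder; values follow the commit message and .NET Schema.cs.
final class TracingConventions {
    static final String TRACER_NAME = "Microsoft.DurableTask";

    static final String ATTR_TYPE = "durabletask.type";
    static final String ATTR_TASK_NAME = "durabletask.task.name";
    static final String ATTR_INSTANCE_ID = "durabletask.task.instance_id";

    static final String TYPE_ORCHESTRATION = "orchestration";
    static final String TYPE_ACTIVITY = "activity";

    private TracingConventions() {}

    /** Attributes for a worker-side execution span (the caller would set SpanKind.SERVER). */
    static Map<String, String> executionAttributes(String type, String taskName, String instanceId) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put(ATTR_TYPE, type);          // present on every span
        attrs.put(ATTR_TASK_NAME, taskName); // orchestration name (from ExecutionStartedEvent) or activity name
        attrs.put(ATTR_INSTANCE_ID, instanceId);
        return attrs;
    }
}
```

Keeping keys in one place means the worker, client, and executor cannot drift from the .NET schema independently.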
Screenshots now reflect updated span naming (orchestration:<name>), Server span kind, durabletask.type attribute, and Microsoft.DurableTask tracer name. Added span detail screenshot showing full attribute list. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re: bachuv's comment on the Functions sample screenshots

I tested the Azure Functions sample locally with … However, Durable Functions distributed tracing V2 exports traces to Application Insights, not to OTLP/Jaeger. The extension requires … For Jaeger-style screenshots, we'd need either:
I'd suggest adding a note in the README explaining that Functions traces appear in Application Insights when deployed to Azure, and adding App Insights screenshots as a follow-up when we have a test Azure environment. Would that work?
- Replaced chaining sample with Fan-Out/Fan-In pattern (5× GetWeather + CreateSummary)
- Updated README.md to reflect FanOutFanIn span hierarchy
- Captured updated Jaeger screenshots showing parallel activity spans

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Added createClientSpan() to TracingHelper for orchestrator scheduling spans
- TaskOrchestrationExecutor creates Client-kind spans when scheduling activities and sub-orchestrations (only during non-replay to avoid duplicates)
- DurableTaskGrpcWorker creates orchestration Server span only for first execution
- Trace now shows 14 spans with Depth 3, matching .NET SDK exactly: create_orchestration (root) → orchestration (server) → activity (client) → activity (server) for each task
- Updated screenshots showing paired Client+Server span hierarchy
- Added createClientSpan test coverage (2 new tests, 21 total passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
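The replay guard mentioned above can be sketched without any tracing library. `ReplayAwareEmitter` is hypothetical and records span names as plain strings; the point is that orchestrator code re-runs from history on every dispatch, so anything emitted at scheduling time must be skipped while replaying or it will appear once per dispatch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emitter: scheduling-time spans are recorded only on the non-replay pass.
final class ReplayAwareEmitter {
    private final List<String> emitted = new ArrayList<>();
    private boolean replaying;

    /** The executor flips this as it moves from replaying history to running new code. */
    void setReplaying(boolean replaying) {
        this.replaying = replaying;
    }

    void onScheduleActivity(String activityName) {
        if (replaying) {
            return; // an earlier dispatch already emitted this span
        }
        emitted.add("activity:" + activityName);
    }

    List<String> emitted() {
        return List.copyOf(emitted);
    }
}
```

In the usage below, the second dispatch replays `GetWeather` (skipped) and then schedules `CreateSummary` for the first time (emitted).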
- Activity worker: track error separately, end span once in finally block (fixes double span.end() call on error path)
- OrchestrationRunner: close scope before ending span, add null check (fixes scope/span lifecycle ordering and potential NPE)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
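A stdlib-only sketch of the fixed activity-worker lifecycle. `RecordingSpan`, `SketchScope`, and `ActivityRunner` are hypothetical stand-ins that only log calls; in the OpenTelemetry API the equivalents are `Span.makeCurrent()`, `Span.recordException(...)`, `Span.setStatus(StatusCode.ERROR)`, and `Span.end()`. The error is captured in `catch`, and `finally` ends the span exactly once; try-with-resources closes the scope before `catch`/`finally` run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Scope whose close() throws no checked exception (OTel's Scope behaves the same way).
interface SketchScope extends AutoCloseable {
    @Override
    void close();
}

// Stand-in span that logs lifecycle calls so the ordering can be checked. Not the OTel API.
final class RecordingSpan {
    final List<String> log = new ArrayList<>();
    int endCalls = 0;

    SketchScope makeCurrent() {
        log.add("scope.open");
        return () -> log.add("scope.close");
    }

    void recordError(Throwable t) {
        log.add("span.error:" + t.getMessage());
    }

    void end() {
        endCalls++;
        log.add("span.end");
    }
}

final class ActivityRunner {
    /** Error is tracked separately; the scope closes first and the span ends exactly once. */
    static <T> T runInSpan(RecordingSpan span, Supplier<T> work) {
        RuntimeException error = null;
        try (SketchScope scope = span.makeCurrent()) {
            return work.get();
        } catch (RuntimeException e) {
            error = e;
            throw e;
        } finally {
            // try-with-resources has already closed the scope before catch/finally run.
            if (error != null) {
                span.recordError(error);
            }
            span.end();
        }
    }
}
```

With a single `end()` in `finally`, the success and error paths can no longer diverge into ending the span twice.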
- Timer span: emitted on TIMERFIRED in orchestrator (Internal kind, durabletask.fire_at attribute, name: orchestration:<name>:timer)
- Event span from worker: emitted on sendEvent in orchestrator (Producer kind, name: orchestration_event:<eventName>)
- Event span from client: emitted on raiseEvent in client (Producer kind, name: orchestration_event:<eventName>)
- Added 3 new tests (17 TracingHelper tests, 24 total)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
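A sketch of the span kind and attributes for the two new span types, using the attribute keys listed in the PR description. `TimerAndEventSpans`, `SpanSpec`, and `SketchSpanKind` are hypothetical; the SDK itself uses OpenTelemetry's `SpanKind.INTERNAL` and `SpanKind.PRODUCER`. `Instant.toString()` yields an ISO-8601 UTC string, which suits `durabletask.fire_at`.

```java
import java.time.Instant;
import java.util.Map;

// Local mirror of the two span kinds named in the commit; not OTel's SpanKind.
enum SketchSpanKind { INTERNAL, PRODUCER }

record SpanSpec(SketchSpanKind kind, Map<String, String> attributes) {}

final class TimerAndEventSpans {
    /** Timer span, emitted when TimerFired is processed; fire_at is recorded as ISO-8601. */
    static SpanSpec timer(Instant fireAt) {
        return new SpanSpec(
            SketchSpanKind.INTERNAL,
            Map.of("durabletask.type", "timer",
                   "durabletask.fire_at", fireAt.toString()));
    }

    /** Event span for sendEvent (worker) or raiseEvent (client); both are Producer spans. */
    static SpanSpec event(String eventName, String targetInstanceId) {
        return new SpanSpec(
            SketchSpanKind.PRODUCER,
            Map.of("durabletask.type", "event",
                   "durabletask.task.name", eventName,
                   "durabletask.event.target_instance_id", targetInstanceId));
    }
}
```

Producer fits events because the sender does not wait for a reply, while a timer is purely internal to the orchestration.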
- Timer span now correctly uses parentTraceContext as parent, linking it to the orchestration trace instead of creating a separate trace
- Updated sample to include 1-second durable timer before fan-out
- Updated screenshots showing 15 spans with timer span in hierarchy
- Updated README with timer span documentation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcWorker.java
Outdated
samples/src/main/java/io/durabletask/samples/TracingPattern.java
Outdated
- Move create_orchestration span into scheduleNewOrchestrationInstance() so the SDK creates it automatically (bachuv feedback: span should not be in user code). TYPE_CREATE_ORCHESTRATION is now used by the client.
- Remove manual span creation from TracingPattern sample; simplify code
- Update all 4 screenshots with latest trace structure
- README span detail shows activity:GetWeather attributes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
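What moving `create_orchestration` into the client achieves can be sketched with plain strings: the client starts a child of whatever span the caller has open (same trace ID, new span ID) and sends that span's `traceparent` with the request, so the orchestration's `parentTraceContext` points at `create_orchestration`. `TraceParent` and `CreateOrchestrationSketch` are hypothetical; in the SDK, OpenTelemetry generates the IDs.

```java
import java.security.SecureRandom;

// Hypothetical value type for the fields of a version-00 traceparent.
record TraceParent(String traceId, String spanId, boolean sampled) {
    String format() {
        return "00-" + traceId + "-" + spanId + (sampled ? "-01" : "-00");
    }
}

final class CreateOrchestrationSketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    static String randomHex(int byteCount) {
        byte[] bytes = new byte[byteCount];
        RANDOM.nextBytes(bytes);
        StringBuilder sb = new StringBuilder(byteCount * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Starts create_orchestration as a child of the caller's span, or as a root if there is none. */
    static TraceParent startCreateOrchestrationSpan(TraceParent callerSpan) {
        if (callerSpan == null) {
            // No ambient trace: create_orchestration becomes the root (assumed sampled here).
            return new TraceParent(randomHex(16), randomHex(8), true);
        }
        // Same trace, new span ID: this is the traceparent sent with the create request.
        return new TraceParent(callerSpan.traceId(), randomHex(8), callerSpan.sampled());
    }
}
```

User code then only needs an ambient span (or none at all); the SDK supplies the `create_orchestration` hop.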
…vent timestamp

Use ExecutionStartedEvent timestamp as orchestration span start time instead of OrchestrationTraceContext.spanStartTime (which is not populated by DTS). Emit orchestration span only on completion/termination, with startTime from the first ExecutionStartedEvent, so it visually wraps all child activity spans. Added TracingHelper.startSpanWithStartTime() for creating spans with explicit start timestamps.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
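A sketch of choosing the orchestration span's start time from history, assuming a simplified `HistoryEvent` record (hypothetical; the SDK works with proto history events). With OpenTelemetry the chosen instant would be passed to `SpanBuilder.setStartTimestamp(Instant)`.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical, simplified history event (the SDK uses proto HistoryEvent messages).
record HistoryEvent(String type, Instant timestamp) {}

final class OrchestrationSpanTiming {
    /** Start = timestamp of the first ExecutionStarted event across past and new history. */
    static Optional<Instant> spanStart(List<HistoryEvent> pastEvents, List<HistoryEvent> newEvents) {
        return Stream.concat(pastEvents.stream(), newEvents.stream())
            .filter(e -> e.type().equals("ExecutionStarted"))
            .map(HistoryEvent::timestamp)
            .findFirst();
    }

    /** The span is emitted on completion, so its duration covers the whole lifecycle. */
    static Duration lifecycle(Instant start, Instant completedAt) {
        return Duration.between(start, completedAt);
    }
}
```

Because the start comes from persisted history rather than from the current process, every dispatch computes the same start time.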
I would prefer option 1, and I'm also happy to add these as a follow-up item.
client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcWorker.java
Fixed
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Address PR #266 review feedback (comment #3993852725):

1. Timer spans now have proper duration from creation to fired time, using the TimerCreated event timestamp as setStartTimestamp().
2. Activity/sub-orchestration client spans now have proper duration from scheduling to completion, created retroactively at completion time with the TaskScheduled event timestamp as setStartTimestamp().
3. Removed instantaneous client span creation at scheduling time; propagate the orchestration's parentTraceContext directly instead.

Note: Java OTel doesn't support SetSpanId() like .NET, so child spans are siblings under create_orchestration rather than nested under the orchestration span. All 15 spans have meaningful durations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
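The retroactive spans in points 1 and 2 amount to keeping two timestamps and emitting the span only once the end is known. `RetroactiveSpan` is a hypothetical value type; with the OpenTelemetry API the start would go to `SpanBuilder.setStartTimestamp(Instant)` and the end to `Span.end(Instant)`.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical record of a span emitted after the fact with explicit timestamps.
record RetroactiveSpan(String name, String kind, Instant start, Instant end) {
    Duration duration() {
        return Duration.between(start, end);
    }

    /** Activity client span: starts at the TaskScheduled timestamp, ends when completion is processed. */
    static RetroactiveSpan activityClient(String activityName, Instant taskScheduledAt, Instant completedAt) {
        return new RetroactiveSpan("activity:" + activityName, "CLIENT", taskScheduledAt, completedAt);
    }

    /** Timer span: starts at the TimerCreated timestamp, ends when TimerFired is processed. */
    static RetroactiveSpan timer(String orchestrationName, Instant timerCreatedAt, Instant firedProcessedAt) {
        return new RetroactiveSpan("orchestration:" + orchestrationName + ":timer", "INTERNAL",
                                   timerCreatedAt, firedProcessedAt);
    }
}
```

Emitting late avoids the zero-length spans produced when a span was started and ended at scheduling time.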
Addressed all three issues from your feedback:
Note on Java OTel limitation: Java OTel doesn't support …
client/src/main/java/com/microsoft/durabletask/TaskOrchestrationExecutor.java
Fixed
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Updated all screenshots to reflect the latest span durations:
- Timer spans now show creation-to-fired duration (~965ms)
- Activity client spans show scheduling-to-completion duration (~184ms)
- Activity server spans show execution duration (~25ms)
- Orchestration span covers full lifecycle (~1.23s)
- 15 total spans in a clean, coherent trace

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Updated TracingChain.java to use the same Fan-Out/Fan-In pattern as the DTS sample (TracingPattern.java): 1s timer → 5× GetWeather → CreateSummary. This ensures both samples are consistent and demonstrate the same tracing capabilities. Updated samples/README.md with Azure Functions section explaining that Durable Functions tracing exports to Application Insights, not Jaeger. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove unused createClientSpan() method (52 lines) and its 2 tests. Replaced by emitRetroactiveClientSpan(), which creates spans with proper scheduling-to-completion duration.
- Extract emitClientSpanIfTracked() helper to eliminate 4× duplicated retroactive client span emission blocks in task/sub-orchestration completed/failed handlers.
- Extract storeSchedulingMetadata() helper to consolidate 2× duplicated scheduling metadata storage in handleTaskScheduled and handleSubOrchestrationCreated.

Net: -151 lines.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
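One plausible shape for the two extracted helpers, assuming scheduling metadata is keyed by task ID. `ClientSpanTracker` and its records are hypothetical, not the code in this PR; removing the entry on emission means each scheduled task yields at most one client span, whichever completed/failed handler runs.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical tracker: metadata is stored once per task ID and consumed on completion or failure.
final class ClientSpanTracker {
    record SchedulingMetadata(String spanName, Instant scheduledAt) {}
    record EmittedSpan(String spanName, Duration duration, boolean failed) {}

    private final Map<Integer, SchedulingMetadata> pending = new HashMap<>();

    /** Shared by the "task scheduled" and "sub-orchestration created" handlers. */
    void storeSchedulingMetadata(int taskId, String spanName, Instant scheduledAt) {
        pending.put(taskId, new SchedulingMetadata(spanName, scheduledAt));
    }

    /** Shared by the completed/failed handlers; emits at most one span per task ID. */
    Optional<EmittedSpan> emitClientSpanIfTracked(int taskId, Instant finishedAt, boolean failed) {
        SchedulingMetadata meta = pending.remove(taskId);
        if (meta == null) {
            return Optional.empty(); // no metadata for this ID (unknown, or already emitted)
        }
        Duration duration = Duration.between(meta.scheduledAt(), finishedAt);
        return Optional.of(new EmittedSpan(meta.spanName(), duration, failed));
    }
}
```

Four call sites then differ only in the `failed` flag they pass, which is what removes the duplicated blocks.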
…pans

Start orchestration span BEFORE executor runs so child spans (activities, timers, client spans) are nested under it. Each dispatch creates its own orchestration span, matching JS/dotnet behavior (multiple orchestration spans per trace). Depth is now 3: create_orchestration → orchestration → activity/timer spans. Updated all Jaeger and DTS dashboard screenshots.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
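Why starting the orchestration span before the executor runs produces nesting can be shown with a toy single-threaded tracer in which a span started while another is current records it as its parent. `ToyTracer` and `ToyScope` are hypothetical and collapse client and worker into one process; across processes the link travels in `traceparent`, and OpenTelemetry's `Context`/`Scope` play the "current span" role.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// close() without a checked exception, so try-with-resources needs no catch.
interface ToyScope extends AutoCloseable {
    @Override
    void close();
}

// In-process toy of "current span" context; not thread-safe and not the OTel API.
final class ToyTracer {
    record FinishedSpan(String name, String parent) {}

    private final Deque<String> current = new ArrayDeque<>();
    final List<FinishedSpan> finished = new ArrayList<>();

    /** Starts a span whose parent is the current span (null if none) and makes it current. */
    ToyScope startAndMakeCurrent(String name) {
        String parent = current.peek();
        current.push(name);
        return () -> {
            current.pop();
            finished.add(new FinishedSpan(name, parent));
        };
    }

    /** One dispatch: the orchestration span is current BEFORE the executor runs. */
    void dispatch(String orchestrationSpanName, Runnable executor) {
        try (ToyScope orchestration = startAndMakeCurrent(orchestrationSpanName)) {
            executor.run(); // spans started here get the orchestration span as parent
        }
    }
}
```

If the orchestration span were instead started after the executor returned, child spans would find no current orchestration span and attach one level higher.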
client/src/main/java/com/microsoft/durabletask/DurableTaskGrpcWorker.java
Fixed
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
…ed orchestration

1. Changed create_orchestration span from INTERNAL (default) to PRODUCER, matching .NET SDK's StartActivityForNewOrchestration, which uses ActivityKind.Producer.
2. Set ERROR status on orchestration span when the orchestration fails, matching .NET SDK's pattern of checking the CompleteOrchestration action for FAILED status before disposing the span.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
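A sketch of the failure check in point 2, assuming a simplified view of the CompleteOrchestration action. The enums and records are hypothetical; with OpenTelemetry the result would be applied via `Span.setStatus(StatusCode.ERROR, description)` before `Span.end()`, and point 1 corresponds to `SpanBuilder.setSpanKind(SpanKind.PRODUCER)`.

```java
// Hypothetical, simplified view of the CompleteOrchestration action the worker inspects.
enum OrchestrationOutcome { COMPLETED, FAILED, TERMINATED }

record CompleteOrchestrationAction(OrchestrationOutcome outcome, String failureMessage) {}

// Mirrors the two OTel StatusCode values that matter here (UNSET is the default).
enum SketchStatusCode { UNSET, ERROR }

record SpanStatus(SketchStatusCode code, String description) {}

final class OrchestrationSpanStatus {
    /** Decide the orchestration span status before ending it: only FAILED maps to ERROR. */
    static SpanStatus statusFor(CompleteOrchestrationAction action) {
        if (action != null && action.outcome() == OrchestrationOutcome.FAILED) {
            return new SpanStatus(SketchStatusCode.ERROR, action.failureMessage());
        }
        return new SpanStatus(SketchStatusCode.UNSET, null);
    }
}
```

A dispatch that produces no completion action (the orchestration is still running) leaves the status unset.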
In the new screenshot, it seems like the orchestration:FanOutFanIn span is showing up multiple times (for each replay?). If I'm looking at the correct orchestration sample code, I would expect the following spans:
- create_orchestration:FanOutFanIn
- orchestration:FanOutFanIn
- timer
- activity:GetWeather
- activity:GetWeather
- activity:GetWeather
- activity:GetWeather
- activity:GetWeather
- activity:CreateSummary
Let me know if I'm missing something here and the screenshot is showing the expected amount and type of spans.


Issue describing the changes in this PR
Adds distributed tracing support to the Java SDK using OpenTelemetry, aligned with the .NET SDK tracing conventions in microsoft/durabletask-dotnet.
The SDK now automatically propagates W3C Trace Context (traceparent/tracestate) from client → orchestrations → activities → sub-orchestrations, and creates OTel spans around activity and orchestration execution on the worker side.
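A dependency-free sketch of the `traceparent`/`tracestate` round trip, assuming version `00` of the W3C format (`00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`). `W3CCarrier` is hypothetical; OpenTelemetry's `W3CTraceContextPropagator` implements the full specification, including checks this sketch omits (such as rejecting all-zero IDs).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical carrier helper for version-00 W3C Trace Context headers.
final class W3CCarrier {
    record Context(String traceId, String spanId, boolean sampled, String traceState) {}

    /** Capture: write the context as traceparent (and tracestate, if any) entries. */
    static Map<String, String> inject(Context ctx) {
        Map<String, String> carrier = new HashMap<>();
        carrier.put("traceparent",
            "00-" + ctx.traceId() + "-" + ctx.spanId() + "-" + (ctx.sampled() ? "01" : "00"));
        if (ctx.traceState() != null && !ctx.traceState().isEmpty()) {
            carrier.put("tracestate", ctx.traceState());
        }
        return carrier;
    }

    /** Extract: parse the entries back; empty if traceparent is missing or malformed. */
    static Optional<Context> extract(Map<String, String> carrier) {
        String tp = carrier.get("traceparent");
        if (tp == null) return Optional.empty();
        String[] parts = tp.split("-");
        if (parts.length != 4 || !parts[0].equals("00")
                || !parts[1].matches("[0-9a-f]{32}")
                || !parts[2].matches("[0-9a-f]{16}")
                || !parts[3].matches("[0-9a-f]{2}")) {
            return Optional.empty();
        }
        boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) != 0; // bit 0 = sampled flag
        return Optional.of(new Context(parts[1], parts[2], sampled, carrier.get("tracestate")));
    }
}
```

The same two strings can travel through any hop (gRPC request, history event, action) without the receiving side needing tracing libraries to store them.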
Changes
Core SDK (`client/`):
- `TracingHelper.java` (new): Utility class with `getCurrentTraceContext()`, `extractTraceContext()`, `startSpan()`, `startSpanWithStartTime()`, `endSpan()`, `emitRetroactiveClientSpan()`, `emitTimerSpan()`, `emitEventRaisedFromWorkerSpan()`, `emitEventRaisedFromClientSpan()`. Uses `Microsoft.DurableTask` as the tracer name (matching .NET `ActivitySource`).
- `DurableTaskGrpcClient.java`: Auto-creates a `create_orchestration` span in `scheduleNewOrchestrationInstance()`; adds an event span on `raiseEvent()`
- `TaskOrchestrationExecutor.java`: Reads `parentTraceContext` from `ExecutionStartedEvent`, emits retroactive Client spans at task completion/failure (with scheduling-to-completion duration), emits timer spans with creation-to-fired duration, and emits event spans on `sendEvent()`
- `DurableTaskGrpcWorker.java`: Wraps activity execution in Server spans; emits the orchestration span on completion with the `ExecutionStartedEvent` timestamp as start time for full lifecycle coverage
- `OrchestrationRunner.java`: Adds an orchestration span for the Azure Functions execution path

Span types (aligned with .NET SDK `TraceActivityConstants.cs`):

| Type | Span name |
|---|---|
| `create_orchestration` | `create_orchestration:<name>` |
| `orchestration` | `orchestration:<name>` |
| `activity` | `activity:<name>` |
| `timer` | `orchestration:<name>:timer` |
| `event` / `orchestration_event` | `orchestration_event:<eventName>` |

Span attributes (aligned with .NET SDK `Schema.cs`):
- `durabletask.type`: `"orchestration"`, `"activity"`, `"create_orchestration"`, `"timer"`, `"event"`
- `durabletask.task.name`: orchestration/activity/event name
- `durabletask.task.instance_id`: orchestration instance ID
- `durabletask.task.task_id`: activity/timer task ID
- `durabletask.fire_at`: timer fire time (ISO-8601)
- `durabletask.event.target_instance_id`: target instance for raised events

Span durations:
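The span-type table can be encoded as a single enum so that each `durabletask.type` value and its span-name pattern stay together. `DurableSpanType` is a hypothetical sketch, not a class from this PR.

```java
// Hypothetical enum encoding the span-type table: type attribute value plus name pattern.
enum DurableSpanType {
    CREATE_ORCHESTRATION("create_orchestration", "create_orchestration:%s"),
    ORCHESTRATION("orchestration", "orchestration:%s"),
    ACTIVITY("activity", "activity:%s"),
    TIMER("timer", "orchestration:%s:timer"),  // %s is the orchestration name
    EVENT("event", "orchestration_event:%s");  // %s is the event name

    final String typeValue;
    private final String namePattern;

    DurableSpanType(String typeValue, String namePattern) {
        this.typeValue = typeValue;
        this.namePattern = namePattern;
    }

    /** Builds the span name for this type from an orchestration, activity, or event name. */
    String spanName(String name) {
        return String.format(namePattern, name);
    }
}
```

Naming spans by task name rather than instance ID keeps span names low-cardinality, so trace UIs group all runs of the same orchestration together.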
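| Span | Duration |
|---|---|
| Orchestration | `ExecutionStartedEvent` timestamp to completion |
| Activity / sub-orchestration (client) | `TaskScheduled` timestamp to completion (matches .NET `EmitTraceActivityForTaskCompleted`) |
| Timer | `TimerCreated` timestamp to `TimerFired` processing time (matches .NET `EmitTraceActivityForTimer`) |

Tests (21 tests):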
- `TracingHelperTest.java`: 19 tests covering all utility methods, including retroactive client spans, timer spans with start time, event spans, round-trip context propagation, error recording, and SpanContext validation
- `TaskOrchestrationExecutorTest.java`: 4 tests verifying trace context propagation to activities and sub-orchestrations

Samples:
- `TracingPattern.java`: Fan-Out/Fan-In sample with a 1s timer, 5× parallel `GetWeather` + `CreateSummary`. Uses the DTS emulator + Jaeger OTLP exporter.
- `TracingChain.java`: Azure Functions sample with HTTP trigger, chained activities, and sub-orchestration
- `samples/README.md`: Documentation with run instructions and screenshots

Screenshots
Jaeger: Trace search showing FanOutFanIn trace (19 spans):

Jaeger: Full trace detail with proper span durations:

`create_orchestration:FanOutFanIn` (108ms) → `orchestration:FanOutFanIn` (1.23s) + `orchestration:FanOutFanIn:timer` (965ms) + 5× `activity:GetWeather` client (184ms) + 5× server (25ms) + `activity:CreateSummary` client (8ms) + server (0.7ms)

Jaeger: Span detail showing attributes (aligned with .NET SDK schema):

Shows `durabletask.type=activity`, `durabletask.task.name=GetWeather`, `durabletask.task.task_id=3`, `otel.scope.name=Microsoft.DurableTask`, `span.kind=client`

DTS Dashboard: FanOutFanIn orchestration completed (1.23s):

Pull request checklist
- CHANGELOG.md

Additional information
- … `opentelemetry-api` and `opentelemetry-context` …
- … `parentTraceContext`) already exist in the proto definition
- … `Schema.cs`, `TraceActivityConstants.cs`, `TraceHelper.cs`)