Skip to content

Trampolining: addition of fields to remove the infinite loops for pinned wfs #721

Open
Shivs11 wants to merge 7 commits intomasterfrom
ss/trampolining-infinite-loops
Open

Trampolining: addition of fields to remove the infinite loops for pinned wfs #721
Shivs11 wants to merge 7 commits intomasterfrom
ss/trampolining-infinite-loops

Conversation

@Shivs11
Copy link
Member

@Shivs11 Shivs11 commented Feb 19, 2026

READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.

-WISOTT

  • Users that could using trampolining in the near future could run into a problem, which shall only surface if the workflow author does not have AutoUpgrade as the initial CAN behaviour for their Pinned workflows. As of today, if a user does hit this case, they would have the pinned workflow CAN'ing infinite number of times. This is because a pinned workflow would commence on the pinned version (since it would have inherited pinned version set to true) and if the pinned version is different than a worker-deployment's current version, the targetDeploymentVersionChanged would be set to true in the server. This would then result in the workflow just CAN'ing all the time!
  • This change aims to change that by only allowing at-most one CAN for such pinned workflow executions, and it does so by remembering what the initial version was at the start of the continue-as-new.

Breaking changes

// the workflow is Pinned.
// Experimental.
bool target_worker_deployment_version_changed = 9;
// The target Worker Deployment Version (from the task queue's routing config) observed on the
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially not going to have this field be present on the TaskStartedAttributes. However, I think this is required since when building the MS back from events, we need this information to be present. Not having this information could cause a workflow, on the passive side, to have a CAN. Again, this can only happen when the user has literally forgotten to have AU as the initial CAN behaviour, but I included this because I think we do need consistency between our active and passive clusters.

happy to be corrected here

Copy link
Contributor

@ShahabT ShahabT Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not having this information could cause a workflow, on the passive side

This is not relevant anymore with state-based replication. we don't rebuild MS from history anymore for replication, we only do so for reset.

In case of reset it seems we can set the value in MS based on reset time info, right?

I think the better approach is to pass it from the previous run and have it in the wf execution started event. see temporalio/temporal#9374 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just discussed and will change to passing it from prev execution into the wf execution started event. We need to use another mechanism to detect whether the resulting WF inherited pinned version on continue-as-new anyways, so passing target version from the old workflow makes sense now with no race issues.

@Shivs11 Shivs11 changed the title Trampolining: Add target_version_on_start Trampolining: target_worker_deployment_version_on_start Feb 20, 2026
@Shivs11 Shivs11 marked this pull request as ready for review February 20, 2026 21:12
@Shivs11 Shivs11 requested review from a team as code owners February 20, 2026 21:12
@Shivs11 Shivs11 force-pushed the ss/trampolining-infinite-loops branch from 776bba0 to c69139f Compare March 6, 2026 03:31
@Shivs11 Shivs11 changed the title Trampolining: target_worker_deployment_version_on_start Trampolining: addition of fields to remove the infinite loops for pinned wfs Mar 6, 2026
// latest_target_worker_deployment_version on its versioning info. Used to
// detect whether the routing target has changed since this run started,
// preventing infinite continue-as-new loops for pinned workflows that do
// not have initial CaN behavior set as AutoUpgrade. Nil for fresh starts.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just explicitly say: "Nil for workflows that are not initiated by a versioned workflow continuing-as-new."

Comment on lines +237 to +238
// task, it compares the polling worker's deployment version against the task
// queue's routing target. If they differ, matching sends the routing target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "the task queue's routing target" is not quite right, because "Target Worker Deployment Version" is a function of workflow_id, task_queue_routing_config, not just task_queue_routing_config. So I think it's best to just always say "Workflow's Target Version." We define that term in our docs, so we can just use it as is without explaining further imo

Comment on lines +242 to +248
// This field exists so that when a workflow continues-as-new, the new run
// can know what the routing target was at the end of the previous run.
// Specifically, it is copied into the new run's WorkflowExecutionStartedEvent,
// enabling the new run to detect whether the routing target has actually
// changed since it started — which is how we prevent infinite
// continue-as-new loops for pinned workflows that do not have initial
// CaN behavior set as AutoUpgrade.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider just omitting this paragraph, but also ok to leave it.

If you keep it, please change "routing target" to "workflow's Target Version"

// changed since it started — which is how we prevent infinite
// continue-as-new loops for pinned workflows that do not have initial
// CaN behavior set as AutoUpgrade.
temporal.api.deployment.v1.WorkerDeploymentVersion latest_target_worker_deployment_version = 9;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh since this is already in the context of "versioning info" we can probably just call it latest_target_version.

Also, I'm realizing that since VersioningInfo is an external API thing, if we want to put:
TargetVersionOnContinueAsNew version and InheritedPinnedVersionOnContinueAsNew bool in here, that would have to be in this API PR as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants