Trampolining: addition of fields to remove the infinite loops for pinned wfs #721
Trampolining: addition of fields to remove the infinite loops for pinned wfs #721
Conversation
| // the workflow is Pinned. | ||
| // Experimental. | ||
| bool target_worker_deployment_version_changed = 9; | ||
| // The target Worker Deployment Version (from the task queue's routing config) observed on the |
There was a problem hiding this comment.
I was initially not going to have this field be present on the TaskStartedAttributes. However, I think this is required since when building the MS back from events, we need this information to be present. Not having this information could cause a workflow, on the passive side, to have a CAN. Again, this can only happen when the user has literally forgotten to have AU as the initial CAN behaviour, but I included this because I think we do need consistency between our active and passive clusters.
happy to be corrected here
There was a problem hiding this comment.
Not having this information could cause a workflow, on the passive side
This is not relevant anymore with state-based replication. we don't rebuild MS from history anymore for replication, we only do so for reset.
In case of reset it seems we can set the value in MS based on reset time info, right?
I think the better approach is to pass it from the previous run and have it in the wf execution started event. see temporalio/temporal#9374 (comment).
There was a problem hiding this comment.
We just discussed and will change to passing it from prev execution into the wf execution started event. We need to use another mechanism to detect whether the resulting WF inherited pinned version on continue-as-new anyways, so passing target version from the old workflow makes sense now with no race issues.
776bba0 to
c69139f
Compare
| // latest_target_worker_deployment_version on its versioning info. Used to | ||
| // detect whether the routing target has changed since this run started, | ||
| // preventing infinite continue-as-new loops for pinned workflows that do | ||
| // not have initial CaN behavior set as AutoUpgrade. Nil for fresh starts. |
There was a problem hiding this comment.
Maybe just explicitly say: "Nil for workflows that are not initiated by a versioned workflow continuing-as-new."
| // task, it compares the polling worker's deployment version against the task | ||
| // queue's routing target. If they differ, matching sends the routing target |
There was a problem hiding this comment.
nit: "the task queue's routing target" is not quite right, because "Target Worker Deployment Version" is a function of workflow_id, task_queue_routing_config, not just task_queue_routing_config. So I think it's best to just always say "Workflow's Target Version." We define that term in our docs, so we can just use it as is without explaining further imo
| // This field exists so that when a workflow continues-as-new, the new run | ||
| // can know what the routing target was at the end of the previous run. | ||
| // Specifically, it is copied into the new run's WorkflowExecutionStartedEvent, | ||
| // enabling the new run to detect whether the routing target has actually | ||
| // changed since it started — which is how we prevent infinite | ||
| // continue-as-new loops for pinned workflows that do not have initial | ||
| // CaN behavior set as AutoUpgrade. |
There was a problem hiding this comment.
I would consider just omitting this paragraph, but also ok to leave it.
If you keep it, please change "routing target" to "workflow's Target Version"
| // changed since it started — which is how we prevent infinite | ||
| // continue-as-new loops for pinned workflows that do not have initial | ||
| // CaN behavior set as AutoUpgrade. | ||
| temporal.api.deployment.v1.WorkerDeploymentVersion latest_target_worker_deployment_version = 9; |
There was a problem hiding this comment.
tbh since this is already in the context of "versioning info" we can probably just call it latest_target_version.
Also, I'm realizing that since VersioningInfo is an external API thing, if we want to put:
TargetVersionOnContinueAsNew version and InheritedPinnedVersionOnContinueAsNew bool in here, that would have to be in this API PR as well.
…arted event attributes
READ BEFORE MERGING: All PRs require approval by both Server AND SDK teams before merging! This is why the number of required approvals is "2" and not "1"--two reviewers from the same team is NOT sufficient. If your PR is not approved by someone in BOTH teams, it may be summarily reverted.
-WISOTT
Breaking changes