Watch: implement list-then-watch pattern for resilient resource_version management #2519

Open
Copilot wants to merge 3 commits into master from copilot/manage-resource-version-watch-restart

Copilot AI commented Mar 3, 2026

When Watch.stream() is called without a resource_version, the stored self.resource_version reflects the last individual object event seen (e.g., a namespace last modified weeks ago). After 30–60 min of inactivity, this stale version falls outside the API server's watch cache, causing a 410 on retry and an unrecoverable ApiException.

Changes

  • watch.py — list-then-watch pattern: Before entering the watch loop, when no resource_version is specified, performs an initial list call (watch=False) and extracts metadata.resource_version from the list response. This collection-level version is always current and valid for watch restarts. Parameters irrelevant to listing (watch, _preload_content, allow_watch_bookmarks, timeout_seconds) are excluded from the list call.

  • Initial items yielded as ADDED events: Existing items from the list are emitted before the watch stream begins, maintaining backward-compatible behavior. Respects the deserialize flag.

  • watch_test.py — updated assertions + new test: Six existing tests updated to account for the two-call pattern (list + watch). New test test_watch_with_initial_list_resource_version validates end-to-end: existing items from the list appear first as ADDED events, the watch starts from the list's resource_version, and subsequent events are delivered correctly.
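
The flow described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in watch.py: `list_then_watch`, `fake_list`, and `fake_watch` are hypothetical names, and the stubs stand in for real API calls so the example runs without a cluster.

```python
from types import SimpleNamespace

# Parameters the description says are dropped from the initial list call.
EXCLUDED_FROM_LIST = {"watch", "_preload_content", "allow_watch_bookmarks",
                      "timeout_seconds"}


def list_then_watch(list_func, watch_func, **kwargs):
    """Hypothetical helper: list first, then watch from the list-level version."""
    if kwargs.get("resource_version") is None:
        list_kwargs = {k: v for k, v in kwargs.items()
                       if k not in EXCLUDED_FROM_LIST and k != "resource_version"}
        listing = list_func(watch=False, **list_kwargs)
        # Existing items are emitted as ADDED events before the watch begins.
        for item in listing.items:
            yield {"type": "ADDED", "object": item}
        # Anchor the watch to the collection-level version, not an item's version.
        kwargs["resource_version"] = listing.metadata.resource_version
    yield from watch_func(**kwargs)


def _obj(name, version):
    return SimpleNamespace(metadata=SimpleNamespace(name=name, resource_version=version))


def fake_list(**kwargs):
    # Stub list call: old per-item versions, current list-level version.
    fake_list.seen_kwargs = kwargs
    return SimpleNamespace(items=[_obj("default", "100"), _obj("kube-system", "250")],
                           metadata=SimpleNamespace(resource_version="9000"))


def fake_watch(**kwargs):
    # Stub watch call: records its arguments and emits one change event.
    fake_watch.seen_kwargs = kwargs
    yield {"type": "MODIFIED", "object": _obj("default", "9001")}


events = list(list_then_watch(fake_list, fake_watch, timeout_seconds=30))
for event in events:
    print(event["type"], event["object"].metadata.name)
```

In this sketch the watch is started from "9000" (the list-level version) rather than "250" (the newest item version), which is the distinction the change relies on.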

Example

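import kubernetes.client
import kubernetes.config
import kubernetes.watch

# Assumes a reachable cluster and a local kubeconfig.
kubernetes.config.load_kube_config()

v1 = kubernetes.client.CoreV1Api()
w = kubernetes.watch.Watch()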

# Previously: watch started from an individual object's resourceVersion,
# risking 410 after prolonged inactivity.
# Now: stream() lists first, anchors to the list-level resourceVersion,
# then watches — all transparently.
for event in w.stream(v1.list_namespace):
    print(event['type'], event['object'].metadata.name)
    # On restart after 410, self.resource_version is the list-level
    # version, which is always valid.
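
For callers who want an explicit safety net on top of this, a caller-side restart loop might look like the sketch below. It is not part of this change: `watch_with_restart`, `relist`, `open_stream`, and `FakeApiException` are hypothetical stand-ins (the last one mimics an API exception carrying a `status` attribute) so the example runs without a cluster.

```python
class FakeApiException(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status."""

    def __init__(self, status):
        super().__init__(f"status {status}")
        self.status = status


def watch_with_restart(open_stream, relist, max_restarts=3):
    """Re-list to get a fresh version whenever the stream fails with 410 Gone."""
    resource_version = relist()
    restarts = 0
    while True:
        try:
            for event in open_stream(resource_version):
                yield event
            return  # stream ended normally
        except FakeApiException as exc:
            if exc.status != 410 or restarts >= max_restarts:
                raise
            restarts += 1
            # The old version is gone from the server's cache; fetch a new one.
            resource_version = relist()


# Simulated run: the first version is stale, the second one works.
versions = iter(["100", "200"])
calls = []


def relist():
    return next(versions)


def open_stream(resource_version):
    calls.append(resource_version)
    if resource_version == "100":
        raise FakeApiException(410)
    yield {"type": "ADDED", "resource_version": resource_version}


events = list(watch_with_restart(open_stream, relist))
print(calls, events)
```

A real loop would also need to decide how to handle events that are replayed after a re-list; the sketch ignores that.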
Original prompt

This section details the original issue you should resolve

<issue_title>Manage resourceVersion to allow resilient restart of watch method</issue_title>
<issue_description>What is the feature and why do you need it:

We are using the stream method of the Watch object:

def stream(self, func, *args, **kwargs):

Let's say I use this method with v1.list_namespace and no timeout specified (https://github.com/kubernetes-client/python/blob/master/examples/watch/timeout-settings.md). Then we see the following:

  • With no resourceVersion and no timeout specified, the stream first lists all namespaces as 'ADDED' events; the namespaces are alphabetically ordered.
  • The stream then waits for events with a self.resource_version that is probably quite old.
  • We then hit the server-side timeout (the Kubernetes default described in the previous link, between 30 minutes and 1 hour), after which the watch hits a 410.
  • We then need to restart the stream.

If a namespace is created during that 30-minute to 1-hour period, the watch stores a more recent resourceVersion, and the 410 is reached considerably later (probably depending on the history or activity on the cluster).

Describe the solution you'd like to see:

From our tests, the right resourceVersion to use for a restart is not the resourceVersion of the last event seen but the resourceVersion available in the metadata of the response returned by the func argument of the stream method.

In the response there is a metadata.resourceVersion, given by Kubernetes, that allows the stream to be restarted from this resourceVersion without generating an error.

I am not sure whether this metadata is available for every func method.

It's quite hard to understand how to use the watch method in the API if we want to maintain a daemon program with no errors.
With no resourceVersion and no timeout specified, everyone should know that this kind of problem exists because of how self.resource_version is stored.

</issue_description>

Comments on the Issue (you are @copilot in this section)

@roycaihw: /help

@k8s-ci-robot: @roycaihw: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

Details

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.


@k8s-ci-robot

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.


@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Mar 3, 2026
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 3, 2026
…: stale resourceVersion after 410)

Co-authored-by: brendandburns <5751682+brendandburns@users.noreply.github.com>
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Copilot
Once this PR has been reviewed and has the lgtm label, please assign roycaihw for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 3, 2026
…cluded_params set

Co-authored-by: brendandburns <5751682+brendandburns@users.noreply.github.com>
Copilot AI changed the title [WIP] Manage resourceVersion to allow resilient restart of watch method Watch: implement list-then-watch pattern for resilient resource_version management Mar 3, 2026
@brendandburns brendandburns marked this pull request as ready for review March 4, 2026 04:05
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 4, 2026
@k8s-ci-robot k8s-ci-robot requested review from fabianvf and yliaog March 4, 2026 04:05
Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Manage resourceVersion to allow resilient restart of watch method

3 participants