Improve handling of failed scan tasks and allow reconnect#332
Improve handling of failed scan tasks and allow reconnect#332
Conversation
📝 WalkthroughWalkthroughControllerAPI is moved from transports to controllers. Controllers now build an API tree and return initial and periodic scan coroutines via Changes
Sequence DiagramsequenceDiagram
participant ControlSystem
participant Controller
participant ControllerAPI
participant ScanTask
participant AttrR
ControlSystem->>Controller: create_api_and_tasks()
Controller->>Controller: _build_api(path)
Controller->>ControllerAPI: construct API tree
Controller-->>ControlSystem: (controller_api, initial_coros, periodic_coros)
ControlSystem->>ControlSystem: await initial_coros
ControlSystem->>ScanTask: start periodic_coros
loop periodic
ScanTask->>Controller: check _connected
alt connected
ScanTask->>Controller: run scan methods
Controller->>AttrR: trigger attribute update callbacks
else disconnected
ScanTask-->>ScanTask: skip until reconnect
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
tests/conftest.py (1)
50-52: Consider using the public API pathway if available.The fixture now calls the private
_build_api([])method directly. While acceptable for test infrastructure, relying on private methods can make tests fragile. If a publiccreate_api_and_tasks()or similar method is intended for external use, consider whether the fixture should use that instead for better alignment with the public contract.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/conftest.py` around lines 50 - 52, The fixture controller_api currently calls the private helper controller._build_api([]); change it to use the controller's public factory method (e.g. create_api_and_tasks or another documented public API-building method) if one exists so tests exercise the public contract instead of private internals; locate the controller_api fixture and replace controller._build_api([]) with the appropriate public call (passing equivalent arguments) to construct the API and tasks.tests/test_controllers.py (1)
260-283: Consider usingconnect()instead of directly setting_connected, and improve task cleanup.
Line 271: Prefer calling
await controller.connect()instead of manually settingcontroller._connected = True. This better reflects the intended usage pattern and tests the public API.Lines 282-283: The task cancellation should handle
CancelledErrorto avoid potential issues:♻️ Proposed improvements
`@pytest.mark.asyncio` async def test_scan_exception_sets_disconnected_and_reconnect_resumes(): class MyTestController(Controller): `@scan`(0.01) async def failing_scan(self): raise RuntimeError("scan error") controller = MyTestController() controller.post_initialise() _, scan_coros, _ = controller.create_api_and_tasks() - controller._connected = True + await controller.connect() task = asyncio.create_task(scan_coros[0]()) # Wait long enough for the scan to run and raise, setting _connected = False await asyncio.sleep(0.1) assert not controller._connected # Trigger reconnect - _connected resumes scan tasks await controller.reconnect() assert controller._connected task.cancel() - await asyncio.sleep(0.01) + try: + await task + except asyncio.CancelledError: + pass🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_controllers.py` around lines 260 - 283, The test sets controller state directly and doesn't properly await task cancellation; change the setup to call and await controller.connect() (instead of setting controller._connected = True) so the public API is exercised (see MyTestController, connect, reconnect, scan decorator and scan_coros usage), and when cancelling the background task created from scan_coros[0]() ensure you await it and handle asyncio.CancelledError to cleanly swallow cancellation (i.e., cancel the task, await it inside a try/except catching CancelledError) so the test teardown is robust.src/fastcs/controllers/controller.py (1)
55-57: Make basedisconnect()clear connection state.Base
disconnect()currently leaves_connectedunchanged. Resetting it toFalsemakes lifecycle semantics safer for subclasses that don’t overridedisconnect().🔧 Suggested change
async def disconnect(self) -> None: """Hook to tidy up resources before stopping the application""" - pass + self._connected = False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fastcs/controllers/controller.py` around lines 55 - 57, The base async disconnect() leaves the controller's connection flag unchanged; update Controller.disconnect() to set self._connected = False to reliably clear connection state for subclasses that do not override disconnect(), ensuring lifecycle semantics are correct (modify the async def disconnect(self) method in the Controller class to assign self._connected = False before returning).tests/assertable_controller.py (1)
79-79: Prefer public API construction in this test helper.Line 79 uses
controller._build_api([]), which binds tests to a private method. Usingcreate_api_and_tasks()keeps this helper aligned with supported API boundaries.♻️ Suggested change
- controller_api = controller._build_api([]) + controller_api, _, _ = controller.create_api_and_tasks()🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/assertable_controller.py` at line 79, The test helper currently calls the private method controller._build_api([]); replace that with the public factory create_api_and_tasks() to avoid depending on a private symbol. Call create_api_and_tasks() (or controller.create_api_and_tasks() if it is a method) and unpack its return (e.g., api, tasks = create_api_and_tasks(...)) or assign the returned API to controller_api so the helper uses the supported public API surface instead of _build_api.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/fastcs/connections/ip_connection.py`:
- Around line 86-92: The current close sequence only clears self.__connection
when connection.close() succeeds or raises ConnectionResetError; move the
clearing into a finally block so self.__connection is set to None regardless of
what exception close() raises. Concretely, wrap the await connection.close()
call in try/except/finally (or use try/finally) inside the async with
self._connection context, catch/handle specific ConnectionResetError if desired,
and always assign self.__connection = None in the finally so the stale handle is
never retained; reference the symbols self._connection, connection.close(), and
self.__connection to locate and update the code.
In `@src/fastcs/control_system.py`:
- Around line 118-121: The startup currently calls self._run_initial_coros()
unconditionally after await self._controller.reconnect(), which means if
reconnect() returns without raising but leaves the connection flag
(self._connected or self._controller.connected) False those ONCE coroutines will
run while disconnected and never be retried; change the logic so
_run_initial_coros() only runs when a successful connection exists (check
self._connected or self._controller.connected) and otherwise mark the initial
coroutines as pending (e.g., set an attribute like self._initial_coros_pending =
True) and invoke _run_initial_coros() from the reconnect-success path (or the
reconnect callback/handler) when connection becomes True, leaving
_start_scan_tasks() behavior unchanged or similarly gated if needed.
In `@src/fastcs/demo/controllers.py`:
- Around line 98-102: reconnect currently calls self.connection.close() and
self.connection.connect(...) without handling exceptions, violating the
Controller.reconnect() contract; wrap the close/connect calls in a try/except
that catches Exception, log the failure with a descriptive message and exception
info (use a module logger via logging.getLogger(__name__)), and ensure
self._connected is only set True on successful connect and left False on failure
(do not re-raise).
---
Nitpick comments:
In `@src/fastcs/controllers/controller.py`:
- Around line 55-57: The base async disconnect() leaves the controller's
connection flag unchanged; update Controller.disconnect() to set self._connected
= False to reliably clear connection state for subclasses that do not override
disconnect(), ensuring lifecycle semantics are correct (modify the async def
disconnect(self) method in the Controller class to assign self._connected =
False before returning).
In `@tests/assertable_controller.py`:
- Line 79: The test helper currently calls the private method
controller._build_api([]); replace that with the public factory
create_api_and_tasks() to avoid depending on a private symbol. Call
create_api_and_tasks() (or controller.create_api_and_tasks() if it is a method)
and unpack its return (e.g., api, tasks = create_api_and_tasks(...)) or assign
the returned API to controller_api so the helper uses the supported public API
surface instead of _build_api.
In `@tests/conftest.py`:
- Around line 50-52: The fixture controller_api currently calls the private
helper controller._build_api([]); change it to use the controller's public
factory method (e.g. create_api_and_tasks or another documented public
API-building method) if one exists so tests exercise the public contract instead
of private internals; locate the controller_api fixture and replace
controller._build_api([]) with the appropriate public call (passing equivalent
arguments) to construct the API and tasks.
In `@tests/test_controllers.py`:
- Around line 260-283: The test sets controller state directly and doesn't
properly await task cancellation; change the setup to call and await
controller.connect() (instead of setting controller._connected = True) so the
public API is exercised (see MyTestController, connect, reconnect, scan
decorator and scan_coros usage), and when cancelling the background task created
from scan_coros[0]() ensure you await it and handle asyncio.CancelledError to
cleanly swallow cancellation (i.e., cancel the task, await it inside a
try/except catching CancelledError) so the test teardown is robust.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 942faf23-2977-4afa-afe4-a32b3fe23c64
📒 Files selected for processing (33)
src/fastcs/attributes/attr_r.pysrc/fastcs/connections/ip_connection.pysrc/fastcs/control_system.pysrc/fastcs/controllers/__init__.pysrc/fastcs/controllers/base_controller.pysrc/fastcs/controllers/controller.pysrc/fastcs/controllers/controller_api.pysrc/fastcs/demo/controllers.pysrc/fastcs/transports/__init__.pysrc/fastcs/transports/controller_api.pysrc/fastcs/transports/epics/ca/ioc.pysrc/fastcs/transports/epics/ca/transport.pysrc/fastcs/transports/epics/docs.pysrc/fastcs/transports/epics/gui.pysrc/fastcs/transports/epics/pva/ioc.pysrc/fastcs/transports/epics/pva/pvi.pysrc/fastcs/transports/epics/pva/transport.pysrc/fastcs/transports/epics/util.pysrc/fastcs/transports/graphql/graphql.pysrc/fastcs/transports/graphql/transport.pysrc/fastcs/transports/rest/rest.pysrc/fastcs/transports/rest/transport.pysrc/fastcs/transports/tango/dsr.pysrc/fastcs/transports/tango/transport.pysrc/fastcs/transports/transport.pytests/assertable_controller.pytests/conftest.pytests/test_control_system.pytests/test_controllers.pytests/transports/epics/ca/test_gui.pytests/transports/epics/ca/test_softioc.pytests/transports/epics/pva/test_pva_gui.pytests/transports/rest/test_rest.py
💤 Files with no reviewable changes (2)
- src/fastcs/transports/init.py
- src/fastcs/transports/controller_api.py
| async with self._connection as connection: | ||
| await connection.close() | ||
| self.__connection = None | ||
| try: | ||
| await connection.close() | ||
| except ConnectionResetError: | ||
| pass | ||
|
|
||
| self.__connection = None |
There was a problem hiding this comment.
Ensure connection state is cleared even when close() fails unexpectedly.
If connection.close() raises anything other than ConnectionResetError, self.__connection is never reset. That can leave a stale connection handle and interfere with reconnect behavior.
Suggested fix
async def close(self):
- if self.__connection is None:
+ connection = self.__connection
+ if connection is None:
return
- async with self._connection as connection:
- try:
- await connection.close()
- except ConnectionResetError:
- pass
-
- self.__connection = None
+ try:
+ async with connection:
+ try:
+ await connection.close()
+ except ConnectionResetError:
+ pass
+ finally:
+ self.__connection = None📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async with self._connection as connection: | |
| await connection.close() | |
| self.__connection = None | |
| try: | |
| await connection.close() | |
| except ConnectionResetError: | |
| pass | |
| self.__connection = None | |
| async with self._connection as connection: | |
| try: | |
| try: | |
| await connection.close() | |
| except ConnectionResetError: | |
| pass | |
| finally: | |
| self.__connection = None |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/fastcs/connections/ip_connection.py` around lines 86 - 92, The current
close sequence only clears self.__connection when connection.close() succeeds or
raises ConnectionResetError; move the clearing into a finally block so
self.__connection is set to None regardless of what exception close() raises.
Concretely, wrap the await connection.close() call in try/except/finally (or use
try/finally) inside the async with self._connection context, catch/handle
specific ConnectionResetError if desired, and always assign self.__connection =
None in the finally so the stale handle is never retained; reference the symbols
self._connection, connection.close(), and self.__connection to locate and update
the code.
| async def reconnect(self): | ||
| await self.connection.close() | ||
| await self.connection.connect(self._settings.ip_settings) | ||
|
|
||
| self._connected = True |
There was a problem hiding this comment.
Violates base class contract: reconnect() must not raise exceptions.
Per the base Controller.reconnect() docstring (src/fastcs/controllers/controller.py:43-53), this method "should not raise an exception" and should "log an error with the reason" if reconnection fails. The current implementation allows exceptions from close() and connect() to propagate, which will crash the caller in control_system.py since it has no exception handling around reconnect().
🐛 Proposed fix: wrap in try-except and log errors
async def reconnect(self):
- await self.connection.close()
- await self.connection.connect(self._settings.ip_settings)
-
- self._connected = True
+ try:
+ await self.connection.close()
+ except Exception:
+ pass # Ignore close errors, connection may already be closed
+
+ try:
+ await self.connection.connect(self._settings.ip_settings)
+ self._connected = True
+ except Exception:
+ logger.exception(
+ "Failed to reconnect to %s", self._settings.ip_settings
+ )This also requires importing logger at the top of the file if not already present:
import logging
logger = logging.getLogger(__name__)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/fastcs/demo/controllers.py` around lines 98 - 102, reconnect currently
calls self.connection.close() and self.connection.connect(...) without handling
exceptions, violating the Controller.reconnect() contract; wrap the
close/connect calls in a try/except that catches Exception, log the failure with
a descriptive message and exception info (use a module logger via
logging.getLogger(__name__)), and ensure self._connected is only set True on
successful connect and left False on failure (do not re-raise).
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #332 +/- ##
==========================================
- Coverage 90.83% 90.52% -0.32%
==========================================
Files 70 70
Lines 2574 2574
==========================================
- Hits 2338 2330 -8
- Misses 236 244 +8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Pause all scan tasks when an exception occurs in one Add `Controller.reconnect` method to reocnnect and unpause scan tasks Move creation of initial tasks and scan tasks into Controller
62d40ea to
07bcbc4
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/fastcs/controllers/base_controller.py (1)
393-404: Clean API tree construction with recursive sub-controller handling.The
_build_apimethod correctly builds theControllerAPIhierarchy, including proper path propagation to nested sub-controllers.Consider using unpacking syntax for the path construction (per Ruff RUF005):
🔧 Minor style improvement
sub_apis={ - name: sub_controller._build_api(path + [name]) # noqa: SLF001 + name: sub_controller._build_api([*path, name]) # noqa: SLF001 for name, sub_controller in self.sub_controllers.items() },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fastcs/controllers/base_controller.py` around lines 393 - 404, In _build_api, replace the list concatenation used to build child paths (currently using path + [name] inside the sub_apis comprehension) with list unpacking so the child path is constructed as [*path, name]; update the comprehension where sub_controller._build_api is called (referencing sub_controllers, sub_controller, sub_apis) to use the unpacked path form to satisfy Ruff RUF005 and keep semantics identical.src/fastcs/controllers/controller.py (1)
85-93: Refactor to use the publicio_refproperty instead of directly accessing the private_io_refattribute.The
Attributeclass (inherited byAttrR) provides a publicio_refproperty andhas_io_ref()method. Replace the pattern match on the private_io_refwith logic that uses these public accessors:Example refactor
for attribute in api.attributes.values(): if isinstance(attribute, AttrR) and attribute.has_io_ref(): io_ref = attribute.io_ref if isinstance(io_ref, AttributeIORef): update_period = io_ref.update_period if update_period is ONCE: initial_coros.append(attribute.bind_update_callback()) elif update_period is not None: scan_dict[update_period].append( attribute.bind_update_callback() )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/fastcs/controllers/controller.py` around lines 85 - 93, The pattern-match is directly accessing the private _io_ref; update the loop to use the public API: check isinstance(attribute, AttrR) and attribute.has_io_ref(), get io_ref = attribute.io_ref, ensure isinstance(io_ref, AttributeIORef), then read update_period = io_ref.update_period and branch as before (if update_period is ONCE -> initial_coros.append(attribute.bind_update_callback()); elif update_period is not None -> scan_dict[update_period].append(attribute.bind_update_callback())). Keep references to AttrR, has_io_ref(), io_ref, AttributeIORef, update_period, ONCE, bind_update_callback(), initial_coros and scan_dict to locate and update the code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_controllers.py`:
- Around line 282-283: After calling task.cancel(), replace the sleep-based
cleanup with explicitly awaiting the cancelled task to ensure proper teardown;
await the Task object named task and suppress/handle asyncio.CancelledError
(e.g., with a try/except around await task) so the cancelled coroutine is fully
awaited and avoids pending-task warnings in tests that use the task variable in
tests/test_controllers.py.
---
Nitpick comments:
In `@src/fastcs/controllers/base_controller.py`:
- Around line 393-404: In _build_api, replace the list concatenation used to
build child paths (currently using path + [name] inside the sub_apis
comprehension) with list unpacking so the child path is constructed as [*path,
name]; update the comprehension where sub_controller._build_api is called
(referencing sub_controllers, sub_controller, sub_apis) to use the unpacked path
form to satisfy Ruff RUF005 and keep semantics identical.
In `@src/fastcs/controllers/controller.py`:
- Around line 85-93: The pattern-match is directly accessing the private
_io_ref; update the loop to use the public API: check isinstance(attribute,
AttrR) and attribute.has_io_ref(), get io_ref = attribute.io_ref, ensure
isinstance(io_ref, AttributeIORef), then read update_period =
io_ref.update_period and branch as before (if update_period is ONCE ->
initial_coros.append(attribute.bind_update_callback()); elif update_period is
not None -> scan_dict[update_period].append(attribute.bind_update_callback())).
Keep references to AttrR, has_io_ref(), io_ref, AttributeIORef, update_period,
ONCE, bind_update_callback(), initial_coros and scan_dict to locate and update
the code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4018d294-4bb1-4a14-a4fb-b0d7bc0e8d1b
📒 Files selected for processing (33)
src/fastcs/attributes/attr_r.pysrc/fastcs/connections/ip_connection.pysrc/fastcs/control_system.pysrc/fastcs/controllers/__init__.pysrc/fastcs/controllers/base_controller.pysrc/fastcs/controllers/controller.pysrc/fastcs/controllers/controller_api.pysrc/fastcs/demo/controllers.pysrc/fastcs/transports/__init__.pysrc/fastcs/transports/controller_api.pysrc/fastcs/transports/epics/ca/ioc.pysrc/fastcs/transports/epics/ca/transport.pysrc/fastcs/transports/epics/docs.pysrc/fastcs/transports/epics/gui.pysrc/fastcs/transports/epics/pva/ioc.pysrc/fastcs/transports/epics/pva/pvi.pysrc/fastcs/transports/epics/pva/transport.pysrc/fastcs/transports/epics/util.pysrc/fastcs/transports/graphql/graphql.pysrc/fastcs/transports/graphql/transport.pysrc/fastcs/transports/rest/rest.pysrc/fastcs/transports/rest/transport.pysrc/fastcs/transports/tango/dsr.pysrc/fastcs/transports/tango/transport.pysrc/fastcs/transports/transport.pytests/assertable_controller.pytests/conftest.pytests/test_control_system.pytests/test_controllers.pytests/transports/epics/ca/test_gui.pytests/transports/epics/ca/test_softioc.pytests/transports/epics/pva/test_pva_gui.pytests/transports/rest/test_rest.py
💤 Files with no reviewable changes (2)
- src/fastcs/transports/init.py
- src/fastcs/transports/controller_api.py
🚧 Files skipped from review as they are similar to previous changes (15)
- src/fastcs/transports/graphql/graphql.py
- src/fastcs/connections/ip_connection.py
- tests/transports/epics/ca/test_softioc.py
- src/fastcs/transports/rest/transport.py
- tests/transports/rest/test_rest.py
- tests/conftest.py
- tests/transports/epics/pva/test_pva_gui.py
- src/fastcs/transports/epics/docs.py
- src/fastcs/demo/controllers.py
- src/fastcs/transports/epics/ca/transport.py
- src/fastcs/transports/epics/pva/transport.py
- src/fastcs/transports/graphql/transport.py
- src/fastcs/controllers/init.py
- tests/assertable_controller.py
- src/fastcs/transports/epics/pva/ioc.py
| task.cancel() | ||
| await asyncio.sleep(0.01) |
There was a problem hiding this comment.
Await cancelled task to avoid flaky async cleanup.
After task.cancel(), explicitly awaiting the task is safer than sleeping and avoids pending-task warnings in async test runs.
✅ Suggested test cleanup change
task.cancel()
- await asyncio.sleep(0.01)
+ with pytest.raises(asyncio.CancelledError):
+ await task🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_controllers.py` around lines 282 - 283, After calling
task.cancel(), replace the sleep-based cleanup with explicitly awaiting the
cancelled task to ensure proper teardown; await the Task object named task and
suppress/handle asyncio.CancelledError (e.g., with a try/except around await
task) so the cancelled coroutine is fully awaited and avoids pending-task
warnings in tests that use the task variable in tests/test_controllers.py.
Summary by CodeRabbit
Refactoring
Bug Fixes
New Features