fix(claude-agent): warn on empty result, debug-log hook skips #281

Open
gabrycina wants to merge 3 commits into main from fix/claude-agent-empty-result-warning

@gabrycina gabrycina commented Mar 3, 2026

Summary

  • activities.py: raise RuntimeError when run_claude_agent_activity returns 0 messages, instead of silently succeeding. This lets Temporal mark the activity as FAILED and retry it, rather than the workflow proceeding pointlessly with empty results.

    Root cause: when the model API is unreachable (e.g. the LiteLLM proxy can't reach Anthropic because the VPN is off), Claude Code retries internally for minutes, then exits cleanly with 0 messages. The activity previously "succeeded" with no observable error, the workflow continued, and the root cause was invisible.

  • hooks.py: log DEBUG in pre/post_tool_use when the early-return fires due to missing task_id, making it visible why tool call logs aren't appearing.
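
A minimal sketch of the fail-fast check from the first bullet, not the actual diff. The helper name `ensure_non_empty` and the shape of `results` (a dict with a `messages` list) are assumptions made for illustration; only the `RuntimeError` and the "Claude returned 0 messages" wording come from this PR.

```python
from __future__ import annotations

from typing import Any


def ensure_non_empty(results: dict[str, Any]) -> dict[str, Any]:
    """Raise if the agent run produced no messages (hypothetical helper)."""
    messages = results.get("messages") or []
    if not messages:
        # Raising makes the activity fail instead of succeeding silently,
        # so Temporal can retry it according to the activity's retry policy.
        raise RuntimeError(
            "Claude returned 0 messages; the model API may be unreachable"
        )
    return results
```

Calling this just before the activity returns keeps the happy path unchanged: non-empty results pass through untouched.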

Test plan

  • Trigger a run with LiteLLM unreachable — should now see activity FAILED in Temporal with RuntimeError: Claude returned 0 messages
  • Trigger a normal run — should complete as before (non-empty messages)
  • Run with --debug and task_id=None — should see DEBUG: Hooks skipping pre_tool_use

🤖 Generated with Claude Code

Greptile Summary

This PR improves error observability for the Claude agent Temporal activity. When the Claude SDK returns 0 messages (e.g., due to model API being unreachable behind a VPN), the activity now raises a RuntimeError instead of silently succeeding with empty results. This lets Temporal mark the activity as FAILED and trigger retries. Additionally, DEBUG-level logging is added to the hook early-return paths to make it clear why tool call logs are absent when task_id is not set.

  • activities.py: Raises RuntimeError when get_results() returns no messages, surfacing connectivity issues to Temporal
  • hooks.py: Adds debug logging in pre_tool_use and post_tool_use when skipping due to missing task_id or tool_use_id
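
The hook change can be sketched roughly as follows. This is an illustration under assumptions, not the code in `hooks.py`: the `Hooks` class, its constructor, the method signature, the return values, and the logger name are all hypothetical; only the "Hooks skipping pre_tool_use" message prefix is taken from the test plan.

```python
from __future__ import annotations

import asyncio
import logging
from typing import Any, Optional

logger = logging.getLogger("claude_agent.hooks")  # hypothetical logger name


class Hooks:
    """Hypothetical stand-in for the real hooks class."""

    def __init__(self, task_id: Optional[str]) -> None:
        self.task_id = task_id

    async def pre_tool_use(
        self, tool_input: dict[str, Any], tool_use_id: Optional[str]
    ) -> dict[str, Any]:
        if not self.task_id or not tool_use_id:
            # Previously a silent early return; the DEBUG line explains
            # why no tool call logs show up for this run.
            logger.debug(
                "Hooks skipping pre_tool_use: task_id=%r tool_use_id=%r",
                self.task_id,
                tool_use_id,
            )
            return {}
        # The real hook would record the tool call here (assumed).
        return {"logged": tool_use_id}
```

With the logger set to DEBUG, `asyncio.run(Hooks(None).pre_tool_use({}, "t1"))` returns an empty dict and emits the skip message.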

Confidence Score: 4/5

  • This PR is safe to merge — it only adds a fail-fast check and debug logging with no changes to happy-path behavior.
  • The changes are small, well-scoped, and address a real operational issue. The only minor concern is a double-cleanup call in the error path, which is caught gracefully but could produce misleading warning logs.
  • activities.py has a minor double-cleanup issue worth addressing.
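
One possible way to address the double-cleanup concern is to make cleanup idempotent, so the second call in the except block becomes a no-op. The sketch below is an assumption about how that could look; `Handler`, `run_activity`, and the `cleanup_calls` counter are hypothetical names and do not reflect the real handler.

```python
from __future__ import annotations

import asyncio


class Handler:
    """Hypothetical handler whose cleanup is safe to call twice."""

    def __init__(self) -> None:
        self._cleaned_up = False
        self.cleanup_calls = 0  # counts cleanups that actually ran

    async def cleanup(self) -> None:
        if self._cleaned_up:
            return  # already cleaned up; skip without logging a warning
        self._cleaned_up = True
        self.cleanup_calls += 1


async def run_activity(handler: Handler, messages: list[str]) -> list[str]:
    try:
        await handler.cleanup()
        if not messages:
            raise RuntimeError("Claude returned 0 messages")
        return messages
    except Exception:
        await handler.cleanup()  # no-op when cleanup already ran
        raise
```

An alternative with the same effect would be to move the empty-result check outside the `try` block, so the except path never runs after a successful cleanup.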

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/agentex/lib/core/temporal/plugins/claude_agents/activities.py | Adds RuntimeError when Claude returns 0 messages, enabling Temporal retry. Minor double-cleanup issue in the error path. |
| src/agentex/lib/core/temporal/plugins/claude_agents/hooks/hooks.py | Adds DEBUG-level logging when pre/post_tool_use hooks skip due to missing task_id or tool_use_id. Clean, low-risk change. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[run_claude_agent_activity] --> B[Initialize handler & Claude SDK]
    B --> C[Process messages via receive_response]
    C --> D[handler.cleanup]
    D --> E{results has messages?}
    E -- Yes --> F[Return results to Temporal]
    E -- No --> G[Raise RuntimeError]
    G --> H[except block: log error + cleanup again]
    H --> I[Re-raise → Temporal marks FAILED & retries]

Last reviewed commit: 336e088

gabrycina and others added 3 commits March 3, 2026 21:07
- activities.py: log WARNING when Claude returns 0 messages (catches
  silent failures from model connectivity issues e.g. VPN/proxy down)
- hooks.py: log DEBUG when pre/post_tool_use hooks bail early due to
  missing task_id, making it visible why tool call logs aren't appearing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ssages

Returning 0 messages means the model was unreachable (VPN down, proxy
error, etc). Logging a warning and succeeding caused Temporal to mark
the activity as completed, the workflow to continue pointlessly, and
the root cause to be invisible.

Raising RuntimeError lets Temporal retry the activity or surface the
failure clearly in the workflow timeline.