Harden contact path failover and simplify routing flow #1908

Open
robekl wants to merge 1 commit into meshcore-dev:dev from robekl:contact-path-failover-hardening

robekl commented Mar 3, 2026

Summary

Reimplements and hardens the behavior of PR #1777 by introducing a conservative active/backup contact path model, with safer failover and a cleaner routing flow suited to embedded constraints.

Core behavior changes

  • Adds per-contact active + backup direct route state.
  • Introduces direct path failure tracking and threshold-based failover.
  • Promotes better path candidates conservatively (fewer hops, then fewer path bytes).
  • Switches to backup path after repeated direct ACK timeout failures.
  • Temporarily blocks direct sends (forces flood) when failover cannot find a usable backup.
  • Preserves direct_block_until across new path updates while the block is still active.
  • Only emits onContactPathUpdated() when the active path actually changes (or is reset or failed over).
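
To make the failover policy above concrete, here is a minimal sketch of how the per-contact state and the timeout handling could fit together. It is not the code from this PR: the struct layout, the function names, the failure threshold of 3, the 120-second block window, and the choice to clear the active path when no backup exists are all assumptions made for illustration. Only direct_block_until and onContactPathUpdated() are names taken from the description.

```cpp
#include <cassert>
#include <cstdint>

// All names and constants below are hypothetical, chosen for illustration.
constexpr uint8_t  MAX_PATH_BYTES    = 64;
constexpr uint8_t  FAIL_THRESHOLD    = 3;    // assumed: ACK timeouts before failover
constexpr uint32_t DIRECT_BLOCK_SECS = 120;  // assumed: length of the block window

struct Route {
  bool    valid = false;
  uint8_t hops = 0;
  uint8_t num_bytes = 0;
  uint8_t path[MAX_PATH_BYTES] = {};
};

struct ContactRoutes {
  Route    active;
  Route    backup;
  uint8_t  direct_failures = 0;
  uint32_t direct_block_until = 0;  // direct sends are blocked while now < this
};

enum class SendMode { Direct, Flood };

// Called when a direct send gets no ACK in time. Returns true when the active
// path changed, which is when a caller would emit onContactPathUpdated().
bool onDirectAckTimeout(ContactRoutes& c, uint32_t now) {
  if (!c.active.valid) return false;
  if (++c.direct_failures < FAIL_THRESHOLD) return false;  // below threshold
  c.direct_failures = 0;
  if (c.backup.valid) {
    c.active = c.backup;  // promote the backup path
    c.backup = Route{};
  } else {
    c.active = Route{};   // no usable backup: drop the path and force flood
    c.direct_block_until = now + DIRECT_BLOCK_SECS;
  }
  return true;
}

// The block window wins even if a new active path was learned in the meantime.
SendMode chooseSendMode(const ContactRoutes& c, uint32_t now) {
  if (!c.active.valid) return SendMode::Flood;
  if (now < c.direct_block_until) return SendMode::Flood;
  return SendMode::Direct;
}
```

Passing now as a parameter keeps both functions free of clock reads, so they can be exercised with fixed timestamps. The sketch ignores timestamp wraparound.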

Correctness and safety hardening

  • Fixes encoded path handling bugs:
    • avoids using encoded path_len directly as memcpy length.
    • validates path encoding (Packet::isValidPathLen) before route adoption.
  • Removes unsafe pending direct contact pointer tracking:
    • replaced with pubkey-based pending tracking to avoid stale pointers if the contact table is compacted.
  • Sanitizes added/reset contacts so invalid route state cannot persist.
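
The two hardening fixes above can be illustrated with a short sketch. The bit layout used here (low 6 bits for hop count, high 2 bits for bytes per hop minus one) is an assumption made purely for the example and may not match the real encoding. isValidEncodedPathLen is a hypothetical stand-in for Packet::isValidPathLen, whose actual signature is not shown in this description. The pending-contact types and the 32-byte key size are likewise illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr uint8_t     MAX_PATH_BYTES = 64;  // assumed buffer size
constexpr std::size_t PUB_KEY_SIZE   = 32;  // assumed key size

// ASSUMED encoding: low 6 bits = hop count, high 2 bits = bytes per hop - 1.
inline uint8_t pathHops(uint8_t encoded) { return encoded & 0x3F; }
inline uint8_t pathBytesPerHop(uint8_t encoded) {
  return static_cast<uint8_t>((encoded >> 6) + 1);
}

// Hypothetical stand-in for Packet::isValidPathLen.
bool isValidEncodedPathLen(uint8_t encoded) {
  return pathHops(encoded) * pathBytesPerHop(encoded) <= MAX_PATH_BYTES;
}

// Copies a path into dest (at least MAX_PATH_BYTES long).
// Returns the number of bytes copied, or -1 if the encoding is rejected.
int copyEncodedPath(uint8_t* dest, const uint8_t* src, uint8_t encoded_len) {
  if (!isValidEncodedPathLen(encoded_len)) return -1;
  std::size_t n = static_cast<std::size_t>(pathHops(encoded_len)) *
                  pathBytesPerHop(encoded_len);
  std::memcpy(dest, src, n);  // decoded byte count, never encoded_len itself
  return static_cast<int>(n);
}

struct Contact {
  uint8_t pub_key[PUB_KEY_SIZE];
};

// Remember the pending contact by key rather than by pointer.
struct PendingDirect {
  bool    active = false;
  uint8_t pub_key[PUB_KEY_SIZE] = {};
};

// Re-resolve the contact on every use, so a compacted table cannot leave a
// dangling pointer behind. Returns nullptr if the contact is gone.
Contact* findPending(Contact* table, std::size_t count, const PendingDirect& p) {
  if (!p.active) return nullptr;
  for (std::size_t i = 0; i < count; i++) {
    if (std::memcmp(table[i].pub_key, p.pub_key, PUB_KEY_SIZE) == 0) {
      return &table[i];
    }
  }
  return nullptr;
}
```

Under the assumed layout, an encoded value of 0x42 means 2 hops of 2 bytes each, so 4 bytes are copied. Using 0x42 directly as the memcpy length would copy 66 bytes and overrun a 64-byte buffer, which is the class of bug the first fix targets.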

Refactors/simplifications included

  • Shared route-quality comparator (isPathBetter).
  • Shared route-state reset helpers (resetRouteFailoverState, resetAllRouteState).
  • Consolidated direct-vs-flood send path into explicit helpers:
    • sendUsingBestRouteWithTxtAck
    • sendUsingBestRouteNoTxtAck
  • Caches backup validity state during update path evaluation.
  • Passes the current timestamp (now) through the relevant helpers to reduce repeated RTC reads and simplify temporal reasoning.
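
As a rough illustration of the shared comparator, the ordering described earlier (fewer hops, then fewer path bytes) could be written as below. The RouteInfo struct and this signature for isPathBetter are assumptions made for the sketch; the real helper may take different arguments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of a route, for illustration only.
struct RouteInfo {
  bool    valid;
  uint8_t hops;
  uint8_t num_bytes;
};

// Returns true only when the candidate is strictly better than the current
// route: fewer hops first, then fewer path bytes as the tie-breaker.
bool isPathBetter(const RouteInfo& candidate, const RouteInfo& current) {
  if (!candidate.valid) return false;  // never adopt an invalid route
  if (!current.valid) return true;     // any valid route beats none
  if (candidate.hops != current.hops) return candidate.hops < current.hops;
  return candidate.num_bytes < current.num_bytes;
}
```

Because equal routes compare as "not better", an update carrying the same path does not replace the active route, which fits the goal of reducing path update churn.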

Why this should be included

  • Fixes real correctness risks in the original PR's implementation (encoded path length misuse and stale pointer risk).
  • Improves route stability under changing RF conditions without introducing heap allocation.
  • Reduces unnecessary path update churn and repeated logic branches.
  • Better aligns with embedded constraints: deterministic stack/static memory behavior and simpler decision paths.

Benefits

  • More reliable direct routing under intermittent link quality.
  • Faster recovery from bad active paths via bounded failover.
  • Lower risk of memory corruption / undefined behavior from path-length misuse.
  • Lower maintenance cost from centralized routing helpers.
  • No dynamic allocations introduced.

Drawbacks / tradeoffs

  • Increased per-contact RAM usage due to backup route + failover metadata.
  • Added control logic complexity (cooldowns, block windows, failover policy).
  • Policy is conservative; may delay switching to newly discovered alternatives in some edge cases.
