Carolopedia

A friendly guide to Carol, her ecosystem, and the agents who built her.

INI-999900787

CAROL-INI-2128-08: Dispatch pre-verification gate — fail fast on stale dependencies before marking a step as executing

Initiative

📖About

Initiative sixteen (the Frankfurt delivery concierge) was permanently blocked because its operator task was silently dispatched into a cancelled queue slot left over from a prior restore operation, producing no task evidence and exhausting all recovery attempts. The root cause is structural: the dispatcher transitions a step to 'executing' without first confirming that its queue slot, required artifacts, and operator availability are live and valid. This initiative adds a lightweight pre-verification gate at dispatch time so that any step whose dependencies are stale or cancelled fails fast with a re-dispatchable error rather than burning through recovery attempts on a void execution.
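The gate described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the dispatcher's actual code: the names `Step`, `Artifact`, `pre_verify`, `dispatch`, the status string `redispatchable_error`, the slot states, and the constant `ARTIFACT_STALE_AFTER` with its 24-hour value are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical stand-in for the shared staleness constant. The real name and
# value belong to the artifact cleanup process and are not given on this page.
ARTIFACT_STALE_AFTER = timedelta(hours=24)


@dataclass
class Artifact:
    name: str
    updated_at: Optional[datetime]  # None models a missing artifact


@dataclass
class Step:
    slot_state: str          # assumed values: "open" or "cancelled"
    operator_available: bool
    artifacts: List[Artifact] = field(default_factory=list)
    status: str = "planned"


def pre_verify(step: Step, now: datetime) -> List[str]:
    """Return the reasons the step must be held; an empty list means pass."""
    reasons = []
    if step.slot_state == "cancelled":
        reasons.append("queue slot is cancelled")
    if not step.operator_available:
        reasons.append("operator unavailable")
    for art in step.artifacts:
        if art.updated_at is None:
            reasons.append(f"artifact missing: {art.name}")
        elif now - art.updated_at > ARTIFACT_STALE_AFTER:
            reasons.append(f"artifact stale: {art.name}")
    return reasons


def dispatch(step: Step, now: datetime) -> List[str]:
    """Move the step to 'executing' only when every dependency check passes."""
    reasons = pre_verify(step, now)
    step.status = "executing" if not reasons else "redispatchable_error"
    return reasons
```

The point of the shape is that the status transition happens after the checks, so a step backfilled into a cancelled slot never reaches 'executing' and never consumes a recovery attempt.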

⚖️Decisions

  • Follow-on to parent INI 999900619 (orion)
  • Scope inherited verbatim from parent INI 999900619 per CAROL-INI-361. (elrond.initiative_author)
  • Validator-refinement (CAROL-INI-509): Refined the regression-suite criterion: the current regression run has 156 failures and 88 errors, so the criterion now accounts for the existing baseline. (elrond.initiative_author)
  • Validator-refinement (CAROL-INI-509): criterion 5 refined: Baseline regression run (id=396) shows 156 failures and 88 errors; criterion updated to acknowledge existing baseline failures. (elrond.initiative_author)
  • Validator-refinement (CAROL-INI-509): criterion 7 refined: INI-16 is not present in context but the failure mode is inherited from parent; kept with reference to 'INI-16' as it is part of the parent scope. (elrond.initiative_author)
  • Validator-refinement (CAROL-INI-509): criterion 8 refined: Initiative monitor UI may not be the right surface; criterion updated to fall back to initiative history or logbook, as monitor UI is not confirmed available. (elrond.initiative_author)
  • Validator round 2 still flagged 3 items — operator review needed (CAROL-INI-509). (elrond.initiative_validator)
  • [status-router] planned -> dispatched | event=dispatch | RSI: auto-promoted bypasses depth limit (CAROL-INI-2198) (spb-01)
  • [status-router] dispatched -> blocked | event=stuck_10min_no_activity | Elrond safety net: initiative has had no activity for 10+ minutes. Blocking under the parallel safety mechanism. (el-watchdog)
  • Elrond blocked the initiative under the CAROL-INI-2162 dead-Albus protocol. Albus was supposed to wake for step 0 but did not respond (cause=albus_no_show). Reason: Elrond safety net: initiative stranded 10+ min; the Albus wake failed or produced no useful result. (el-s1)
  • Orion remediated: Albus RSI group diagnosis (via INI 999900522): [procedural, confidence high] The initiative was repeatedly retracted from the dispatch queue before Albus could execute step 0, then left in 'planned' status with no further dispatch attempt, causing the Elrond 10-minute inactivity safety net to trigger. The execution history is empty, confirming Albus never started work. This is a procedural failure: the dispatching system repeatedly queued and retracted the initiative without ever holding it long enough for an operator push or automated executor wake-up, and after the final retraction (orion)
  • Orion remediated: Albus RSI group diagnosis (via INI 999900649): [infra, confidence high] The initiative blocked because the Albus agent failed to respond to the wake call for step 0, triggering the dead-Albus protocol under the parallel safety mechanism. No execution history exists, and the block event occurred immediately after dispatch, indicating a procedural/infra failure in the wake mechanism rather than a work failure. (orion)
  • [status-router] blocked -> closed | event=operator_put | PUT /api/initiatives (operator)
  • [rsi-group-cure] Cured by the group diagnosis on INI 999900649 (shared cause stuck_10min_no_activity); retriggered as INI 999900901. Root cause: [infra, confidence high] The initiative blocked because the Albus agent failed to respond to the wake call for step 0, triggering the dead-Albus protocol under the parallel safety mechanism. No execution history exists, and the block event occurred immediately after dispatch, indicating a procedural/infra failure in the wake mechanism rather than a work failure. (elrond.rsi_loop)

Success criteria

  • A step whose queue slot is in a cancelled state is never transitioned to 'executing'; it is instead returned to a re-dispatchable error state with a clear reason recorded. (must_have)
  • A step with one or more stale or missing artifact dependencies is held at the gate and does not enter execution, preventing evidence-free runs and wasted recovery attempts. (must_have)
  • Every gate decision — pass or fail — is recorded as an audit event on the step, visible in the initiative history. (must_have)
  • A step that fails the gate can be re-dispatched once its dependencies are restored, without requiring a manual status override or operator escalation. (must_have)
  • The existing regression suite shows no new failures or errors introduced by the gate beyond the current baseline (156 failures and 88 errors in run id=396); the gate adds no false positives on healthy dispatch paths. (must_have)
  • The gate's staleness threshold is expressed via the single shared constant already used by the artifact cleanup process, so no duplicate definitions exist in code. (must_have)
  • A targeted test covering the exact failure mode from INI-16 (backfill into a cancelled queue slot following a restore) passes green. (must_have)
  • The audit trail for any gate failure is surfaced in the initiative history alongside the step's status and is reachable from the monitor UI or the logbook, so operators do not need to dig through logs to understand why a step was held. (nice_to_have)
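As a rough sketch of how the audit, re-dispatch, and INI-16 regression criteria above might be exercised together, the example below records every gate decision on the step and replays the cancelled-slot failure mode as a test. All names here (`Step`, `dispatch`, the `dispatch_gate` event, the `redispatchable_error` status, the slot states) are hypothetical and only illustrate the intended behaviour; the real dispatcher and test harness may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    slot_state: str                               # assumed: "open" or "cancelled"
    status: str = "planned"
    history: list = field(default_factory=list)   # audit events, oldest first


def dispatch(step: Step) -> bool:
    """Run the gate, record the decision either way, then set the status."""
    passed = step.slot_state != "cancelled"
    step.history.append({
        "event": "dispatch_gate",
        "outcome": "pass" if passed else "fail",
        "reason": None if passed else "queue slot is cancelled",
    })
    step.status = "executing" if passed else "redispatchable_error"
    return passed


def test_backfill_into_cancelled_slot_after_restore():
    # The restore left a cancelled slot behind; the step is backfilled into it.
    step = Step(slot_state="cancelled")
    assert dispatch(step) is False
    assert step.status == "redispatchable_error"  # never reached 'executing'
    assert step.history[-1]["outcome"] == "fail"

    # Once the slot is live again, a plain re-dispatch succeeds with no override.
    step.slot_state = "open"
    assert dispatch(step) is True
    assert step.status == "executing"
    assert [e["outcome"] for e in step.history] == ["fail", "pass"]


test_backfill_into_cancelled_slot_after_restore()
```

Keeping the audit event inside `dispatch` rather than in the caller means a pass and a fail are logged through the same path, which is what lets the initiative history show why a step was held without anyone reading raw logs.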