Carolopedia

A friendly guide to Carol, her ecosystem, and the agents who built her.

CAROL-INI-0062-00: Unify droid-failure triage in al-sh-01

Initiative

📖 About

Albus has three failure-handling pathways today (in-process ALB-REM, watcher al-en-01, and watcher al-sh-01), but only al-sh-01 has authority to create [ALBUS-AUTO] initiatives and bypass-execute fixes. al-sh-01 triggers only on review_verdict_fail from PE-DEV-01, so execution-time failures (Forge no-code-generated, retries-exhausted) bypass it entirely. INI-053 step 4 was the canonical case: Forge could not run chmod because of the sandbox; ALB-REM diagnosed the problem correctly but lacked the authority to act; and al-sh-01 was never invoked. This initiative makes al-sh-01 the SINGLE triage entry point for ALL droid failures (review-time and exec-time).
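The routing change can be pictured as widening the set of handshake kinds that reach al-sh-01. The sketch below is illustrative only: it assumes the three trigger kinds named on this page are the literal `kind` strings on a handshake row, and the `Handshake` class and `route` function are hypothetical stand-ins, not the actual albus_watcher code.

```python
from dataclasses import dataclass
from typing import Optional

# Trigger kinds named in this initiative. Treating them as the literal
# handshake `kind` strings is an assumption of this sketch.
REVIEW_VERDICT_FAIL = "review_verdict_fail"
EXEC_NO_CODE_GENERATED = "exec_no_code_generated"
EXEC_FAIL_RETRIES_EXHAUSTED = "exec_fail_retries_exhausted"

# Before: only review-time failures reached al-sh-01.
LEGACY_TRIAGE_KINDS = frozenset({REVIEW_VERDICT_FAIL})

# After: review-time and exec-time failures share one entry point.
TRIAGE_KINDS = frozenset(
    {REVIEW_VERDICT_FAIL, EXEC_NO_CODE_GENERATED, EXEC_FAIL_RETRIES_EXHAUSTED}
)


@dataclass(frozen=True)
class Handshake:
    """Hypothetical in-memory view of a handshake_requests row."""
    event_id: str
    kind: str
    from_droid: str
    to_agent: str


def route(handshake: Handshake, triage_kinds: frozenset = TRIAGE_KINDS) -> Optional[str]:
    """Return the handler a watcher would invoke, or None if it should ignore the row."""
    if handshake.to_agent != "albus":
        return None  # not addressed to Albus
    if handshake.kind in triage_kinds:
        return "al_sh_01"  # single triage entry point
    return None
```

Under `LEGACY_TRIAGE_KINDS`, an `exec_no_code_generated` handshake routes nowhere, which mirrors how INI-053 step 4 never reached al-sh-01.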

⚖️ Decisions

  • al-sh-01 becomes the single entry point for ALL droid-failure triage (review-time AND exec-time). Today three Albus paths exist with split authority: ALB-REM diagnoses but cannot act; al-en-01 nudges Merlin but does not classify; al-sh-01 has authority but fires only on review_verdict_fail. Unifying triage at al-sh-01 connects Albus's brain (diagnosis) to Albus's hands (authority to act). (Ninad)
  • po_s1 emits a handshake on every pipeline-stopping failure; silent failure is a bug. INI-053 exec 223 stopped silently after 3 Forge attempts returned no code, and ALB-REM did not recover. No handshake = no watcher = no triage. Going forward, every terminal failure in po_s1 must emit a handshake so Albus can route it. (Ninad)
  • New handshake kind exec_no_code_generated for the no-executable-code failure mode. It is distinct from exec_fail_retries_exhausted, which fires when execution ran and failed (e.g. a dbus error). exec_no_code_generated covers the case where the model returned text instead of code (sandbox refusal, ambiguous prompt, etc.). Different signature, different evidence. (Ninad)
  • The 7-verdict v2 taxonomy applies to exec-time triggers; AUTH-BLOCKED and TOOLING-BUG are the expected diagnoses for sandbox issues. The taxonomy already names the sandbox case (AUTH-BLOCKED for permission denial; TOOLING-BUG for sandbox config gaps). No new verdicts are needed; only the trigger surface widens. (Ninad)
  • Idempotency key in albus_enablement_actions extended to include trigger_kind. Currently, dedup matches only review_verdict_fail rows. With three trigger kinds, a stale review handshake could mask a fresh exec-time handshake (or vice versa). Dedup must be scoped to (event_id, trigger_kind). (Ninad)
  • requester rewritten ninad -> orion per CAROL-INI-744: orion is the only human-CLI requester. This backfills historical rows after INI-744 added API-level refusal of requester=ninad. Orion is Ninad's CLI agent; all human-originated initiatives are filed with requester=orion. (orion)
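The idempotency decision can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `EnablementActions` is a hypothetical stand-in for the albus_enablement_actions table, the classifier is injected as a plain callable, and the `classified` action name is invented for illustration (only `already_processed` comes from this page).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical classifier signature: (event_id, trigger_kind) -> v2 verdict string.
Classifier = Callable[[str, str], str]


@dataclass
class EnablementActions:
    """In-memory stand-in for albus_enablement_actions, keyed by (event_id, trigger_kind)."""

    rows: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def process(self, event_id: str, trigger_kind: str, classify: Classifier) -> Dict[str, str]:
        key = (event_id, trigger_kind)  # dedup scope includes the trigger kind
        if key in self.rows:
            # Re-firing the same handshake is a no-op; return the stored verdict.
            return {"action": "already_processed", "verdict": self.rows[key]}
        verdict = classify(event_id, trigger_kind)
        self.rows[key] = verdict
        return {"action": "classified", "verdict": verdict}
```

If the key were `event_id` alone, the exec-time call below would return `already_processed` because of the earlier review row, and the exec-time failure would never be classified.

```python
store = EnablementActions()
stub = lambda event_id, kind: "AUTH-BLOCKED"
store.process("evt-1", "review_verdict_fail", stub)     # classified
store.process("evt-1", "exec_no_code_generated", stub)  # classified, not masked
store.process("evt-1", "exec_no_code_generated", stub)  # already_processed
```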

Success criteria

  • After the bypass deploys, simulating three consecutive "No executable code generated" failures in po_s1 produces a handshake_requests row with kind=exec_no_code_generated, from_droid=PO-S1, and to_agent=albus (must_have)
  • albus_watcher invokes al_sh_01.process for that handshake (verifiable via an albus_enablement_actions row with trigger_kind=exec_no_code_generated) (must_have)
  • al_sh_01 returns a verdict in the v2 taxonomy with confidence > 0.5 for a sandbox-permission case (expected: AUTH-BLOCKED or TOOLING-BUG) (must_have)
  • Re-firing the same handshake event_id is a no-op: al_sh_01.process returns action=already_processed (must_have)
  • The existing review_verdict_fail path still classifies and persists with trigger_kind=review_verdict_fail (no regression) (must_have)
  • End-to-end regression: a synthetic plan step that asks Forge to chmod a forbidden path (e.g. /etc/test) flows from po_s1 → handshake → al_sh_01 → verdict → recipient action without operator intervention (must_have)
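The first criterion depends on po_s1 choosing the right handshake kind when it gives up. A minimal sketch, assuming po_s1 records each Forge attempt as either "no code returned" or "code ran with an error"; the `Attempt` class, `terminal_handshake` function, and the `evt-223` id are hypothetical, and the evidence payload a real handshake would carry is omitted. Treating a mix of no-code and ran-and-failed attempts as retries-exhausted is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MAX_FORGE_ATTEMPTS = 3  # INI-053 exec 223 stopped after 3 attempts


@dataclass
class Attempt:
    code: Optional[str]   # None when the model returned text instead of code
    error: Optional[str]  # execution error when the code ran and failed


def terminal_handshake(attempts: List[Attempt], event_id: str) -> Dict[str, str]:
    """Build the handshake_requests row po_s1 would emit instead of stopping silently."""
    if len(attempts) < MAX_FORGE_ATTEMPTS:
        raise ValueError("not a terminal failure yet; po_s1 should retry")
    if all(a.code is None for a in attempts):
        kind = "exec_no_code_generated"  # nothing executable was ever produced
    else:
        kind = "exec_fail_retries_exhausted"  # assumed default when any attempt ran
    return {"event_id": event_id, "kind": kind, "from_droid": "PO-S1", "to_agent": "albus"}
```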