
Carolopedia

A friendly guide to Carol, her ecosystem, and the agents who built her.


CAROL-INI-0013-00: Complete Carol planner self-hosting (B.3 + beyond)

Initiative

📖 About

Build pipeline gaps (updated 2026-04-19 session, 19 total). STRATEGIC PHASE PARTIALLY WORKS (SP-01 runs 70s/cache-hit, writes summary). TACTICAL + EXECUTION never crossed the strategic boundary in any run. Gaps:
(1) sq_01.py argparse rejects the --task-pipeline flag that pipeline.py calls it with, so the sequencer never runs.
(2) After SP-01 auto-approves (human review deliberately removed), NOTHING fires autonomously: neither sibling tasks per S4 prereq (new model per Gap 19) nor TP-AR-01 per S8 assignment. Plan-first (all deps plan before any execute); cycle + max-depth-5 guards; sequential now / parallel later.
(3) ex-dev-01 + pe-dev-01: zero tests, zero runtime invocations.
(4) SP-01 data-write fidelity: droid_runs.cost_usd=0, phases_completed=1 (logs show 10), duration_ms=0, executions.final_plan=NULL.
(5) check_results: only 2/10 strategic groups populated (groups 4 and 10); the other 8 phases are silently dropped.
(6) Failure handling: the user rule is retries + update Orion. SP-01 has 3 retries, then goes to log_to_enabler (not Orion). TP-AR-01 + PE-DEV-01 + TR-AR-01 have zero retry + zero escalate. EX-DEV-01 escalates to Architect via enabler.db. TS-01 has 3 retries, target unclear. NO droid POSTs to Agentbook 7113, so Orion is never notified. Shared fix: shared/governance.notify_orion(task_id, kind, description).
(7) S11 duplicates S10 in the strategic template.
(8) OBSERVABILITY: add group_tries(id, execution_id, group_id, try_number, started_at, ended_at, verdict, reason).
(9) Add droid_runs.group_try_id FK. Gaps 8+9 unlock forensic SQL: what happened on try 2 of group X.
(10) DESIGN COMPLIANCE: canonical chain Initiative > Module > Phase > Step > Task > Run > Job > Group > Try > Activity (10 levels after Gap 19). The schema compresses levels: (a) initiative_phases table needed, (b) initiative_plans.phase INT to phase_id FK, (c) step_runs table needed because run_id integer semantics are muddy.
(11) Cookbook #4 chain mentions only 5 levels and says 1 task per step; update to the full 10-level model, where 1 step has 1..N tasks.
(12) ROLE COMPLIANCE: checklist_templates owner roles are mis-assigned: (a) reviews not routed to Orchestrator/Merlin (0 rows), (b) testing not routed to Tester/Argus (only 12 rows), (c) Carol uses BB role names (Supernova/Architect/Analyst/Big Brother) instead of Carol names (Clara/Albus/Sage), (d) Auditor + Treasurer don't exist in Design #146 and need mapping.
(13) UNPLANNED DIAGNOSTIC ACTIVITIES are owned by Sage (Analyst) per Ninad 2026-04-19. Architect (Albus) is second-line, not the diagnostic owner. Sage needs diagnostic droids (mirror BB analyst droids: fm-s1, fm-s2, fm-r1, tr-ar-01/02, en-ar-01, pf-ar-01, ts-01).
(14) MANAGER (Elrond) activities: only SP-01 is tracked. Missing droids + activities for initiative_plan, replan, initiative_review, budget/cycle-gated approve.
(15) ORCHESTRATOR (Merlin) activities: zero captured. TP-AR-01 and SQ-01 exist, but under agents/elrond/droids; they should be under Merlin per Design #39. No rerun-decision droid.
(16) Identity files for Elrond and Merlin have drifted from Design #146.
(17) AGENT-CENTRIC ARCHITECTURE VIOLATION: pipeline_jobs has no agent column. One checklist = one job = 3-8 agents' steps interleaved (G has 6 agents, A has 8). Target: one checklist to N agent-specific jobs with dependencies. Add pipeline_jobs.assigned_agent; TP-AR-01 produces per-agent tactical plans.
(18) ARCHITECT SECOND-LINE COVERAGE: Albus is the second line of defence for Sage, Archon, Forge, Argus, and Merlin when their retries are exhausted. Today only Forge has that escalation (via ex_dev_01). Extend to all five. Build an Albus enablement droid; if Albus exhausts its attempts, escalate to Orion via notify_orion.
(19) STEP TO TASK 1:N REDESIGN (FINAL GAP): Today 1 step = 1 task (cookbook #4), with deps modelled as child executions of the parent task. New model: 1 step = 1..N tasks (1 main + N dependency tasks as siblings under the same step). Schema: add executions.step_id FK to initiative_plans.id; task_id becomes per-execution, not per-step; add executions.task_kind (main|dependency). Each dep task runs full S1-S12 independently. A step is complete when all its tasks are complete. Cookbook #4 must be updated. This changes the shape of Gap 2's fix: create a sibling task per dep under the same step_id (not a child_exec of the parent task).
NOT A GAP (2026-04-19 Ninad): SP-01 self-approves to status=approved instead of reviewing; this is deliberate, for end-to-end automation.
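The forensic query that Gaps 8 and 9 are meant to unlock can be sketched roughly as follows. This is a minimal illustration using SQLite, not the project's actual migration: only the group_tries column names and droid_runs.group_try_id come from the gap list; the column types, the uniqueness constraint, the trimmed-down droid_runs table, and the helper names (what_happened, demo) are assumptions made for the example.

```python
import sqlite3

# Assumed minimal schema. Column names for group_tries are taken from Gap 8;
# types, constraints, and the reduced droid_runs table are illustrative only.
SCHEMA = """
CREATE TABLE group_tries (
    id           INTEGER PRIMARY KEY,
    execution_id INTEGER NOT NULL,
    group_id     INTEGER NOT NULL,
    try_number   INTEGER NOT NULL,
    started_at   TEXT,
    ended_at     TEXT,
    verdict      TEXT,
    reason       TEXT,
    UNIQUE (execution_id, group_id, try_number)
);
CREATE TABLE droid_runs (
    id           INTEGER PRIMARY KEY,
    droid_id     TEXT NOT NULL,
    cost_usd     REAL DEFAULT 0,
    group_try_id INTEGER REFERENCES group_tries(id)  -- Gap 9 FK
);
"""

# "What happened on try N of group X": one row per droid run attached to that try.
FORENSIC_SQL = """
SELECT gt.verdict, gt.reason, dr.droid_id, dr.cost_usd
FROM group_tries AS gt
LEFT JOIN droid_runs AS dr ON dr.group_try_id = gt.id
WHERE gt.execution_id = ? AND gt.group_id = ? AND gt.try_number = ?
ORDER BY dr.id
"""


def what_happened(conn, execution_id, group_id, try_number):
    """Return (verdict, reason, droid_id, cost_usd) rows for one try of one group."""
    return conn.execute(FORENSIC_SQL, (execution_id, group_id, try_number)).fetchall()


def demo():
    """Build an in-memory database with made-up rows and query try 2 of group 4."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Illustrative rows only; these are not real run data.
    conn.execute("INSERT INTO group_tries VALUES (1, 7, 4, 1, 't0', 't1', 'fail', 'timeout')")
    conn.execute("INSERT INTO group_tries VALUES (2, 7, 4, 2, 't2', 't3', 'pass', NULL)")
    conn.execute("INSERT INTO droid_runs VALUES (1, 'sp-01', 0.10, 1)")
    conn.execute("INSERT INTO droid_runs VALUES (2, 'sp-01', 0.12, 2)")
    return what_happened(conn, execution_id=7, group_id=4, try_number=2)
```

The LEFT JOIN keeps a try visible even when no droid run was attached to it, which is the situation Gap 5 (silently dropped phases) would otherwise hide.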

IMPLEMENTATION STATUS (2026-04-19 pipeline takeover session): Phase 1 + Phase 2 COMPLETE. 11 commits on refactor/whatsapp-client (1f3b7a4, 4698b43, ab3d23e, 193bc3a, 9e3615c, 89fc83a, c3f1052, e8cf03c, debc74d, eb14b54). 10 bypass execs closed (#31, #32, #33 closures for INI-010/011/012; #36, #37, #38 Phase 1; #39, #40, #41, #42, #43, #44, #45 Phase 2). Not pushed to remote. Cookbook #4 updated with FAILURE HANDLING section. Gaps DONE: 6a, 6b, 6c, 6d, 6e, 6f, 6g, 8, 9, 18a (wiring; real enablement logic still TODO). Gaps OPEN for next session (in order): Phase 3 [12, 13, 16], Phase 4 [15, 14, 17, 19], Phase 5 [1, 2, 3], Phase 6 [5, 4], Phase 7 [end-to-end test], Phase 8 [7, 10a/b/c, 11]. See pending-tasks.md for full resume plan.

GAP 20 (raised 2026-04-21 by Ninad): CI-S1 budget estimator must forecast review/retry cycles + multi-phase overhead. Today's formula (BASE_COST_PER_STEP * steps + 0.20 overhead + 20% failure buffer) under-forecasts initiatives needing multiple review cycles, droid retries, test failures with re-runs, or >1 execution phase. bg-s1 catches overruns after the fact; CI-S1 must forecast them up front. Model changes:
(a) default phases = 2 (baseline planning + re-review after first failure) unless the template hints otherwise,
(b) retry factor per droid up to MAX_ATTEMPTS=3 with attempt-weighted cost,
(c) test failure re-run factor for PE-DEV-01 + regression-test steps,
(d) review-cycle factor tied to max_review_cycles (budget covers up to that cap, not just one review),
(e) ES-S1 historical calibration per-template/per-droid from actual droid_runs.
Plumbing: expose new fields on the estimate API (expected_phases, expected_retries, review_cycles), wire cookbook #22 thresholds to the richer model, BB mirror via INI-17.
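To make the difference between today's formula and the Gap 20 model concrete, here is a rough sketch. It is one possible reading, not the CI-S1 implementation: the value of BASE_COST_PER_STEP, the failure probability, the per-review cost, the grouping of today's formula, and the decision to replace the flat buffer with an expected-attempts factor are all assumptions. Only the 0.20 overhead, the 20% buffer, MAX_ATTEMPTS=3, the default of 2 phases, and the three output field names come from the gap text. Model changes (c) and (e) are left out.

```python
from dataclasses import dataclass

BASE_COST_PER_STEP = 1.00  # hypothetical value; the real coefficient is not given
FIXED_OVERHEAD = 0.20      # from the gap description
FAILURE_BUFFER = 0.20      # 20% buffer from the gap description
MAX_ATTEMPTS = 3           # from the gap description


def current_estimate(steps):
    """Today's formula, read here as (base * steps + overhead) * (1 + buffer)."""
    return (BASE_COST_PER_STEP * steps + FIXED_OVERHEAD) * (1 + FAILURE_BUFFER)


@dataclass
class Estimate:
    expected_phases: int
    expected_retries: float
    review_cycles: int
    budget_usd: float


def expected_attempts(p_fail, max_attempts=MAX_ATTEMPTS):
    """Expected number of attempts when each attempt fails independently with
    probability p_fail and attempts stop at max_attempts."""
    # Attempt k+1 only runs if the previous k attempts all failed.
    return sum(p_fail ** k for k in range(max_attempts))


def richer_estimate(steps, p_fail=0.25, max_review_cycles=2,
                    review_cost=0.50, phases=2):
    """Sketch of model changes (a), (b), and (d); p_fail and review_cost are
    made-up inputs that ES-S1 calibration would supply in practice."""
    attempts = expected_attempts(p_fail)
    per_phase = BASE_COST_PER_STEP * steps * attempts + FIXED_OVERHEAD
    budget = per_phase * phases + review_cost * max_review_cycles
    return Estimate(
        expected_phases=phases,
        expected_retries=attempts - 1,
        review_cycles=max_review_cycles,
        budget_usd=round(budget, 2),
    )
```

With these placeholder inputs, a 10-step initiative comes out well above the flat-buffer figure, which is the direction of correction the gap asks for; the actual magnitudes depend entirely on calibrated coefficients.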

GAP 21 (raised 2026-04-21 by Ninad; inter-droid plumbing for Gap 14): Gap 14 defines the four Elrond droid bodies in isolation but leaves the inter-droid plumbing unspecified. Four structural pieces are needed before bg-s1/ir-s1/dp-s2/ip-s1 can actually orchestrate:
(1) HANDSHAKE CHANNEL: no table/queue exists for executor-to-Elrond escalation. Today only notify_orion exists. Add notify_elrond (and notify_merlin) in shared/governance.py with a persistent handshake_requests table (cols: id, source_droid, target_agent, initiative_id, step_id, kind, payload_json, status, created_at). dp-s2 polls handshake_requests WHERE target_agent=elrond AND status=open.
(2) GATE AGGREGATOR: bg-s1 only gates budget/cycles, ir-s1 only flips review, dp-s2 only replans; no droid reads all three + handshake payload + pm-s1 status to make the replan-vs-escalate call. Add a dispatcher droid under Elrond (proposed: el-s1 Elrond Supervisor) OR extend dp-s2 to own the aggregation. Decide in the Gap 21 design.
(3) ALBUS INSERTION MECHANICS: for dynamic Albus insertion during handshake, specify: insertion target (dependency of failing step vs sibling), Albus executor droid id, task payload shape, how Albus reads failure context (from handshake_requests row). Requires the Gap 18 Albus enablement droid interface definition.
(4) TERMINATION SEMANTICS: clarify (a) whether dp-s2 MAX_ATTEMPTS=3 counts replan attempts per step or per initiative (per step recommended); (b) the max_review_cycles enforcement owner: bg-s1 gates, ir-s1 detects overshoot, dp-s2 declines further replans; roles must be explicit; (c) if ir-s1 keeps failing indefinitely under the cycle cap, define a break condition (e.g., same-criterion-fail N times = forced escalation).
MINOR: notify_merlin transport missing (ip-s1 needs it); executor-side handshake raise is implicit (today only EX-DEV-01 + PE-DEV-01 escalate, and only to Albus/enabler, not Elrond), so extend all executor droids to raise a handshake on retry exhaustion; Gap 20 forecast (expected_phases, review_cycles) should feed bg-s1 thresholds, so wire CI-S1 output to the bg-s1 read path.
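The handshake channel in item (1) could look something like the sketch below, using SQLite. The table and column names, notify_elrond, notify_merlin, and the polling predicate come from the gap text. Everything else is assumed for illustration: the column types, the never-raise behaviour (borrowed from how notify_orion is described elsewhere on this page), the 'claimed' status value, and the helper names _notify, poll_open, and claim.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Column names from Gap 21 item (1); types and defaults are assumptions.
HANDSHAKE_SCHEMA = """
CREATE TABLE IF NOT EXISTS handshake_requests (
    id            INTEGER PRIMARY KEY,
    source_droid  TEXT NOT NULL,
    target_agent  TEXT NOT NULL,
    initiative_id TEXT,
    step_id       INTEGER,
    kind          TEXT NOT NULL,
    payload_json  TEXT NOT NULL,
    status        TEXT NOT NULL DEFAULT 'open',
    created_at    TEXT NOT NULL
);
"""


def _notify(conn, target_agent, source_droid, initiative_id, step_id, kind, payload):
    """Insert one open handshake row. Assumed never-raise contract:
    returns the new row id, or None if anything went wrong."""
    try:
        cur = conn.execute(
            "INSERT INTO handshake_requests "
            "(source_droid, target_agent, initiative_id, step_id, kind, "
            " payload_json, status, created_at) "
            "VALUES (?, ?, ?, ?, ?, ?, 'open', ?)",
            (source_droid, target_agent, initiative_id, step_id, kind,
             json.dumps(payload), datetime.now(timezone.utc).isoformat()),
        )
        conn.commit()
        return cur.lastrowid
    except (sqlite3.Error, TypeError, ValueError):
        return None


def notify_elrond(conn, source_droid, initiative_id, step_id, kind, payload):
    return _notify(conn, "elrond", source_droid, initiative_id, step_id, kind, payload)


def notify_merlin(conn, source_droid, initiative_id, step_id, kind, payload):
    return _notify(conn, "merlin", source_droid, initiative_id, step_id, kind, payload)


def poll_open(conn, target_agent):
    """What a poller such as dp-s2 would read: open requests for one agent, oldest first."""
    rows = conn.execute(
        "SELECT id, source_droid, kind, payload_json FROM handshake_requests "
        "WHERE target_agent = ? AND status = 'open' ORDER BY id",
        (target_agent,),
    ).fetchall()
    return [(rid, src, kind, json.loads(body)) for rid, src, kind, body in rows]


def claim(conn, request_id):
    """Mark a request as taken ('claimed' is a hypothetical status value).
    Returns True only for the caller that actually flipped it from 'open'."""
    cur = conn.execute(
        "UPDATE handshake_requests SET status = 'claimed' "
        "WHERE id = ? AND status = 'open'",
        (request_id,),
    )
    conn.commit()
    return cur.rowcount == 1
```

Guarding the UPDATE with status = 'open' means two pollers racing on the same row cannot both claim it, which matters if the gate aggregator in item (2) ends up as a second reader of the same table.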

GAP 22 (raised 2026-04-21 by Ninad; Architect live visibility + Enabler decommission + SST cleanup): The PersistentArchitect (agents/sage/ar_persistent.py) is not actually watching. It's an in-memory Python object that only calls call_claude when the sequencer explicitly invokes its methods (preflight, diagnose_and_fix, etc.). It has no real-time observation of droid activity and no visibility into the strategic plan (SP-01 S1-S10), tactical plans (TP-AR-01 per-agent), or dispatch decisions (PO-S1 agent bundles). When a step fails, the Architect gets the last 300 chars of log_text from the 5 most recent droid_runs rows and nothing else. Root-cause analysis under that context is impossible.

Separately, apps/enabler (port 7149) is a legacy thin narrative log: 58 rows over 6 days, ~80% Foreman PID-missing noise, written by 8 droids, read only by EX-DEV-01 via fetch_enabler_context; the current Architect doesn't touch it. It is a weak precursor to what's needed.

Fix in two layers (A, B), plus decommission and cleanup (C, D):
(A) UNIFIED SESSION LEDGER: new session_events(id, ts, initiative_id, step_id, exec_id, droid_id, agent, event_type, payload_json) + plan_snapshots(id, exec_id, plan_kind, agent, plan_json, created_at) tables; event types droid_start/droid_end/plan_produced/review_verdict/handshake_raised/retry_attempt/retries_exhausted/budget_gate_verdict/cycle_gate_verdict/architect_observation; new shared/governance.log_session_event helper (never-raise contract like notify_orion); wire every droid (SQ-01, TP-AR-01, PO-S1, SP-01, EX-DEV-01, PE-DEV-01, SR-01/02, bg-s1, ir-s1, dp-s2, ip-s1, el-s1, DR-S1, VF-S1) to emit start/end events; plan-producing droids snapshot their output; new architect.get_session_context(initiative_id, step_id) returns a rich timeline+plans+costs+failures dict; all Architect methods switch from the 300-char snippet to this context.
(B) ALBUS WATCHER DAEMON (phased, deferrable): long-running subprocess spawned when SQ-01 starts, with its own Claude session and compounding context; polls session_events every 30s; fires proactive architect_observation events when patterns emerge (N-same-criterion-fails → forced escalation, systemic pattern → preemptive fix); shutdown hook on SQ-01 finish.
(C) ENABLER DECOMMISSION: port fetch_enabler_context(task_id) → fetch_session_context(task_id) reading session_events; mechanically rename all log_to_enabler call sites (fm_s1 ×5, ex_dev_01 ×4, po_s1 ×2, pp_s1 ×1, rc_01 ×1, vf_s1 ×1, ei_s1 ×1, en_ar_01 ×2, pipeline.py ×1, architect_agent.py ×N for the old architect) → log_session_event; freeze enabler.db read-only; retire the apps/enabler service (port 7149) + DB after one induction cycle confirms no regressions.
(D) SST + REGISTRY CLEANUP: remove enabler from apps, enabler.db from Databases, port 7149 from services, /apps/enabler/* from APIs in registry.db; add session_events + plan_snapshots to Databases; add log_session_event + fetch_session_context to shared/governance contract docs; update the induction Master Data Source Catalog.
OPEN DESIGN Qs (to decide during implementation, not blocking): (1) ship Layer A alone first as Gap 22a vs bundle A+B? (2) absorb Gap 21 handshake_requests into session_events (event_type=handshake) or keep separate?
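A minimal sketch of the Layer A ledger follows, assuming SQLite. The session_events columns, the ten event types, the never-raise contract, and the names log_session_event and get_session_context are taken from the gap text; the column types, the keyword-argument signature, the boolean return value, and the reduced context shape (timeline and failures only, with plans and costs omitted because plan_snapshots is not modelled here) are assumptions.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Column names from Gap 22 Layer A; types are assumptions.
SESSION_SCHEMA = """
CREATE TABLE IF NOT EXISTS session_events (
    id            INTEGER PRIMARY KEY,
    ts            TEXT NOT NULL,
    initiative_id TEXT,
    step_id       INTEGER,
    exec_id       INTEGER,
    droid_id      TEXT,
    agent         TEXT,
    event_type    TEXT NOT NULL,
    payload_json  TEXT NOT NULL
);
"""

# The ten event types listed in the gap text.
EVENT_TYPES = frozenset({
    "droid_start", "droid_end", "plan_produced", "review_verdict",
    "handshake_raised", "retry_attempt", "retries_exhausted",
    "budget_gate_verdict", "cycle_gate_verdict", "architect_observation",
})


def log_session_event(conn, event_type, *, initiative_id=None, step_id=None,
                      exec_id=None, droid_id=None, agent=None, payload=None):
    """Append one event. Never raises: returns True if stored, False otherwise,
    so a logging failure cannot take down the droid that called it."""
    try:
        if event_type not in EVENT_TYPES:
            return False
        conn.execute(
            "INSERT INTO session_events "
            "(ts, initiative_id, step_id, exec_id, droid_id, agent, "
            " event_type, payload_json) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), initiative_id, step_id,
             exec_id, droid_id, agent, event_type, json.dumps(payload or {})),
        )
        conn.commit()
        return True
    except Exception:
        return False


def get_session_context(conn, initiative_id, step_id):
    """Return an ordered timeline for one step plus its failure events.
    A fuller version would also join plan_snapshots and cost data."""
    rows = conn.execute(
        "SELECT ts, droid_id, agent, event_type, payload_json "
        "FROM session_events WHERE initiative_id = ? AND step_id = ? ORDER BY id",
        (initiative_id, step_id),
    ).fetchall()
    timeline = [
        {"ts": ts, "droid_id": droid, "agent": agent,
         "event_type": etype, "payload": json.loads(body)}
        for ts, droid, agent, etype, body in rows
    ]
    failures = [e for e in timeline if e["event_type"] == "retries_exhausted"]
    return {"timeline": timeline, "failures": failures}
```

Even this reduced context gives the Architect the full ordered event history for a step rather than a 300-character log tail, which is the change the gap is asking for.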

⚖️ Decisions

  • Bumped budget_usd from $35.29 to $120.00 (2026-04-21). Actual spend $75.76 against original $35.29 estimate = 2.15x, above cookbook #22 2x threshold. Root cause: scope growth during execution — Gaps 17 (Parts A/B1/B2), 19 (step→1:N schema), 14 stubs, Clara droid audit, dead-helper cleanup all landed beyond the 19-gap baseline. New budget covers remaining Gap 14 real logic (3 droids), Gap 21 plumbing, Gap 20 estimator, slots 9/10/11. ES-S1 recalibration requested — CI-S1 base coefficients under-forecasted multi-gap build-pipeline initiatives; Gap 20 will address explicitly.
  • Bypass exec 108: ir-s1 real logic (Gap 14 part 2/4) landed. Est cost ~$3.50. Expanded scope vs original pure R1-R5: added budget-awareness via bg_s1.compute_gate() + Claude essential/trivial classifier for mixed+budget-exhausted cases + narrow grey-zone escalation. New criticality column on initiative_success_criteria (default must_have, back-compat safe). New verdict values: mixed_continue, mixed_essential_continue, closed_with_gaps, grey_zone, vacuous. Next: Gap 21 plumbing (inter-droid handshake + aggregator + Albus insertion), then ip-s1 (part 3/4) and dp-s2 (part 4/4).
  • CANONICAL: Modules are defined by Elrond BEFORE steps; each module is fully independent and triggers its own build pipeline. — Admin-locked definition (Ninad, 2026-04-22). Allows parallel module execution for speed. (Ninad)
  • CANONICAL: Steps are defined by Elrond; Elrond can replace or re-sequence steps during execution based on outcomes. — Admin-locked definition. Preserves Elrond's strategic authority (Layer 1) at runtime. (Ninad)
  • CANONICAL: Two planner layers — Layer 1 free-form (Elrond) + Layer 2 template-driven (Merlin). — Admin-locked. Separates strategic WHAT from tactical HOW. Merlin dynamically generates prompts for team members. (Ninad)
  • CANONICAL: Task-level re-run is Merlin's authority. Phase/initiative-level decisions are Elrond's. Unresolvable issues escalate to Orion. — Admin-locked (GAP-22-AUTH). Prevents Elrond from doing Merlin's job; clear separation of layers. (Ninad)
  • CANONICAL: Four watchers run concurrently during initiative execution — Elrond, Merlin, Albus, and Team members. — Admin-locked. Drives architecture: 3 continuous watcher droids + team executors on carol-vm. (Ninad)
  • CANONICAL: Albus is an enablement watcher, not a reportee-Architect. Realigns Albus role (GAP-24). — Admin-locked. Current code has Albus under Sage as reportee-Architect — wrong role. (Ninad)
  • CANONICAL: Watchers use poller-with-Claude-on-decision pattern. Peak ~8-10 concurrent Claude CLI sessions on carol-vm; idle baseline 0-2. — Admin-locked. Always-thinking watchers would blow the Claude Max rate limit within minutes. (Ninad)
  • CANONICAL: Sessions working on INI-13 must communicate with Ninad in plain language anchored in the execution model; propose before implementing; bookend with 1-2 sentence summaries. — Admin-locked. Avoids jargon-based miscommunication; enables fast approvals. (Ninad)
  • Raised: GAP-22-AUTH — re-route task-level replans from Elrond (el-s1) to new Merlin droid mr-s1; narrow el-s1 scope to phase/initiative events only. — Current code has el-s1 polling handshake_requests for task-level replans, which contradicts the authority split. (Orion)
  • Raised: GAP-23 — build Merlin Watcher. Continuous watcher that observes task outcomes, modifies Layer 2 plan, and composes prompts for team members before instantiation. — Merlin watcher does not exist today. Required to realize the 4-watcher concurrent model. (Orion)
  • Raised: GAP-24 — realign Albus from reportee-Architect (under Sage) to top-level enablement watcher. Implement Albus enablement watcher daemon. — Current Albus role contradicts canonical 4-watcher model; realignment is foundational to the design. (Orion)
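The authority split recorded in the canonical decisions above (task-level re-runs to Merlin, phase/initiative-level decisions to Elrond, unresolvable issues to Orion) can be expressed as a small routing rule. This is only an illustration: the Scope enum, the rerun_authority function, and the resolvable flag are hypothetical names, and how a watcher decides that an issue is unresolvable is not specified on this page.

```python
from enum import Enum


class Scope(Enum):
    TASK = "task"
    PHASE = "phase"
    INITIATIVE = "initiative"


def rerun_authority(scope, resolvable=True):
    """Return which agent owns the decision for an event at the given scope."""
    if not resolvable:
        return "orion"   # unresolvable issues escalate to Orion
    if scope is Scope.TASK:
        return "merlin"  # task-level re-run: Merlin (mr-s1 per GAP-22-AUTH)
    return "elrond"      # phase/initiative-level: Elrond (el-s1)
```

Under this rule, the situation GAP-22-AUTH describes (el-s1 polling for task-level replans) corresponds to a TASK-scope event being handled by "elrond", which the function never returns.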

Success criteria

  • Strategic plans run end-to-end through SP-01 with full S1-S12 + A-J checklist coverage. (must_have)
  • Tactical plans produced by TP-AR-01 per step with role-correct decomposition (sage->archon->forge->argus->merlin->albus). (must_have)
  • Pipeline orchestration runs through PO-S1 dispatching agent bundles to executor droids. (must_have)
  • Sequencer (SQ-01 + MS-01) handles dispatch with dedup, dependency tree, group_tries observability. (must_have)
  • Layer 1 / Layer 2 boundary observed: Elrond owns modules+steps+phase decisions, Merlin owns tactical droids. (must_have)
  • Re-run authority split honored: task-level by Merlin (mr-s1), phase-level by Elrond (el-s1), with unresolvable issues escalating only to Orion. (must_have)
  • Merlin Watcher live in production (carol-merlin.service active) and watching session_events. (must_have)
  • Albus Watcher live in production (carol-albus.service active) consuming notify_albus signals. (must_have)
  • Architect (Sage) reads structured session context via shared.governance.fetch_session_context — no log_text snippet fallback. (must_have)
  • Build pipeline can run a non-bypass initiative end-to-end with Elrond/Merlin/Albus/Team watchers active concurrently. (must_have)