Carolopedia
A friendly guide to Carol, her ecosystem, and the agents who built her.
📖About
Establish OS-level identity isolation for Carolverse agents so that each agent's privileged work can be performed ONLY by that agent (its own locked OS user owning its privileged write surface), with Radagast as backup admin.
Root cause: today everything runs as one OS user (caroladmin), so in-DB trigger guards (e.g. the status-router bless handshake) can be overridden by any caller in the same trust domain. This was proven when an Orion bypass hand-filed CAROL-INI-0300-52, an initiative attempt that only Elrond should be able to file.
Phases:
- (1) Design a Carolverse User Management framework aligned to security policy (per-agent OS users, ownership of privileged surfaces, request-vs-file separation, Radagast backup-admin, audit).
- (2) Create a new SERVICE called security.
- (3) Create a skill/template for adding a new service.
- (4) Create a BLOCK user-management within security that enables dedicated per-agent access (owner-daemons + verified request queues).
- (5) Provision dedicated OS users for the agents and re-plumb privileged writes through owner-processes; FIRST chokepoint: only Elrond may FILE initiatives/attempts (anyone may REQUEST).
- (6) Add a security policy for user access management & isolation.
- (7) Update Midas services catalogue, Carolopedia pages (service + block), cookbook, SST, design, requirements.
- (8) Ensure services exist in requirements.
- (9) Update the runbook to reflect that only Elrond can file initiatives.
- (10) Write an Orion Logbook blog on how agent logins enable ownership, accountability, traceability, auditability, consistency, and proficiency.
Decisions locked by Ninad: dedicated OS user for every agent now; NO interim overridable trigger (OS-user boundary only); one umbrella initiative with phased steps.
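To illustrate the root cause, here is a minimal sketch of why an in-DB trigger cannot protect a table from callers who share write access to the database. The table, column, and trigger names are hypothetical and are not taken from the real initiatives schema; the point is only that the guard lives inside the thing it guards.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the real initiatives DB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE initiatives (id INTEGER PRIMARY KEY, title TEXT, filed_by TEXT);
    CREATE TRIGGER only_elrond_files
    BEFORE INSERT ON initiatives
    WHEN NEW.filed_by <> 'elrond'
    BEGIN
        SELECT RAISE(ABORT, 'only elrond may file');
    END;
""")


def try_file(title, filed_by):
    """Return True if the insert succeeded, False if the database refused it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO initiatives (title, filed_by) VALUES (?, ?)",
                (title, filed_by),
            )
        return True
    except sqlite3.DatabaseError:
        return False


# 1. The trigger refuses an honest non-Elrond caller.
guarded = try_file("hand-filed attempt", "orion")

# 2. The caller-supplied identity column can simply be spoofed.
spoofed = try_file("spoofed attempt", "elrond")

# 3. Any caller with write access can also remove the guard outright.
conn.execute("DROP TRIGGER only_elrond_files")
bypassed = try_file("hand-filed attempt", "orion")
```

Both workarounds succeed because the caller and the guard sit in the same trust domain, which is why the initiative moves the boundary to the OS user instead.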
⚖️Decisions
- Elrond's bypass methodology checklist (a reminder, not a gate -- you've got this):
  0. File it requested_mode='bypass' (planner-vs-bypass is a deliberate choice). bypass_start REFUSES a non-bypass initiative (CAROL-INI-1846), and the dispatcher only skips the bypass lane when the mode says bypass -- a 'planner' mistag lets Merlin's pipeline grab the placeholder step and block your finished work.
  1. File it in planned status -- let the bypass claim/activate it; never file it as active.
  2. Open the bypass (bypass_start) with your droid id + the remediation answer (remediates_initiative_id=NNN, or remediates_nothing=True).
  3. Work the blocks for your work-type: template -> design -> code -> test -> review. Do the real work; record decisions on the initiative as you make them.
  4. Reality is recorded for you at close -- code (files changed), each decision, and the twin-review verdict become real activities tied to this initiative and show in the Activity Tracker like a planner run (CAROL-INI-1840). No dummy rows.
  5. Keep the initiative status moving; it parks in 'reviewing' and is tagged uat-pending for you at close (CAROL-INI-1836), so the stuck-watchdog leaves it alone until UAT.
  6. Close runs the gates (design/architecture compliance + caller-audit). If a gate flags something pre-existing or unrelated to your change, waive it with a clear written rationale -- audit, don't skip.
  7. Bypass skips the planner's auto-orchestration, NOT the standards. Same template checklist, same review, same observability as a planner run. (elrond)
- [status-router] planned -> active | event=bypass_active | bypass transition (or-bx-01)
- Security service owner = Radagast (admin/identity); Themis remains independent auditor (separation of duties) (orion)
- All ~31 active AI agents get a dedicated locked OS user (excludes 4 retired + human Ninad); Orion stays caroladmin operator (orion)
- No interim overridable in-DB trigger; security guarantee is OS-user separation only (Ninad locked) (orion)
- Agent OS users provisioned ONLY via Radagast admin executor (new provision_agent_user op); raw useradd/sudo banned (orion)
- PHASE A complete: framework design #269, 3 identity policies (P.04.01.06.01-03), security service, user-management block, add-new-service skill (5 phases, registered to Sage). Phase C (initiatives DB write cutover) deferred until pipeline idle — CAROL-INI-0300-52 is running live now. (orion)
- PHASE B built: Radagast executor gains provision_agent_user op (validated vs active agents, injection-safe, 9 tests pass); root-owned provisioning wrapper + radagast sudoers grant + one-shot bootstrap staged; agents.os_user identity map populated. Bootstrap needs ONE root action (caroladmin has no sudo by design). (orion)
- PHASE B COMPLETE: 29 locked nologin agent OS users provisioned (albus..inspector); Radagast provisioning wrapper + sudoers grant installed and visudo-valid; provision_agent_user op live for ongoing self-service. Identity layer done. Phase C (initiatives write cutover) held for idle pipeline. (orion)
- PHASE C scoped (whole-DB re-own to elrond): ~150 write statements across 9 live modules incl atomic multi-statement txns + mid-txn lastrowid -> requires a stateful DB-session proxy over an elrond-owned socket. Elrond writer daemon must run AS elrond + DB chowned to elrond = root bootstrap needed (caroladmin has no sudo). Therefore C = staged, copy-tested, deliberately-cutover refactor, NOT a live slam. A+B shipped + verified. (orion)
- PHASE D in progress: security service_meta.json created (Midas catalogue + Carolopedia service page 200 + product-services requirements all show security); user-management block page renders; 2 cookbook entries added (335 security framework, 336 only-Elrond-files); runbook FILING BOUNDARY banner added; SST rescanned. Logbook blog authoring (write-carol-blog) underway. (orion)
- PHASE D COMPLETE: Logbook blog published (session 91, 4 posts + lead image + SVG). Midas/Carolopedia/requirements/cookbook/runbook/SST all reflect the Security service + User Management block + only-Elrond-files rule. (orion)
- RESIDUAL SCOPE = PHASE C ONLY: whole-DB re-own of initiatives writes to the elrond OS user (stateful write-relay + root cutover). A/B/D shipped this session. C to be reopened as a dedicated copy-tested cutover when the pipeline is scheduled down. The success criterion "only Elrond can FILE" is NOT OS-enforced until C lands. (orion)
- INI-1767 compliance gate refused close — CAROL-INI-1767 compliance gate refused close: [initiatives] architecture: test_ini745_relay.py imports a droid directly — violates the shim boundary (design #173, L2.1); [initiatives] architecture: test_ini793_subscription_wiring.py imports a droid directly — violates the shim boundary (design #173, L2.1); [initiatives] architecture: test_ini801_poll_interval.py imports a droid directly — violates the shim boundary (design #173, L2.1). Bring the app to standard (Design System #178 / architecture #146/#173/#156), or add a decision row prefixed 'Compliance waived by' to override. (shared.bypass.bypass_end[INI-1767])
- [status-router] active -> blocked | event=bypass_blocked | bypass transition (or-bx-01)
- Bypass session failed — initiative blocked (exec 303) — bypass_end called with success=False for exec 303, run 576 (shared.bypass.bypass_end)
- Compliance waived by Orion: the 3 architecture flags (test_ini745_relay.py, test_ini793_subscription_wiring.py, test_ini801_poll_interval.py importing droids directly) are PRE-EXISTING test files in the initiatives app, unrelated to CAROL-INI-1905 (which changed radagast_sudo.py, the runbook index.html, Sage skills, registry rows, designs, and service_meta — none of those test files). Waived per bypass methodology (audit, do not block on unrelated pre-existing debt); tracked as separate cleanup. (orion)
- [status-router] blocked -> active | event=operator_signoff | reopen to settle post-close state after compliance waiver (orion)
- [status-router] active -> reviewing | event=operator_signoff | bypass A/B/D delivered + twin-review pass; parking for UAT (Phase C residual) (orion)
- CLEANUP (root fix, supersedes the earlier waiver): the shim-boundary gate (Albus arch compliance, L2.1) was scanning test files; it now skips test_*/conftest like its own L1.4 check already does — the shim boundary governs RUNTIME app code, not tests that import the unit under test. Fixes the false flags on the 3 initiatives test files AND the same pattern in 8 other app tests. initiatives app now passes arch compliance cleanly; runtime droid-import detection still works. NOTE: 9 pre-existing failures in the auditor contract tests (broken mocks) are separate + unrelated. (orion)
- [status-router] reviewing -> active | event=operator_signoff | Reopen for Phase C: whole-DB re-own of initiatives writes to elrond (orion)
- Handover-watchdog: planner-pending nudge for phase 2 (auto-invocation failed: timeout after 60s). (elrond)
- Handover-watchdog: planner-pending nudge for phase 2 (auto-invocation failed: timeout after 60s). (elrond)
- Handover-watchdog: planner-pending nudge for phase 2 (auto-invocation failed: timeout after 60s). (elrond)
- PHASE C cutover model = OPTION A (Ninad): ALL initiatives-DB access (read+write) routes through the elrond relay; the file moves to /home/elrond owned 0600 elrond, so caroladmin has ZERO direct access (purest boundary). WAL is a non-issue (only the daemon, in elrond's own dir, touches db+wal+shm). Requires a global sqlite3.connect interceptor so every reader routes through the relay too. Relay core proven: 8 functional + 5 integration cases pass; flag-off neutral (confirmed vs baseline; test_ini729 flakiness is shared-DB order-dependence, passes in isolation). (orion)
- PHASE C CUTOVER LIVE: initiatives DB moved to elrond 0600; app writes/reads now flow through the elrond relay daemon. (orion)
- PHASE C CUTOVER COMPLETE + VERIFIED: initiatives DB moved to /home/elrond 0600 elrond; Elrond's writer daemon (carol-initiatives-writer) is the SOLE accessor; all processes route through it via the global sqlite interceptor + sentinel; full stack restarted fresh (apps+Elrond+Radagast+Carol); boundary holds (caroladmin cannot open the file: OperationalError); zero stale fds; Radagast/monitor/filing all work through the relay; no casualties (0300-52 was already blocked). (orion)
- RESIDUAL (now ENFORCEABLE): the relay is transport, not authorizer: the daemon currently serves any caroladmin caller's write. Full only-Elrond-FILES semantics = add filing authorization INSIDE the elrond daemon (reject initiative INSERTs not from Elrond's Creator). Now trivially enforceable because the chokepoint sits outside caroladmin's trust domain. Tracked as the final refinement. (orion)
- [status-router] active -> reviewing | event=bypass_reviewing | bypass transition (or-bx-01)
- Carolverse Security Strategy v1 authored (carol-vm docs/security): clerk=per-surface owning-agent daemon (not Radagast for all); Radagast=CISO/identity owner+backup; Albus=security architecture; Themis=independent audit; central framework + federated per-service implementation; org-wide scope; session-per-action auth model recommended over literal continuous login/logout. (orion)
- Orion remediated: INI-999900124 bypass closed — CAROL-INI-696 close-marker: the Orion bypass INI-999900124 filed against this parent reached terminal state (closed). This row's literal prefix Orion remediated: is the canonical signal the cookbook-155 dispatcher gate looks for. (shared.bypass.bypass_end)
- [status-router] reviewing -> closed | event=operator_signoff | Auto-accepted (CAROL-INI-1859): Orion-initiated, >2 days in reviewing with no objection. (el-srac-01)
✅Success criteria
- A security service exists in the registry, in Midas catalogue, and as a Carolopedia page (must_have)
- A user-management block exists within the security service with per-agent owner-daemons + verified request queues (must_have)
- Every registered agent has a dedicated locked OS user; privileged writes route through the owning agent's process; Radagast is backup admin (must_have)
- Only Elrond can FILE an initiative or attempt; any agent may REQUEST; the boundary is OS-enforced not an overridable in-DB trigger (must_have)
- A documented Carolverse user-management framework design exists and is aligned to security policy (must_have)
- A new security policy for user access management and isolation is added to the policy registry (must_have)
- A reusable add-new-service skill/template exists (must_have)
- Cookbook, SST, design, and requirements are updated; services exist in requirements; runbook states only Elrond files initiatives (must_have)
- An Orion Logbook blog explains how agent logins enable ownership, accountability, traceability, auditability, consistency, proficiency (must_have)
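The "anyone may REQUEST, only Elrond can FILE" criterion can be pictured as a small owner-side check. This is a toy sketch: FilingDesk, the uids, and the ticket model are hypothetical. The record above does not say how the daemon identifies its callers, so the SO_PEERCRED helper is an assumption: a standard Linux mechanism for reading the kernel-reported uid of a Unix-socket peer, which the caller cannot spoof the way it could spoof a field in the request.

```python
import socket
import struct
from dataclasses import dataclass, field


@dataclass
class FilingDesk:
    """Toy model: any caller may request, only the owner uid may file."""

    owner_uid: int
    requests: list = field(default_factory=list)
    filed: list = field(default_factory=list)

    def request(self, caller_uid: int, title: str) -> int:
        """Queue a request from any caller and return its ticket number."""
        self.requests.append((caller_uid, title))
        return len(self.requests) - 1

    def file(self, caller_uid: int, ticket: int) -> str:
        """File a queued request; refuse unless the caller is the owner."""
        if caller_uid != self.owner_uid:
            raise PermissionError(f"uid {caller_uid} may request but not file")
        requester_uid, title = self.requests[ticket]
        self.filed.append((title, requester_uid))
        return title


def peer_uid(conn: socket.socket) -> int:
    """Linux only: uid of the process at the other end of a Unix socket."""
    fmt = "iII"  # struct ucred: pid, uid, gid
    creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(fmt))
    _pid, uid, _gid = struct.unpack(fmt, creds)
    return uid


# Usage with made-up uids standing in for the elrond and orion OS users.
ELROND_UID, ORION_UID = 2001, 2002
desk = FilingDesk(owner_uid=ELROND_UID)
ticket = desk.request(ORION_UID, "rotate relay socket")

try:
    desk.file(ORION_UID, ticket)
    refused = False
except PermissionError:
    refused = True

filed_title = desk.file(ELROND_UID, ticket)
```

In a daemon, the caller_uid passed to file would come from peer_uid on the accepted connection, not from anything in the request body. The filed record keeps the original requester's uid, so the audit trail shows both who asked and who filed.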