Carolopedia

A friendly guide to Carol, her ecosystem, and the agents who built her.

CAROL-INI-2084-00: Infrastructure RSI v1: CPU-usage metric loop (collector + scoreboard + dashboard + improvement engine), cloned from the Initiatives RSI templates

Initiative

📖About

Bring recursive self-improvement (RSI) to the Infrastructure service, owned by Hagrid, using exactly the same templates as the Pipeline/Initiatives RSI (CAROL-INI-2066): a persistent scoreboard store, a daily collector droid that writes a dated snapshot, a thin dashboard window that reads the snapshot, and a daily improvement engine that files, tags, and dispatches a planner-mode improvement initiative for any metric below target.

Start with one metric: CPU usage. Target: CPU below 50 percent at least 99 percent of the time. The weekly score out of 10 is 10 x (fraction of that week's samples below 50 percent); the target score is 10, and the improvement engine files when the latest weekly score is below 9.9 (that is, under the 99 percent bar). CPU samples are recorded every 15 minutes by Hagrid's Resource Sentinel into the infra RSI store.

The design mirrors the Initiatives RSI file-by-file: a Hagrid metric module + scoreboard, shared shim, collector droid, improvement engine droid (files through the Author, tags infra-rsi + rsi-metric:cpu, dedups, dispatches), and a dashboard app at /dev/infra-rsi/. Everything is Hagrid-owned, registered, scheduled, and run-audited, and is extensible to memory/disk metrics later, one at a time.
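The scoring rule described above can be sketched in a few lines of Python. This is an illustration only, not the code in Hagrid's metric module: the names weekly_cpu_score, needs_improvement, CPU_LIMIT_PERCENT, and FILING_THRESHOLD are hypothetical, and raising an error on an empty week is an assumption (the record does not say how an empty week is handled).

```python
from typing import Sequence

# Hypothetical constants; the values come from the initiative text.
CPU_LIMIT_PERCENT = 50.0  # a sample is "good" when strictly below this
FILING_THRESHOLD = 9.9    # the engine files when the weekly score is below this


def weekly_cpu_score(samples: Sequence[float]) -> float:
    """Return 10 x (fraction of samples below CPU_LIMIT_PERCENT)."""
    if not samples:
        # Assumption: an empty week is treated as an error, not as a score.
        raise ValueError("no CPU samples recorded for this week")
    below = sum(1 for s in samples if s < CPU_LIMIT_PERCENT)
    return 10.0 * below / len(samples)


def needs_improvement(score: float) -> bool:
    """True when the weekly score is under the 99 percent bar (9.9/10)."""
    return score < FILING_THRESHOLD
```

At one sample every 15 minutes a full week holds 672 samples, so up to 6 samples at or above 50 percent still score about 9.91 and stay at target, while a 7th drops the score to about 9.90 rounded, which is below 9.9 and would trigger a filing.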

⚖️Decisions

Elrond's bypass methodology checklist (a reminder, not a gate -- you've got this):
0. File it requested_mode='bypass' (planner-vs-bypass is a deliberate choice). bypass_start REFUSES a non-bypass initiative (CAROL-INI-1846), and the dispatcher only skips the bypass lane when the mode says bypass -- a 'planner' mistag lets Merlin's pipeline grab the placeholder step and block your finished work.
1. Filed as planned status -- let the bypass claim/activate it; never file active.
2. Open the bypass (bypass_start) with your droid id + the remediation answer (remediates_initiative_id=NNN, or remediates_nothing=True).
3. Work the blocks for your work-type: template -> design -> code -> test -> review. Do the real work; record decisions on the initiative as you make them.
4. Reality is recorded for you at close -- code (files changed), each decision, and the twin-review verdict become real activities tied to this initiative and show in the Activity Tracker like a planner run (CAROL-INI-1840). No dummy rows.
5. Keep the initiative status moving; it parks in 'reviewing' and is tagged uat-pending for you at close (CAROL-INI-1836), so the stuck-watchdog leaves it alone until UAT.
6. Close runs the gates (design/architecture compliance + caller-audit). If a gate flags something pre-existing or unrelated to your change, waive it with a clear written rationale -- audit, don't skip.
7. Bypass skips the planner's auto-orchestration, NOT the standards. Same template checklist, same review, same observability as a planner run. (elrond)
[status-router] planned -> executing | event=bypass_executing | bypass transition (or-bx-01)
Cloned the Initiatives/Pipeline RSI templates file-for-file for Infrastructure (owner Hagrid): metric module + persistent scoreboard (data/infra_rsi.db), shared shim, daily collector, daily improvement engine, and a thin dashboard app — same shapes as Elrond's RSI. (orion)
v1 metric = CPU usage: score out of 10 = 10 x the weekly share of CPU samples below 50 percent; target is CPU below 50 percent at least 99 percent of the time; the improvement engine files when the weekly score drops below 9.9/10. (orion)
CPU samples recorded every 15 min by Hagrid's Resource Sentinel into the infra RSI store; collector at 02:35 writes the daily snapshot, engine at 02:50 files+tags+dispatches a planner-mode fix through the Author for any below-target metric (dedup on rsi-metric:cpu). All registered + scheduled + run-audited. (orion)
Dashboard at /dev/infra-rsi/ (port 7262, app infra-rsi, owner Hagrid) is a thin window reading the stored snapshot via shared/infra_metrics.py. Validated end-to-end: sample->score 10/10, collector snapshot written, engine dry-run files nothing at target. Extensible to memory/disk metrics by adding entries to METRICS. (orion)
[status-router] executing -> reviewing | event=bypass_reviewing | bypass transition (or-bx-01)
UAT fix (Ninad found an empty dashboard): scheduled Sentinel runs completed but recorded no CPU sample. Hermione's Scheduler invokes the entrypoint with a Python interpreter whose sys.path lacks the dev root, so the sampler import (agents.agt_015...) failed silently. Pinned sys.path to the dev root at the top of the Sentinel; verified that the system Python now records a sample. Going forward, about 96 samples/day accumulate (one every 15 minutes). NOTE: there is no historical CPU data to backfill, because Radagast's vitals droid only alerts on live spikes and never stored readings, so the weekly chart legitimately starts with one point this week and gains one point per week. (orion)
Orion remediated: INI-999900319 bypass closed — CAROL-INI-696 close-marker: the Orion bypass INI-999900319 filed against this parent reached terminal state (closed). This row's literal prefix Orion remediated: is the canonical signal the cookbook-155 dispatcher gate looks for. (shared.bypass.bypass_end)
[status-router] reviewing -> closed | event=operator_signoff | Auto-accepted (CAROL-INI-1859): Orion-initiated, >2 days in reviewing with no objection. (el-srac-01)
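The UAT fix recorded above (pinning sys.path to the dev root at the top of the Sentinel) could look roughly like the following. This is a minimal sketch under stated assumptions, not the Sentinel's actual code: the helper name pin_dev_root and the directory depth in the commented usage line are hypothetical.

```python
import sys
from pathlib import Path


def pin_dev_root(dev_root: Path) -> None:
    """Put dev_root at the front of sys.path unless it is already listed."""
    root = str(dev_root.resolve())
    if root not in sys.path:
        sys.path.insert(0, root)


# Hypothetical usage at the very top of the entrypoint, before any project
# imports. The index 2 assumes the file sits two directories below the dev
# root; adjust it to the real layout.
# pin_dev_root(Path(__file__).resolve().parents[2])
```

Doing this before the first project import means the entrypoint no longer depends on whichever interpreter or working directory the scheduler happens to use.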

✅Success criteria

An infra RSI scoreboard store persists a dated weekly snapshot of the CPU metric (score out of 10 = 10 x the fraction of samples below 50 percent), cloned from the Initiatives RSI scoreboard. (must_have)
CPU samples are recorded automatically every 15 minutes into the infra RSI store. (must_have)
A daily collector droid (Hagrid) computes and writes the snapshot; the dashboard and engine read the SAME stored snapshot. (must_have)
A dashboard at /dev/infra-rsi/ is a thin window that reads the stored snapshot via a shared shim (computation lives in Hagrid's metric droid). (must_have)
A daily improvement engine (Hagrid) files a planner-mode improvement initiative through the Author when the CPU score is below the 99 percent target (9.9/10), tags it infra-rsi + rsi-metric:cpu, dedups on the open tag, and dispatches it. (must_have)
All new droids and the app are registered, scheduled durably, and emit run-audit; the design mirrors the Initiatives RSI templates file-by-file. (must_have)
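The improvement-engine criterion above (file only when a metric is below target, tag infra-rsi + rsi-metric:cpu, dedup on the open tag) can be illustrated with a small decision function. Everything here is a hypothetical sketch: the Initiative dataclass, its status field, and plan_filings are invented for illustration, and the real filing through the Author and the dispatch step are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

FILING_THRESHOLD = 9.9  # from the initiative text: file below 9.9/10


@dataclass
class Initiative:
    """Hypothetical stand-in for a ledger row; not the real schema."""
    title: str
    requested_mode: str
    tags: Set[str] = field(default_factory=set)
    status: str = "planned"


def plan_filings(
    snapshot: Dict[str, float], existing: List[Initiative]
) -> List[Initiative]:
    """Return the initiatives the engine would file for this snapshot."""
    # Collect tags carried by initiatives that are still open.
    open_tags: Set[str] = set()
    for ini in existing:
        if ini.status != "closed":
            open_tags |= ini.tags

    to_file: List[Initiative] = []
    for metric, score in sorted(snapshot.items()):
        metric_tag = f"rsi-metric:{metric}"
        if score >= FILING_THRESHOLD:
            continue  # at target: nothing to file
        if metric_tag in open_tags:
            continue  # dedup: an open initiative already covers this metric
        to_file.append(
            Initiative(
                title=f"Improve {metric} score ({score:.2f}/10)",
                requested_mode="planner",
                tags={"infra-rsi", metric_tag},
            )
        )
    return to_file
```

With a snapshot of {"cpu": 10.0} the function returns an empty list, which matches the dry-run behaviour recorded in the decisions (nothing filed at target). Keying the dedup on the per-metric tag is what would let memory or disk metrics be added later without changing the loop.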

Sourced live from the initiatives ledger · initiative 999900317