Recursive Self-Improvement: How Carolverse Improves Itself Without Drifting
A system becomes truly self-improving when its own work to improve the system is graded by the same system next week. Carolverse's build pipeline makes everything—services, policies, droids—so improving it compounds forever. Every night a collector scores the week: Build Success measures clean closes per lane. If any lane misses target, an improvement engine files a fix into the pipeline itself. Next week the same metric scores the result. The loop is both the surgeon and the patient it operates on.
A self-improving system fails quietly when responsibility for improvement spreads across many teams. Elrond, the engineering head, owns the entire loop: the metric, the targets, the rules, the guardrails. He did not delegate RSI to a committee; he made it his direct problem. The first version ships exactly one metric—Build Success, a clean-close ratio per lane—deliberately, so the team learns to trust the signal before adding noise. Next increments are lined up but arrive one at a time, proven and graded, still owned by Elrond.
A self-improving system that can rewrite itself is dangerous if it has no guardrails. Every improvement the RSI engine files goes through the same pipeline as any other work: planner mode, review verdicts, design compliance, constitution checks. The loop can tune the pipeline, but it cannot escape the box. The recursion is safe because it is trapped inside the boundaries everyone agreed on first. Harnesses do not slow the loop down; they define what 'better' actually means.
A self-improving system that silently breaks is the worst kind of failure. The RSI loop runs on two scheduled workers at 02:30 and 02:45, both registered and cron-triggered, both emitting heartbeats to the Daily Process Sweep. If either stops—stuck, crashed, lost credential—the Sweep notices immediately and files a fix. The watcher is watched. Self-healing is trustworthy only when the healing mechanism is itself healed.
A self-improving harness becomes genuinely trustworthy when it is reusable—because portability proves the design was sound, not tailored to one domain. When RSI expanded from the build pipeline to Hagrid's Infrastructure service, we cloned every template unchanged: same scoreboard, same daily collector, same improvement engine, same guardrails, but pointed at a different metric—CPU usage instead of build success. The harness worked. Because the recursion that tunes infrastructure uses the exact same structure and rules as the recursion that tunes pipelines, it means the first version was not a lucky fit; it was a true principle. Portability is proof.