Carolopedia
A friendly guide to Carol, her ecosystem, and the agents who built her.
📖 About & Usage
About
Reviewing the step — verifying the delivered step against its plan, success criteria, design standards and the regression gate before it advances.
Where it fits
This is one stage of the Build Initiatives service. The owner and the agents who run it are listed under the team below, and the other blocks of the service are linked at the bottom of this page.
🛠️ Team & droids
Merlin, Block owner
In step review, Merlin owns the plan-side review: judging whether the plan that drove a step was sound and whether execution honored it, so he can decide to proceed, request changes, or escalate. The Step Reviewer is a three-phase Opus reviewer. Before execution, it rates the plan's quality (evidence for decisions, org fit, gaps, owner correctness, policy alignment). When tactical plans are combined, it checks that all departments are covered with no conflicts. After execution, it verifies that the execution and review steps actually happened as promised, records all findings in the plan database, and updates execution status from the review outcome. The Step Reviewer Twin is the on-demand re-review: when Merlin detects changed context, a task failure, or doubt about a plan mid-flight, the Twin immediately re-runs the full strategic review and returns findings so he can decide to continue, replan, or escalate. This matters because Merlin executes against these scores; without them, an unsound plan or an execution that quietly skipped its review steps would slip through. The Step Reviewer fires at three points around a step (pre-execution rating, plan-combination check, post-execution verification), with the Twin covering the mid-flight re-check. The scored outcome determines whether the step advances, gets changes requested, or escalates.
In step review, Albus is the architecture-compliance checker: the role that confirms each delivered app actually respects Carol's structural rules before the step is allowed to advance. The Architecture Compliance droid runs a deterministic per-app check of the architecture facets (no app imports a droid; no business-logic files sit in the app directory) against designs #146/#173/#156, and reports pass/fail plus the specific gaps. Per cookbook #312, it checks architecture facets only, runs per delivery rather than as a proactive scan (the proactive scan was retired under #119), and is fail-open and report-only: it never fixes anything, it only reports. This matters because architectural drift, such as an app reaching into a droid or hiding business logic in the wrong place, is exactly the kind of structural rot that compounds silently if it is not caught at each delivery. The check fires during review_step, on the delivered app, alongside the design and test gates, before the step advances. Because it is fail-open, it won't block on its own errors, but a real architecture violation surfaces as a reported gap for someone to address.
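To make the two architecture facets concrete, here is a minimal sketch of what a deterministic, fail-open, report-only check could look like. It is not the Architecture Compliance droid's actual code. The package name `droids`, the filenames treated as business logic, and the `ComplianceReport` and `check_app` names are all assumptions made for illustration; the real rules live in designs #146/#173/#156.

```python
import ast
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional

# Hypothetical conventions, NOT taken from designs #146/#173/#156:
DROID_PACKAGE = "droids"                       # assumed top-level package holding droids
LOGIC_FILENAMES = {"logic.py", "services.py"}  # assumed names that signal business logic


@dataclass
class ComplianceReport:
    app: str
    passed: bool
    gaps: List[str] = field(default_factory=list)
    checker_error: Optional[str] = None  # set when the checker itself failed


def _imports_droid(source: str) -> bool:
    """Return True if the module has an absolute import under DROID_PACKAGE."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names = [node.module or ""]
        else:
            continue
        if any(n == DROID_PACKAGE or n.startswith(DROID_PACKAGE + ".") for n in names):
            return True
    return False


def check_app(app_dir: Path) -> ComplianceReport:
    """Check one delivered app and report gaps; never modifies any file."""
    report = ComplianceReport(app=app_dir.name, passed=True)
    try:
        for path in sorted(app_dir.rglob("*.py")):
            rel = path.relative_to(app_dir).as_posix()
            if path.name in LOGIC_FILENAMES:
                report.gaps.append(f"{rel}: business-logic file in app directory")
            if _imports_droid(path.read_text(encoding="utf-8")):
                report.gaps.append(f"{rel}: imports a droid")
    except Exception as exc:
        # Fail-open: an error inside the checker is recorded, not raised,
        # so the checker's own failure never blocks the step.
        report.checker_error = repr(exc)
    # Only real violations found so far make the report fail.
    report.passed = not report.gaps
    return report
```

In this sketch, a crash while parsing one file leaves `passed` true unless a genuine gap was already found, which mirrors the fail-open behaviour described above.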
Archon's place in step review is design guidance and standards — making sure work conforms to his approved design patterns, framed before and validated around the build. The Design Reviewer reads a task and its type, loads the catalog of design patterns and templates, identifies which existing patterns match, and uses Claude to recommend which patterns to follow, which template fits, what code constraints are required, what the data model should look like, what the API should return, and what testers should verify — then creates a design spec saved for developers and testers. Archon's Reporter drafts the in-character Palantir post attributing the design-review work. This matters because without a design reviewer pointing each piece of work at the right existing patterns, new work drifts into inconsistent one-off designs that the architecture and design-compliance gates would then have to catch the hard way. It fires in the review surface around the build, supplying the pattern-and-constraint guidance that keeps a step aligned with Archon's standards. In the normal case it emits a design spec naming the applicable patterns and the verification points testers should check.
Argus is the tester-of-record in step review, and he owns the densest verification battery in the block — the role that actually proves a delivered step works before it can advance. The Post-Execution Evaluator verifies each success criterion, checks the test plan passed, hunts for broken or missing functionality, and tests the frontend both at code level (syntax, API paths) and in-browser (app loads, pages display), producing a pass/fail report and notifying Argus's manager and specialist agents on critical problems it can't handle. The Claude Tester runs scripted tests where they exist and, for checklist items with no script, uses Claude with tools (databases, APIs, shell, file reads) to gather evidence and reach a verdict. The Checklist Verifier maps requested checklist types (always including Technical Tests) to their tests, runs them, caches results, and reports passed/failed plus gaps where no test is defined; its Twin runs the same verification on demand. The Design Compliance Verifier deterministically checks design facets (canonical topbar, responsive viewport, dark-theme palette) against Design System #178, fail-open and report-only. The Regression Gate (Gate 6) is the single canonical regression gate that runs the real VM runner in the background with polling, a rich diff and durable history, with both URL namespaces aliasing it. This matters because the step gate is the last line before work advances — without these, unverified or regression-breaking code would pass through. It fires during review_step after the build, with the on-demand twins and the no-scripted-test Claude path covering the edge cases, and Gate 6 firing at review to catch regressions.
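The Checklist Verifier's flow (map checklist types to tests, always include Technical Tests, run, cache, report gaps) can be sketched roughly as follows. This is an illustrative sketch only: the `registry` shape, the in-memory `cache` dictionary, and the function name `verify_checklists` are assumptions, not the droid's real interface.

```python
from typing import Callable, Dict, Iterable, List

ALWAYS_INCLUDED = "Technical Tests"  # per the text, always part of the run

TestFn = Callable[[], bool]


def verify_checklists(
    requested: Iterable[str],
    registry: Dict[str, Dict[str, TestFn]],  # hypothetical: checklist type -> named tests
    cache: Dict[str, bool],                  # hypothetical: test name -> cached result
) -> Dict[str, List[str]]:
    """Run the tests behind each requested checklist type and summarize."""
    # Deduplicate while keeping order, with Technical Tests always first.
    types = list(dict.fromkeys([ALWAYS_INCLUDED, *requested]))
    report: Dict[str, List[str]] = {"passed": [], "failed": [], "gaps": []}
    for checklist_type in types:
        tests = registry.get(checklist_type)
        if not tests:
            # No test is defined for this type: report it as a gap.
            report["gaps"].append(checklist_type)
            continue
        for name, test in tests.items():
            if name not in cache:
                cache[name] = bool(test())  # run once, then reuse the result
            report["passed" if cache[name] else "failed"].append(name)
    return report
```

Passing the same `cache` to a later call, as the on-demand Twin might, would reuse earlier results instead of re-running the tests.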
Forge's review-side responsibility is post-push verification — confirming that the code he developed actually made it out cleanly and is in a consistent, buildable state. The Verifier runs automatically after Forge pushes: it checks the remote branch exists and is reachable, confirms all local commits made it to the remote, verifies no changes are left uncommitted, runs syntax checks on recently modified Python files, scans for merge-conflict markers, and alerts the team if any check fails. This matters because a push that silently left work behind, kept uncommitted changes, or carried conflict markers or broken syntax would look done while actually being broken — and that failure would only surface much later. It fires right after the development push, as the immediate consistency check that gives Forge and the team confidence the implementation is genuinely ready for the heavier review gates. In the normal case every check passes quietly; the alert path covers an incomplete push, leftover changes, or invalid Python so it can be fixed before review proceeds.
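The post-push checks listed above could be approximated with standard git commands and Python's `ast` module. The sketch below is one possible shape, not the Verifier's actual implementation: the function names, the default remote name `origin`, and the failure-message wording are assumptions, and the alerting step is left out.

```python
import ast
import subprocess
from pathlib import Path
from typing import Iterable, List

# Line prefixes git writes around an unresolved conflict.
CONFLICT_PREFIXES = ("<<<<<<< ", ">>>>>>> ")


def _git(repo: Path, *args: str) -> subprocess.CompletedProcess:
    """Run a git command inside the given repository and capture its output."""
    return subprocess.run(
        ["git", "-C", str(repo), *args], capture_output=True, text=True
    )


def git_checks(repo: Path, branch: str, remote: str = "origin") -> List[str]:
    """Return failures for the remote-branch, unpushed and uncommitted checks."""
    failures: List[str] = []
    # 1. The remote branch exists and is reachable.
    if _git(repo, "ls-remote", "--exit-code", "--heads", remote, branch).returncode != 0:
        failures.append(f"remote branch {remote}/{branch} not found or unreachable")
        return failures
    # 2. All local commits made it to the remote.
    _git(repo, "fetch", remote, branch)
    unpushed = _git(repo, "rev-list", "--count", f"{remote}/{branch}..HEAD").stdout.strip()
    if unpushed != "0":  # also true if rev-list itself failed (empty output)
        failures.append(f"local commits not on remote (rev-list count: {unpushed or 'unknown'})")
    # 3. No changes are left uncommitted.
    if _git(repo, "status", "--porcelain").stdout.strip():
        failures.append("uncommitted changes in working tree")
    return failures


def file_checks(paths: Iterable[Path]) -> List[str]:
    """Return failures for conflict markers and Python syntax in the given files."""
    failures: List[str] = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        if any(line.startswith(CONFLICT_PREFIXES) for line in text.splitlines()):
            failures.append(f"{path}: merge-conflict markers")
        if str(path).endswith(".py"):
            try:
                ast.parse(text, filename=str(path))
            except (SyntaxError, ValueError) as exc:
                failures.append(f"{path}: syntax error ({exc})")
    return failures
```

An empty list from both functions corresponds to the quiet pass described above; any entry would feed the alert path.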