Carolopedia

A friendly guide to Carol, her ecosystem, and the agents who built her.

CAROL-INI-2287-00: RSI engine cure-rate measure — activate Blocked-recovery under quality management on the Quality Scorecard and RSI Dashboard

Initiative

About

The RSI engine has no performance measure: nothing tracks whether its diagnose-and-retrigger cycles actually cure a family (it stays unblocked) or merely churn. Ninad, 2026-07-03: the measure is the cure rate (the share of diagnosed-and-retriggered families that stay unblocked), reported weekly like every other scorecard measure, with a target of 100%, and tracked by Prometheus under the quality-management service exactly like his other measures.

Implementation activates the already-defined-but-never-wired Blocked-recovery metric:

  • a new shared metric module (weekly ISO series from rsi-retriggered tags and status-router block events, 7-day maturity rule, carry-forward, snapshots into the Prometheus quality store);
  • daily collection by the Prometheus Maturity Assessor;
  • a resolver branch in the metric catalogue (mapping moved from initiatives/planned to quality-management/active, target 10.0, which is 100% on the 0-10 score scale);
  • a trend tab on the ONE RSI Dashboard via the hub adapter.

Recorded decision: Ninad explicitly places this measure under quality-management, superseding the cross-domain owner-service convention for this metric.
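
The original weekly definition (ISO-week cohorts, 7-day maturity rule, carry-forward) can be sketched roughly as follows. This is a minimal illustration and not the actual metric module: the `Retrigger` record, the `weekly_cure_series` function and the `MATURITY` constant are hypothetical names, and treating any recorded re-block as "not cured" is an assumption. The Decisions log below records that the maturity rule was later dropped in favour of live counting.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List, Optional

MATURITY = timedelta(days=7)  # assumed form of the 7-day maturity rule


@dataclass(frozen=True)
class Retrigger:
    """Hypothetical record: one diagnosed-and-retriggered family."""
    family: str
    retriggered_on: date
    reblocked_on: Optional[date] = None  # None means it stayed unblocked


def iso_week(d: date) -> str:
    """Label a date with its ISO year and week, e.g. '2026-W27'."""
    year, week, _ = d.isocalendar()
    return f"{year}-W{week:02d}"


def weekly_cure_series(
    events: Iterable[Retrigger], weeks: List[str], today: date
) -> Dict[str, Optional[float]]:
    """Score each ISO week on a 0-10 scale; weeks lacking a matured
    cohort carry the previous week's value forward."""
    matured: Dict[str, int] = {}
    cured: Dict[str, int] = {}
    for e in events:
        if today - e.retriggered_on < MATURITY:
            continue  # still in flight: not counted yet
        wk = iso_week(e.retriggered_on)
        matured[wk] = matured.get(wk, 0) + 1
        if e.reblocked_on is None:
            cured[wk] = cured.get(wk, 0) + 1

    series: Dict[str, Optional[float]] = {}
    last: Optional[float] = None
    for wk in weeks:  # weeks must be in chronological order
        if matured.get(wk):
            last = round(10.0 * cured.get(wk, 0) / matured[wk], 1)
        series[wk] = last  # carry-forward when nothing matured
    return series
```

A week with no matured cohort simply repeats the last known score, so the trend line has no gaps once the first cohort has matured; weeks before that first cohort stay `None`.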

Decisions

  • Elrond's bypass methodology checklist (a reminder, not a gate -- you've got this):
      0. File it requested_mode='bypass' (planner-vs-bypass is a deliberate choice). bypass_start REFUSES a non-bypass initiative (CAROL-INI-1846), and the dispatcher only skips the bypass lane when the mode says bypass -- a 'planner' mistag lets Merlin's pipeline grab the placeholder step and block your finished work.
      1. Filed as planned status -- let the bypass claim/activate it; never file active.
      2. Open the bypass (bypass_start) with your droid id + the remediation answer (remediates_initiative_id=NNN, or remediates_nothing=True).
      3. Work the blocks for your work-type: template -> design -> code -> test -> review. Do the real work; record decisions on the initiative as you make them.
      4. Reality is recorded for you at close -- code (files changed), each decision, and the twin-review verdict become real activities tied to this initiative and show in the Activity Tracker like a planner run (CAROL-INI-1840). No dummy rows.
      5. Keep the initiative status moving; it parks in 'reviewing' and is tagged uat-pending for you at close (CAROL-INI-1836), so the stuck-watchdog leaves it alone until UAT.
      6. Close runs the gates (design/architecture compliance + caller-audit). If a gate flags something pre-existing or unrelated to your change, waive it with a clear written rationale -- audit, don't skip.
      7. Bypass skips the planner's auto-orchestration, NOT the standards. Same template checklist, same review, same observability as a planner run. (elrond)
  • [status-router] planned -> executing | event=bypass_executing | bypass transition (or-bx-01)
  • Measure definition locked (Ninad 2026-07-03): RSI engine performance = cure rate (share of diagnosed-and-retriggered families that stay unblocked, 7-day maturity, weekly, carry-forward), target 100%, tracked by Prometheus under quality-management on the Quality Scorecard + RSI Dashboard. Baseline at activation: 0.0/10 (0 cured, 13 re-blocked, 7 in-flight) — the pre-fix era; the post-upgrade cohorts mature from 2026-07-10. (orion)
  • [status-router] executing -> reviewing | event=bypass_reviewing | bypass transition (or-bx-01)
  • UAT comment (Ninad 2026-07-03): the trend bar chart must show the count of diagnosis initiatives per week INCLUDING meta/retrospective diagnoses (was showing nothing — counts were plain ints where the template expects [x, volume] pairs, so bars summed to 0); bar axis = 200% of the max weekly diagnosis count; score stays on the line chart; target 100% approved. (orion)
  • UAT fix applied same-day: weekly diagnosis counter (rsi-diagnosis + rsi-meta-diagnosis tags) added to the metric module; hub counts now [cured, diagnoses] pairs (template reads volume from c[1] and already scales the axis to 2x max); volume label 'diagnoses / wk'; bar-click day drilldown added. Live-verified: 25 diagnoses this week, axis max 50, drilldown 12 Thu + 13 Fri. (orion)
  • [status-router] reviewing -> blocked | event=operator_put | PUT /api/initiatives (operator)
  • [status-router] blocked -> executing | event=operator_reopen | Ninad direct order 2026-07-03: extend CAROL-INI-2287 — change improvement counting to LIVE unblocked families (drop 7-day maturity). Also note: this initiative was itself phantom-parked to blocked by an operator_put rescue-sweep re-grade at 13:07. (orion)
  • Counting changed to LIVE (Ninad direct order 2026-07-03): A=diagnosis initiatives (rsi-diagnosis+meta tags), B=diagnosed FAMILIES (canonical title-prefix grouping; chain_root unpopulated) with NO member currently blocked/diagnosis. Score=B/families*10, live; re-block decrements immediately. 7-day maturity gate REMOVED. (orion)
  • First live read: 26 diagnosis initiatives, 20 diagnosed families, 18 unblocked now, 2 still blocked -> score 9.0. Snapshot written to the quality store; catalogue text updated; RSI Dashboard + Quality Scorecard apps relaunched (new PIDs, 200). (orion)
  • [status-router] executing -> reviewing | event=bypass_reviewing | bypass transition (or-bx-01)
  • [status-router] reviewing -> blocked | event=operator_put | PUT /api/initiatives (operator)
  • Orion remediated: Albus RSI group diagnosis (via INI 999900063): [procedural, confidence medium] The initiative was blocked by an operator PUT /api/initiatives after multiple Elrond re-scopings of success criterion 1 (five times in two days) with no final status transition to uat-pending, creating confusion about whether the work was complete and ready for UAT. (orion)
  • [status-router] blocked -> closed | event=operator_put | PUT /api/initiatives (operator)
  • [rsi-group-cure] Cured by the group diagnosis on INI 999900063 (shared cause operator_put); retriggered as INI 999900630. Root cause: [procedural, confidence medium] The initiative was blocked by an operator PUT /api/initiatives after multiple Elrond re-scopings of success criterion 1 (five times in two days) with no final status transition to uat-pending, creating confusion about whether the work was complete and ready for UAT. (elrond.rsi_loop)
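
The UAT fix recorded above (hub counts as [cured, diagnoses] pairs, bar axis at twice the largest weekly diagnosis count, bar-click day drilldown) might look roughly like this. It is a sketch under assumptions: the function names and input shapes are hypothetical, and only the pair layout and the 2x axis rule come from the log.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List


def weekly_bar_points(
    cured_by_week: Dict[str, int],
    diagnoses_by_week: Dict[str, int],
    weeks: List[str],
) -> List[List[int]]:
    """Emit one [cured, diagnoses] pair per week. Plain ints here were
    the original bug: the chart template reads the volume from c[1]."""
    return [
        [cured_by_week.get(wk, 0), diagnoses_by_week.get(wk, 0)]
        for wk in weeks
    ]


def bar_axis_max(points: List[List[int]]) -> int:
    """Bar axis is 200% of the largest weekly diagnosis count."""
    return 2 * max((c[1] for c in points), default=0)


def day_drilldown(diagnosis_dates: Iterable[date], week: str) -> Dict[str, int]:
    """Count diagnoses per calendar day inside one ISO week
    (label format assumed to be 'YYYY-Www')."""
    def label(d: date) -> str:
        year, wk, _ = d.isocalendar()
        return f"{year}-W{wk:02d}"

    counts = Counter(
        d.isoformat() for d in diagnosis_dates if label(d) == week
    )
    return dict(counts)
```

With a largest weekly volume of 25 diagnoses, `bar_axis_max` yields 50, which agrees with the live verification noted in the log.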

Success criteria

  • Prometheus Quality Scorecard shows the RSI cure-rate measure under the quality-management service as ACTIVE with a live weekly value and the 10.0 target — same look and mechanics as his other measures (must_have)
  • The RSI Dashboard shows the cure-rate trend as a tab (10-week series, identical layout to the other RSI measures), deep-linked from the Scorecard row (must_have)
  • The measure updates itself daily without operator involvement: Prometheus Maturity Assessor writes a dated cure-rate snapshot on its scheduled run (run-audit visible) (must_have)
  • The computed cure rate matches a hand-verified count of diagnosed-retriggered families cured vs re-blocked for the current week (spot check recorded as a decision) (must_have)
  • Regression suite shows zero new failures after the change (must_have)
  • Cure-rate metric counts LIVE: A = count of RSI diagnosis initiatives; B = count of initiatives (families) unblocked due to RSI remediations; score = B / (number of families diagnosed in A) * 10; NO 7-day maturity gate (must_have)
  • Re-blocked families decrement the unblocked count immediately (re-block treated as a bug; it re-enters the RSI loop) (must_have)
  • Quality Scorecard catalogue text and RSI Dashboard reflect the live counting (must_have)
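
The live-counting criteria above can be illustrated with a short sketch. Everything named here is an assumption for illustration: `family_key` guesses at the canonical title-prefix grouping with a regex, the input is taken to be (title, status) pairs for already-diagnosed families, and the status names `blocked` and `diagnosis` are taken from the decision log.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

BLOCKING_STATUSES = {"blocked", "diagnosis"}  # statuses that mean "not cured now"


def family_key(title: str) -> str:
    """Hypothetical canonical prefix: 'CAROL-INI-2287-00: ...' -> 'CAROL-INI-2287'.
    Falls back to the full title when the pattern does not match."""
    m = re.match(r"[A-Z]+-INI-\d+", title)
    return m.group(0) if m else title


def group_families(initiatives: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (title, status) pairs into families keyed by title prefix."""
    families: Dict[str, List[str]] = defaultdict(list)
    for title, status in initiatives:
        families[family_key(title)].append(status)
    return dict(families)


def live_cure_score(
    families: Dict[str, List[str]]
) -> Tuple[int, int, Optional[float]]:
    """Return (unblocked families, diagnosed families, score on 0-10).
    A family counts as unblocked only if no member is blocked or in
    diagnosis right now, so a re-block lowers the score immediately."""
    total = len(families)
    unblocked = sum(
        1 for statuses in families.values()
        if not BLOCKING_STATUSES & set(statuses)
    )
    score = round(10.0 * unblocked / total, 1) if total else None
    return unblocked, total, score
```

Applied to the first live read recorded under Decisions (20 diagnosed families, 18 unblocked), the formula gives 18 / 20 * 10 = 9.0.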