Operating the intelligence loop
Operational reference for the intelligence service. See the concept for the model and the architecture for the build. Internal / M2M only; dark by default (stage S0).
Cadence — how the loop turns
The Loop Orchestrator (services/workflow → Inngest intelligence.loop.v1) calls POST /v1/loop/tick on a schedule. Each tick runs Monitor → Analyze → Propose:
- Monitor: sweep pending decisions past their maturation deadline → expired_unjoined (closes the join window).
- Analyze: compute + persist meta-metrics.
- Propose: when a meta-metric trips its threshold, raise a loop_proposal + publish intelligence.proposal.raised.v1 (governance consumes it).
Manual tick (admin M2M): POST /v1/loop/tick. Over an empty ledger it no-ops.
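The three phases of a tick can be sketched in memory. This is a minimal illustration, not the service's implementation: the `Decision` and `TickResult` shapes, the `joined` status, the `matures_at` field, the definition of `join_completeness` as joined / closed, and the convention that a metric trips when it falls below a floor are all assumptions. Persisting metrics and publishing the event are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Decision:
    id: str
    status: str           # "pending", "joined" (hypothetical) or "expired_unjoined"
    matures_at: datetime  # hypothetical field name for the maturation deadline


@dataclass
class TickResult:
    expired: list[str] = field(default_factory=list)
    metrics: dict[str, float] = field(default_factory=dict)
    proposals: list[str] = field(default_factory=list)


def tick(ledger: list[Decision], now: datetime, floors: dict[str, float]) -> TickResult:
    """Run Monitor -> Analyze -> Propose once over an in-memory ledger."""
    result = TickResult()
    if not ledger:
        return result  # empty ledger: nothing to sweep, compute or propose

    # Monitor: close the join window on pending decisions past their deadline.
    for d in ledger:
        if d.status == "pending" and d.matures_at < now:
            d.status = "expired_unjoined"
            result.expired.append(d.id)

    # Analyze: compute meta-metrics (assumed definition: joined / closed).
    closed = [d for d in ledger if d.status in ("joined", "expired_unjoined")]
    if closed:
        joined = sum(1 for d in closed if d.status == "joined")
        result.metrics["join_completeness"] = joined / len(closed)

    # Propose: record every metric that falls below its (assumed) floor.
    for name, floor in floors.items():
        value = result.metrics.get(name)
        if value is not None and value < floor:
            result.proposals.append(name)
    return result
```

Calling `tick([], now, floors)` returns an empty `TickResult`, which mirrors the no-op over an empty ledger.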
Gate dashboards — the turn-on decision is data
GET /v1/loop/meta-metrics is the source; the authoritative metric/query/threshold definitions live in services/intelligence/docs/gate-dashboards.md. Watch:
| Metric | Healthy | Why it matters |
|---|---|---|
| join_completeness | ≥ 0.8 | below → outcomes are being dropped → lift biased optimistically (the worst outcomes vanish) |
| loop_liveness | stalled = false | a stalled loop looks identical to a healthy converged one — alarm on it |
| realized_vs_predicted / calibration / lift_decay | per policy version | the model’s honesty + decay; the S2→S5 gate signals |
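The two fixed thresholds in the table can be checked mechanically against meta-metric rows. The sketch below assumes rows shaped like the `metric`, `value`, `details`, `computed_at` columns of intelligence.meta_metrics, and it assumes the stalled flag is carried in `details`; the helper names are hypothetical. The per-policy-version metrics are not covered because their thresholds are defined elsewhere.

```python
from typing import Any

JOIN_COMPLETENESS_FLOOR = 0.8  # "Healthy" floor from the table above

Row = dict[str, Any]


def latest_per_metric(rows: list[Row]) -> dict[str, Row]:
    """Keep only the most recent row for each metric.

    computed_at values must be mutually comparable (datetimes, or
    ISO-8601 strings in one consistent format).
    """
    latest: dict[str, Row] = {}
    for row in rows:
        seen = latest.get(row["metric"])
        if seen is None or row["computed_at"] > seen["computed_at"]:
            latest[row["metric"]] = row
    return latest


def gate_health(rows: list[Row]) -> dict[str, bool]:
    """Map each known metric to True (healthy) or False (unhealthy)."""
    latest = latest_per_metric(rows)
    health: dict[str, bool] = {}
    jc = latest.get("join_completeness")
    if jc is not None:
        health["join_completeness"] = jc["value"] >= JOIN_COMPLETENESS_FLOOR
    live = latest.get("loop_liveness")
    if live is not None:
        # Assumption: the stalled flag lives in the row's details.
        health["loop_liveness"] = live["details"].get("stalled") is False
    return health
```

Reducing to the latest row per metric first means the check does not depend on the order or number of rows passed in.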
SELECT metric, value, details, computed_at
FROM intelligence.meta_metrics
WHERE metric IN ('join_completeness','loop_liveness')
ORDER BY computed_at DESC LIMIT 2;
Alarms
| Alarm (log metric) | Trigger | Action |
|---|---|---|
| loop liveness alarm | orchestrator dead / join-rate collapse / no proposals in N weeks | check the workflow + intelligence logs; run /v1/loop/tick manually |
| join-completeness below floor | join_completeness < 0.8 (with closed outcomes) | check intelligence-on-* EventBridge subscriptions |
| intelligence_guardrail_blocked | a served rec was blocked by guardrails | review the violation types; expected for unsafe content, investigate spikes |
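Two of the liveness triggers in the table reduce to timestamp comparisons. The following is a rough sketch under stated assumptions: the function name, its parameters, the reason strings, and the idea of detecting a dead orchestrator by the age of the last tick are all hypothetical, and the join-rate collapse trigger is not modelled because the source does not define it.

```python
from datetime import datetime, timedelta
from typing import Optional


def liveness_triggers(
    now: datetime,
    last_tick_at: Optional[datetime],
    last_proposal_at: Optional[datetime],
    max_tick_gap: timedelta,
    n_weeks: int,
) -> list[str]:
    """Return the reasons the liveness alarm should fire (empty if none)."""
    reasons: list[str] = []
    # Assumption: "orchestrator dead" means no tick within max_tick_gap.
    if last_tick_at is None or now - last_tick_at > max_tick_gap:
        reasons.append("orchestrator_dead")
    # "No proposals in N weeks", with N supplied by the caller.
    if last_proposal_at is None or now - last_proposal_at > timedelta(weeks=n_weeks):
        reasons.append("no_proposals")
    return reasons
```

Treating a missing timestamp as a trigger is a deliberate choice in this sketch: a loop that has never ticked or never proposed is reported rather than silently passed.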
Governance — the human gate
The orchestrator only proposes. A proposal changes what’s served only when a human applies it.
GET /v1/governance/proposals?status=raised # the review queue
POST /v1/governance/proposals/{id}/approve # { reviewer_id, notes? }
POST /v1/governance/proposals/{id}/reject # { reviewer_id, notes? }
POST /v1/governance/proposals/{id}/apply # { version } → promote that registry version
Lifecycle: raised → approved | rejected → applied. Apply promotes a registry version to champion: the only path a proposal reaches members, always explicit + audited. The model never auto-publishes.
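The lifecycle can be read as a small state machine. The sketch below assumes that only an approved proposal can be applied and that rejected and applied are terminal; the `Status` enum and `transition` helper are hypothetical names, not the service's API.

```python
from enum import Enum


class Status(str, Enum):
    RAISED = "raised"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"


# Allowed transitions (assumed reading of the lifecycle line).
TRANSITIONS: dict[Status, set[Status]] = {
    Status.RAISED: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.APPLIED},
    Status.REJECTED: set(),  # terminal
    Status.APPLIED: set(),   # terminal
}


def transition(current: Status, target: Status) -> Status:
    """Return the new status, or raise if the move is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Under this reading a raised proposal cannot jump straight to applied, which matches the statement that nothing reaches members without a human step.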
Registry promote / rollback
POST /v1/registry/{id}/promote # make a version champion (retires the prior)
POST /v1/registry/rollback # { policy_kind, policy_key, brand_id, to_version }
Rollback re-promotes a prior version. It preserves provenance: historical decision_records.policy_version / registry_id are never rewritten, so the Outcome Join keeps attributing correctly.
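Promote and rollback can be modelled as the same operation with different audit labels. The sketch below covers a single policy key in memory; the `Registry` class, the state names `candidate` and `retired`, and the audit tuple shape are assumptions for illustration. Decision records are kept outside the registry, so neither operation can touch them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Registry:
    # version -> state; state names other than "champion" are hypothetical
    versions: dict[int, str] = field(default_factory=dict)
    audit: list[tuple[str, int]] = field(default_factory=list)

    def champion(self) -> Optional[int]:
        for version, state in self.versions.items():
            if state == "champion":
                return version
        return None

    def _set_champion(self, version: int, action: str) -> None:
        if version not in self.versions:
            raise KeyError(version)
        prior = self.champion()
        if prior is not None and prior != version:
            self.versions[prior] = "retired"  # promoting retires the prior champion
        self.versions[version] = "champion"
        self.audit.append((action, version))

    def promote(self, version: int) -> None:
        self._set_champion(version, "promote")

    def rollback(self, to_version: int) -> None:
        # Rollback re-promotes a prior version; nothing historical is rewritten.
        self._set_champion(to_version, "rollback")
```

A decision recorded while version 2 was champion still carries `policy_version` 2 after a rollback to version 1, which is what keeps later attribution correct.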
Turning a stage on (the ★ pathway)
feature.intelligence.stage (default S0) selects the ranker. Advance only when the gate is met (see the gate dashboards); roll back instantly by flipping the flag back. Every stage S0→S5 uses the same /rank contract — only the ranker changes.
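Because every stage shares one contract, stage selection can be a lookup from the flag value to a ranker. The sketch below is illustrative only: the `Ranker` signature, both placeholder rankers, the `RANKERS` table, and the fallback to S0 for an unknown flag value are assumptions, and what each real stage's ranker does is not described here.

```python
from typing import Callable

# Hypothetical stand-in for the /rank contract: candidates in, ordered candidates out.
Ranker = Callable[[list[str]], list[str]]

STAGE_FLAG = "feature.intelligence.stage"


def passthrough_ranker(candidates: list[str]) -> list[str]:
    """Placeholder for S0: returns candidates in their incoming order."""
    return list(candidates)


def sorted_ranker(candidates: list[str]) -> list[str]:
    """Placeholder for a later stage: any function with the same signature."""
    return sorted(candidates)


RANKERS: dict[str, Ranker] = {"S0": passthrough_ranker, "S1": sorted_ranker}


def select_ranker(flags: dict[str, str]) -> Ranker:
    """Pick the ranker for the current stage, defaulting to S0."""
    stage = flags.get(STAGE_FLAG, "S0")
    # Assumption: an unknown stage value falls back to S0 rather than failing.
    return RANKERS.get(stage, passthrough_ranker)
```

Callers only ever see a `Ranker`, so flipping the flag back changes behaviour without any change to the calling code.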
PHI + erasure
The ledger + feature store hold PHI links once the clinical track is live. Cross-service reads + the platform_readonly role use the *_safe views. On user.erasure_requested.v1 the service deletes the entity’s decisions (outcomes cascade) + feature vectors, audited. Model-artifact erasure is handled when a model is trained.
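The erasure steps (delete decisions, let outcomes cascade, delete feature vectors, write an audit entry) can be illustrated with an in-memory SQLite database. The table and column names below are hypothetical and much simpler than a real ledger; SQLite is swapped in only so the example runs anywhere, and whether an audit entry may retain the entity identifier is a policy question this sketch does not settle.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
SCHEMA = """
CREATE TABLE decisions (id TEXT PRIMARY KEY, entity_id TEXT NOT NULL);
CREATE TABLE outcomes (
    id TEXT PRIMARY KEY,
    decision_id TEXT NOT NULL REFERENCES decisions(id) ON DELETE CASCADE
);
CREATE TABLE feature_vectors (entity_id TEXT NOT NULL, payload TEXT);
CREATE TABLE audit_log (action TEXT NOT NULL, entity_id TEXT NOT NULL,
                        decisions_deleted INTEGER NOT NULL);
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces cascades only when enabled
    conn.executescript(SCHEMA)
    return conn


def erase_entity(conn: sqlite3.Connection, entity_id: str) -> int:
    """Delete an entity's decisions and feature vectors, then audit the erasure.

    Returns the number of decision rows deleted; outcomes are removed by the
    ON DELETE CASCADE constraint rather than by an explicit statement.
    """
    with conn:  # single transaction: commit on success, roll back on error
        cur = conn.execute("DELETE FROM decisions WHERE entity_id = ?", (entity_id,))
        deleted = cur.rowcount
        conn.execute("DELETE FROM feature_vectors WHERE entity_id = ?", (entity_id,))
        conn.execute(
            "INSERT INTO audit_log (action, entity_id, decisions_deleted) VALUES (?, ?, ?)",
            ("erasure", entity_id, deleted),
        )
    return deleted
```

Running all three statements in one transaction means a failure partway through leaves no half-erased entity behind.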
Health
GET /healthz # process up
GET /readyz # DB + bus reachable
Deeper runbooks: services/intelligence/RUNBOOK.md + docs/runbooks/.