Intelligence layer — the self-improvement loop
What this is: the intelligence service and its closed loop. Not “AI” — a closed, governed, evidence-grounded loop that learns from measured outcomes: serve a recommendation, observe what actually happened, measure realized-vs-predicted, propose a change when the loop detects decay, and promote only what wins a governed A/B. The system’s own output becomes its next input, so it improves itself.
Who it’s for: anyone integrating a recommendation surface (storefronts, loop-health, admin) and anyone reasoning about how the platform turns served decisions into a compounding moat.
What to read next: Phenome — the evidence layer · Sequential learning → Helix · AI & ML layer · Events · Audit & PHI.
Framing: externally this is “measured outcomes — we can show it worked,” not “AI/ML.” No diagnostic or treatment claims; recommendations are clinician-reviewable. See the spec’s regulatory posture.
The loop
The intelligence components are not nine services — they are stages of one cybernetic loop (variation → selection → retention). The same loop is instantiated twice: commerce now (data-backed on real orders) proves the machinery is safe; clinical runs the same loop once members are proven to re-test.
The closure primitives (what makes it a loop, not a pipeline)
| # | Primitive | Where (SH-0) |
|---|---|---|
| 1 | Decision Record — every served rec logged with provenance + predicted effect + propensity | decision_records |
| 2 | Outcome Join — the durable, event-driven edge attributing a later outcome back to its decision | outcome_records + EventBridge handlers |
| 3 | Loop Orchestrator — an Inngest workflow running Monitor→Analyze→Propose on a cadence | intelligence.loop.v1 → POST /v1/loop/tick |
| 4 | Exploration policy — a Thompson sampler keeps the loop generating signal instead of converging (within-arm ranking only; never adapts A/B allocation) | exploration.ts |
| 5 | Meta-metrics — realized-vs-predicted, calibration, lift-decay, join-completeness, loop-liveness | meta_metrics |
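
To make primitive 4 concrete, here is a minimal sketch of Thompson sampling used to rank candidates within one arm. It is not the contents of exploration.ts: the `Candidate` shape, the success/failure counts, the Beta(1, 1) prior, and every function name are assumptions made for illustration. Each candidate's score is one draw from its Beta posterior, so candidates with little evidence still reach the top occasionally and keep generating signal.

```typescript
// Hypothetical per-candidate evidence built from joined outcomes.
interface Candidate {
  id: string;
  successes: number; // non-negative integer count of positive outcomes
  failures: number;  // non-negative integer count of other outcomes
}

type Rng = () => number; // uniform draw in [0, 1)

// Gamma(k, 1) for integer k >= 1, as a sum of k exponential draws.
function sampleGammaInt(k: number, rng: Rng): number {
  let sum = 0;
  for (let i = 0; i < k; i++) {
    sum += -Math.log(1 - rng()); // 1 - u lies in (0, 1], so the log is finite
  }
  return sum;
}

// Beta(a, b) for integer a, b >= 1, as the ratio X / (X + Y) of Gamma draws.
function sampleBetaInt(a: number, b: number, rng: Rng): number {
  const x = sampleGammaInt(a, rng);
  const y = sampleGammaInt(b, rng);
  const total = x + y;
  return total === 0 ? 0.5 : x / total; // guard the degenerate all-zero case
}

// Rank candidates by one posterior draw each (Beta(1, 1) prior assumed).
// This only orders items inside an arm; it never changes A/B allocation.
function thompsonRank(candidates: Candidate[], rng: Rng = Math.random): string[] {
  const scored = candidates.map((c) => ({
    id: c.id,
    score: sampleBetaInt(c.successes + 1, c.failures + 1, rng),
  }));
  scored.sort((left, right) => right.score - left.score);
  return scored.map((s) => s.id);
}
```

Passing the random source as a parameter keeps the sampler testable with a seeded generator. Logging a propensity alongside each ranking, as the Decision Record requires, would need an extra estimation step that this sketch leaves out.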
Maturity ladder (each rung reversible)
- L1 self-observing: outcomes captured + attributed.
- L2 self-proposing: the loop raises proposals on decay/drift; a human approves.
- L3 self-acting, bounded: low-risk commerce retrains→shadow→A/B→promote with auto-rollback; clinical stays human-gated.
- L4 self-meta-improving: the loop tunes its own exploration and retires stale rules.
Two shared objects anchor everything by version: the Decision+Outcome ledger and the model/rule registry. Provenance flows end-to-end — rolling back a model version never rewrites the version stamped on historical decisions, so the Outcome Join keeps attributing correctly.
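
The rollback invariant can be illustrated with a small in-memory sketch. The `Registry` and `Ledger` classes and their methods are hypothetical stand-ins for the real registry and ledger, whose schemas are not shown here. The point is only that a decision copies the active version at serve time, so moving the registry pointer later cannot change what history says.

```typescript
// A served decision, stamped once with the version that produced it.
interface Decision {
  readonly id: string;
  readonly modelVersion: string;
}

// Hypothetical registry: a stack of promoted versions, newest last.
class Registry {
  private promoted: string[];

  constructor(initialVersion: string) {
    this.promoted = [initialVersion];
  }

  active(): string {
    return this.promoted[this.promoted.length - 1];
  }

  promote(version: string): void {
    this.promoted.push(version);
  }

  // Rollback moves the pointer only; it has no access to the ledger.
  rollback(): void {
    if (this.promoted.length > 1) {
      this.promoted.pop();
    }
  }
}

// Hypothetical append-only ledger of decisions.
class Ledger {
  private rows: Decision[] = [];

  record(id: string, registry: Registry): Decision {
    // The version is copied by value at serve time, not looked up later.
    const row: Decision = { id: id, modelVersion: registry.active() };
    this.rows.push(row);
    return row;
  }

  byVersion(version: string): Decision[] {
    return this.rows.filter((r) => r.modelVersion === version);
  }
}
```

Because the ledger stores the version string rather than a reference to "whatever is active", an outcome that arrives after a rollback still joins to the version that actually served the decision.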
Why the join is durable, not fire-and-forget
A silently dropped outcome biases lift optimistically, because the worst outcomes simply vanish from the data. So every Decision Record carries outcome_status ∈ {pending, joined, expired_unjoined}, a sweep closes the attribution window by moving overdue pending records to expired_unjoined, and join-completeness % is a first-class meta-metric with an alarm. A stalled loop looks identical to a healthy converged one, so loop-liveness is alarmed too.
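
A minimal sketch of the sweep and the join-completeness metric follows. The three status values come from the text above. The record fields, the millisecond window, and the definition of completeness as joined divided by all closed records are assumptions for illustration, not the platform's actual schema or formula.

```typescript
type OutcomeStatus = "pending" | "joined" | "expired_unjoined";

// Hypothetical minimal view of a Decision Record.
interface DecisionRecord {
  readonly id: string;
  readonly servedAtMs: number;
  readonly outcomeStatus: OutcomeStatus;
}

// Close the attribution window: a record still pending after windowMs
// becomes expired_unjoined instead of silently staying open forever.
function sweepExpired(
  records: DecisionRecord[],
  nowMs: number,
  windowMs: number
): DecisionRecord[] {
  return records.map((r): DecisionRecord =>
    r.outcomeStatus === "pending" && nowMs - r.servedAtMs > windowMs
      ? { id: r.id, servedAtMs: r.servedAtMs, outcomeStatus: "expired_unjoined" }
      : r
  );
}

// Assumed definition: joined / (joined + expired_unjoined).
// Pending records are excluded because their window is still open.
// Returns null when no record has closed yet.
function joinCompleteness(records: DecisionRecord[]): number | null {
  let joined = 0;
  let closed = 0;
  for (const r of records) {
    if (r.outcomeStatus === "pending") continue;
    closed++;
    if (r.outcomeStatus === "joined") joined++;
  }
  return closed === 0 ? null : joined / closed;
}

// Alarm when completeness is known and falls below the threshold.
function completenessAlarm(records: DecisionRecord[], threshold: number): boolean {
  const value = joinCompleteness(records);
  return value !== null && value < threshold;
}
```

Without the sweep, an overdue record stays pending and is left out of the denominator, so completeness reads higher than it should. That is the optimistic bias the durable join is meant to prevent.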
The activation pathway — how rules become a model
The model is not needed day one. Every recommendation flows through the same serving contract at every stage; what changes is who computes the ranking and how much trust we give it. A stage advances only when its gate is met, and any stage rolls back instantly via a flag.
| Stage | Who ranks | Gate to advance |
|---|---|---|
| S0 · Rules | curated rules | — |
| S1 · Rails live | rules | outcome + feature records capturing cleanly |
| S2 · Lift-ranked | rules re-ordered by observed lift | ≥ N paired outcomes; lift stable; governance-approved |
| S3 · Model shadow | rules serve; model computes in parallel, logged not shown | model passes offline eval; shadow agreement tracked |
| S4 · A/B | model for an allocated %, rules for the rest | model beats rules on the primary metric; guardrails clean; governance-promoted |
| S5 · Model live | model (governed) | sustained win + safety review |
Human-in-the-loop is permanent: S2+ rank changes pass governance; S4/S5 promotions are explicitly approved and logged. The model never auto-publishes clinical advice.
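
The stage table can be read as a pure function from stage to serving plan. The sketch below is one hedged way to express it. The `Plan` shape, the hash-based bucketing, and the `modelPct` parameter are assumptions, and the table does not say whether the rules served at S3 and S4 are the lift-ranked variant, so plain "rules" is used there.

```typescript
type Stage = "S0" | "S1" | "S2" | "S3" | "S4" | "S5";
type Ranker = "rules" | "lift_ranked_rules" | "model";

// What the caller sees (shown) and what is only computed and logged (shadow).
interface Plan {
  shown: Ranker;
  shadow: Ranker | null;
}

// Deterministic bucket in [0, 100) so a unit keeps the same assignment.
// A simple polynomial hash; any stable hash would serve the same purpose.
function bucketOf(unitId: string): number {
  let h = 7;
  for (let i = 0; i < unitId.length; i++) {
    h = (h * 31 + unitId.charCodeAt(i)) % 1000003;
  }
  return h % 100;
}

// modelPct is the fixed share of units allocated to the model during S4.
function planFor(stage: Stage, unitId: string, modelPct: number): Plan {
  switch (stage) {
    case "S0":
    case "S1":
      return { shown: "rules", shadow: null };
    case "S2":
      return { shown: "lift_ranked_rules", shadow: null };
    case "S3":
      // The model runs in parallel; its output is logged, never shown.
      return { shown: "rules", shadow: "model" };
    case "S4":
      return {
        shown: bucketOf(unitId) < modelPct ? "model" : "rules",
        shadow: null,
      };
    case "S5":
      return { shown: "model", shadow: null };
  }
}
```

Since the plan depends only on the stage value and the unit, rolling back is a matter of setting the stage flag to an earlier value. No caller changes, and no state has to be unwound.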
Integrating
Two-sided and low-friction:
- Pull: an app asks for a ranking via `POST /v1/recommendations/rank` (the single serving contract, exposed through `@platform/sdk-intelligence`, M2M, internal). The ranker behind it changes across S0→S5 (selected by `feature.intelligence.stage`, default S0); the API does not. Each call logs a Decision Record with propensity.
- Learn: apps need no new producer code. The Outcome Join consumes domain events apps already emit (`order.placed.v1`, `clinical.biomarker.added.v1`, …). Emitting your normal events is all it takes for the loop to learn from your surface.
Decision logging is internal: the serving call writes the Decision Record (with propensity); callers just pass context. Public/partner access is out of scope.
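
As an illustration of the pull side, the sketch below builds the HTTP call for the rank endpoint. Only the method and path come from this page. The `RankContext` fields, the JSON body layout, and the bearer-token header for M2M auth are hypothetical, and the real `@platform/sdk-intelligence` interface is not shown here and may look quite different.

```typescript
const RANK_PATH = "/v1/recommendations/rank";

// Hypothetical context a caller passes. The caller supplies no propensity
// and no decision id, because the serving call writes the Decision Record.
interface RankContext {
  surface: string;        // e.g. "storefront"
  subjectId: string;      // who the ranking is for
  candidateIds: string[]; // items eligible to be ranked
}

// A transport-agnostic description of the request.
interface HttpCall {
  method: "POST";
  path: string;
  headers: { [name: string]: string };
  body: string;
}

// Build the request. Sending it is left to whatever HTTP client the app uses.
function buildRankCall(ctx: RankContext, m2mToken: string): HttpCall {
  if (ctx.candidateIds.length === 0) {
    throw new Error("at least one candidate is required");
  }
  return {
    method: "POST",
    path: RANK_PATH,
    headers: {
      "content-type": "application/json",
      authorization: "Bearer " + m2mToken, // assumed M2M auth scheme
    },
    body: JSON.stringify({
      surface: ctx.surface,
      subjectId: ctx.subjectId,
      candidateIds: ctx.candidateIds,
    }),
  };
}
```

The request carries no stage or ranker field. That matches the contract described above: the ranker is chosen server-side by the stage flag, so moving from S0 to S5 needs no client change.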
The evidence layer on top
Once the loop is logging real decisions and outcomes, Phenome is the layer that turns those measured outcomes into publishable, verifiable evidence — a published methodology Standard, exposure/COA verification, and the consent rails that let third parties contribute. The far horizon above the uplift gate — sequential decision-making and the in-silico Helix twin — is mapped in Sequential learning → Helix.