Intelligence layer — the self-improvement loop

What this is: the intelligence service and the closed loop it runs. It is not “AI”; it is a governed, evidence-grounded loop that learns from measured outcomes: serve a recommendation, observe what actually happened, measure realized-vs-predicted, propose a change when the loop detects decay, and promote only what wins a governed A/B test. The measured outcomes of the system’s own served decisions become the input to its next cycle, which is how it improves itself.

Who it’s for: anyone integrating a recommendation surface (storefronts, loop-health, admin) and anyone reasoning about how the platform turns served decisions into a compounding moat.

What to read next: Phenome — the evidence layer · Sequential learning → Helix · AI & ML layer · Events · Audit & PHI.

Framing: externally this is “measured outcomes — we can show it worked,” not “AI/ML.” No diagnostic or treatment claims; recommendations are clinician-reviewable. See the spec’s regulatory posture.

The loop

The intelligence components are not nine separate services; they are stages of one cybernetic loop (variation → selection → retention). The same loop is instantiated twice: commerce runs it now, on real order data, to prove the machinery is safe; clinical runs the same loop once members have shown they come back to re-test.

The closure primitives (what makes it a loop, not a pipeline)

#	Primitive	Where (SH-0)
1	Decision Record — every served rec logged with provenance + predicted effect + propensity	`decision_records`
2	Outcome Join — the durable, event-driven edge attributing a later outcome back to its decision	`outcome_records` + EventBridge handlers
3	Loop Orchestrator — an Inngest workflow running Monitor→Analyze→Propose on a cadence	`intelligence.loop.v1` → `POST /v1/loop/tick`
4	Exploration policy — a Thompson sampler keeps the loop generating signal instead of converging (within-arm ranking only; never adapts A/B allocation)	`exploration.ts`
5	Meta-metrics — realized-vs-predicted, calibration, lift-decay, join-completeness, loop-liveness	`meta_metrics`
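The exploration policy in row 4 can be illustrated with a minimal Thompson sampler. This is a sketch under stated assumptions, not the contents of `exploration.ts`: it assumes each candidate inside one A/B arm carries integer success/failure counts from joined outcomes and a Beta(1, 1) prior, and the names `Candidate`, `thompsonRank`, and `topSlotPropensity` are hypothetical. It only reorders candidates within an arm and never touches arm allocation.

```typescript
// Hypothetical per-candidate counts within ONE A/B arm (not the real exploration.ts types).
interface Candidate {
  id: string;
  successes: number; // assumed non-negative integer, e.g. joined positive outcomes
  failures: number;  // assumed non-negative integer, e.g. joined negative outcomes
}

type Rng = () => number; // uniform float in [0, 1)

// Gamma(k, 1) for integer k >= 1: the sum of k independent Exp(1) draws.
function sampleGammaInt(k: number, rng: Rng): number {
  let sum = 0;
  for (let i = 0; i < k; i++) sum -= Math.log(1 - rng()); // 1 - rng() lies in (0, 1]
  return sum;
}

// Beta(a, b) via X / (X + Y) with X ~ Gamma(a, 1) and Y ~ Gamma(b, 1).
function sampleBeta(a: number, b: number, rng: Rng): number {
  const x = sampleGammaInt(a, rng);
  const y = sampleGammaInt(b, rng);
  return x / (x + y);
}

// Thompson sampling: draw one plausible success rate per candidate from its
// Beta(successes + 1, failures + 1) posterior, then order by the draws (highest first).
function thompsonRank(candidates: Candidate[], rng: Rng = Math.random): Candidate[] {
  return candidates
    .map((c) => ({ c, theta: sampleBeta(c.successes + 1, c.failures + 1, rng) }))
    .sort((p, q) => q.theta - p.theta)
    .map((p) => p.c);
}

// Monte Carlo estimate of P(candidate is ranked first); one possible way to
// derive a top-slot propensity for the Decision Record.
function topSlotPropensity(candidates: Candidate[], draws: number, rng: Rng = Math.random): Map<string, number> {
  if (candidates.length === 0 || draws <= 0) return new Map<string, number>();
  const counts = new Map<string, number>(candidates.map((c): [string, number] => [c.id, 0]));
  for (let i = 0; i < draws; i++) {
    const top = thompsonRank(candidates, rng)[0];
    counts.set(top.id, (counts.get(top.id) ?? 0) + 1);
  }
  const propensity = new Map<string, number>();
  counts.forEach((n, id) => propensity.set(id, n / draws));
  return propensity;
}
```

Because the draws are random, two calls with identical counts can return different orders; that residual variation is what keeps the loop generating signal instead of converging.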

Maturity ladder (each rung reversible)

L1 self-observing (outcomes captured + attributed) → L2 self-proposing (the loop raises proposals on decay/drift; human approves) → L3 self-acting, bounded (low-risk commerce retrains→shadow→A/B→promote with auto-rollback; clinical stays human-gated) → L4 self-meta-improving (the loop tunes its own exploration + retires stale rules).
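Rung L2 (self-proposing) can be sketched as a check that compares realized-vs-predicted between a baseline window and the current window, and raises a proposal held for human approval when the ratio has decayed past a threshold. The row shape, the names `proposeOnDecay` and `calibrationGap`, and the 20% default threshold are assumptions for illustration, not the definitions stored in `meta_metrics`; the calibration figure here is only a crude calibration-in-the-large gap.

```typescript
// Hypothetical row: one joined decision in a monitoring window.
interface JoinedDecision {
  predictedEffect: number; // e.g. predicted conversion probability at serve time
  realized: 0 | 1;         // outcome observed through the Outcome Join
}

interface Proposal {
  kind: "retrain";
  reason: string;
  status: "pending_human_approval"; // L2: the loop proposes, a human approves
}

function mean(xs: number[]): number {
  return xs.reduce((s, x) => s + x, 0) / xs.length;
}

// Realized-vs-predicted: mean observed outcome over mean predicted effect (1.0 = on target).
function realizedVsPredicted(rows: JoinedDecision[]): number {
  return mean(rows.map((row) => row.realized)) / mean(rows.map((row) => row.predictedEffect));
}

// Calibration-in-the-large: absolute gap between mean predicted and mean realized.
function calibrationGap(rows: JoinedDecision[]): number {
  return Math.abs(mean(rows.map((row) => row.predictedEffect)) - mean(rows.map((row) => row.realized)));
}

// Raise a proposal when realized-vs-predicted has dropped by more than
// maxRelativeDrop relative to the baseline window; otherwise return null.
function proposeOnDecay(baseline: JoinedDecision[], current: JoinedDecision[], maxRelativeDrop = 0.2): Proposal | null {
  if (baseline.length === 0 || current.length === 0) return null; // not enough evidence yet
  const base = realizedVsPredicted(baseline);
  const cur = realizedVsPredicted(current);
  if (!(base > 0)) return null;
  const drop = (base - cur) / base;
  if (!(drop > maxRelativeDrop)) return null;
  return {
    kind: "retrain",
    reason: `realized-vs-predicted fell ${(drop * 100).toFixed(1)}% vs baseline`,
    status: "pending_human_approval",
  };
}
```

At L3 the same check could trigger a bounded retrain→shadow→A/B run for low-risk commerce instead of waiting for approval; clinical would keep the approval step.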

Two shared objects anchor everything by version: the Decision+Outcome ledger and the model/rule registry. Provenance flows end-to-end — rolling back a model version never rewrites the version stamped on historical decisions, so the Outcome Join keeps attributing correctly.
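The version-stamping rule can be sketched with an in-memory registry in which versions are immutable and “active” is only a pointer. `Registry`, `serve`, and the `DecisionRecord` fields below are hypothetical; the point is that rollback moves the pointer while each decision keeps the version it was served with.

```typescript
// Hypothetical registry entry; versions are never edited once registered.
interface ModelVersion {
  readonly id: string;
  readonly kind: "rules" | "model";
}

class Registry {
  private readonly versions = new Map<string, ModelVersion>();
  private readonly previous: string[] = []; // stack of earlier active ids, for rollback
  private activeId: string | null = null;

  register(v: ModelVersion): void {
    if (this.versions.has(v.id)) throw new Error(`version ${v.id} already registered`);
    this.versions.set(v.id, v);
  }

  promote(id: string): void {
    if (!this.versions.has(id)) throw new Error(`unknown version ${id}`);
    if (this.activeId !== null) this.previous.push(this.activeId);
    this.activeId = id;
  }

  // Rollback only moves the pointer; nothing already logged is touched.
  rollback(): void {
    const prev = this.previous.pop();
    if (prev === undefined) throw new Error("no earlier version to roll back to");
    this.activeId = prev;
  }

  active(): ModelVersion {
    const v = this.activeId === null ? undefined : this.versions.get(this.activeId);
    if (v === undefined) throw new Error("no active version");
    return v;
  }
}

interface DecisionRecord {
  readonly decisionId: string;
  readonly modelVersion: string; // stamped at serve time and never rewritten
  readonly predictedEffect: number;
  readonly propensity: number;
}

// Serving copies the active version id into a frozen record.
function serve(registry: Registry, decisionId: string, predictedEffect: number, propensity: number): DecisionRecord {
  return Object.freeze({ decisionId, modelVersion: registry.active().id, predictedEffect, propensity });
}
```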

Why the join is durable, not fire-and-forget

A silently dropped outcome biases lift optimistically: the worst outcomes simply vanish from the data. So every Decision Record carries outcome_status ∈ {pending, joined, expired_unjoined}; a sweep closes each attribution window by marking decisions still pending at its end as expired_unjoined; and join-completeness % is a first-class meta-metric with an alarm. A stalled loop looks identical to a healthy, converged one, so loop-liveness is alarmed too.
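A minimal sketch of the sweep and the two alarms follows. It assumes each decision carries a serve timestamp, that the attribution window is a fixed duration, and that join-completeness is the share of closed decisions (joined or expired_unjoined) that actually joined; `sweep`, `joinCompleteness`, `checkAlarms`, and the thresholds are hypothetical.

```typescript
type OutcomeStatus = "pending" | "joined" | "expired_unjoined";

interface DecisionRow {
  decisionId: string;
  servedAt: number; // epoch ms
  outcomeStatus: OutcomeStatus;
}

// Close the attribution window: pending decisions older than windowMs become
// expired_unjoined instead of silently disappearing from the lift calculation.
function sweep(rows: DecisionRow[], now: number, windowMs: number): DecisionRow[] {
  return rows.map((r): DecisionRow =>
    r.outcomeStatus === "pending" && now - r.servedAt >= windowMs
      ? { ...r, outcomeStatus: "expired_unjoined" }
      : r,
  );
}

// Share of closed decisions that joined; null while nothing has closed yet.
function joinCompleteness(rows: DecisionRow[]): number | null {
  const closed = rows.filter((r) => r.outcomeStatus !== "pending");
  if (closed.length === 0) return null;
  return closed.filter((r) => r.outcomeStatus === "joined").length / closed.length;
}

interface Alarm {
  metric: "join_completeness" | "loop_liveness";
  detail: string;
}

// Two separate alarms: low completeness biases lift, and a stalled loop
// (no orchestrator tick recently) would otherwise look like a converged one.
function checkAlarms(
  rows: DecisionRow[],
  now: number,
  lastTickAt: number,
  opts: { minCompleteness: number; maxTickGapMs: number },
): Alarm[] {
  const alarms: Alarm[] = [];
  const completeness = joinCompleteness(rows);
  if (completeness !== null && completeness < opts.minCompleteness) {
    alarms.push({ metric: "join_completeness", detail: `join-completeness ${(completeness * 100).toFixed(1)}%` });
  }
  if (now - lastTickAt > opts.maxTickGapMs) {
    alarms.push({ metric: "loop_liveness", detail: `no loop tick for ${now - lastTickAt} ms` });
  }
  return alarms;
}
```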

The activation pathway — how rules become a model

The model is not needed on day one. Every recommendation flows through the same serving contract at every stage; what changes is who computes the ranking and how much trust we give it. A stage advances only when its gate is met, and any stage rolls back instantly via a flag.

Stage	Who ranks	Gate to advance
S0 · Rules	curated rules	—
S1 · Rails live	rules	outcome + feature records capturing cleanly
S2 · Lift-ranked	rules re-ordered by observed lift	≥ N paired outcomes; lift stable; governance-approved
S3 · Model shadow	rules serve; model computes in parallel, logged not shown	model passes offline eval; shadow agreement tracked
S4 · A/B	model for an allocated %, rules for the rest	model beats rules on the primary metric; guardrails clean; governance-promoted
S5 · Model live	model (governed)	sustained win + safety review

Human-in-the-loop is permanent: S2+ rank changes pass governance; S4/S5 promotions are explicitly approved and logged. The model never auto-publishes clinical advice.
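The advance-and-rollback mechanics can be sketched as a small state machine over the stage flag. Gate evaluation is abstracted into a boolean because the table leaves thresholds such as N unspecified; `StageFlag` stands in for `feature.intelligence.stage`, and enforcing the human-in-the-loop rule above as “governance approval for any move into S2 or beyond, plus a named approver for S4/S5” is an assumption about how it might be implemented.

```typescript
type Stage = "S0" | "S1" | "S2" | "S3" | "S4" | "S5";
const STAGES: readonly Stage[] = ["S0", "S1", "S2", "S3", "S4", "S5"];

// Hypothetical stand-in for the feature.intelligence.stage flag; defaults to S0.
interface StageFlag {
  value: Stage;
}

interface AdvanceRequest {
  gateMet: boolean;            // the table's gate, evaluated elsewhere
  governanceApproved: boolean; // assumed required for any move into S2 or beyond
  approvedBy?: string;         // assumed required to be named for S4/S5 promotions
}

interface AuditEntry {
  from: Stage;
  to: Stage;
  approvedBy?: string;
  at: number;
}

// Advance exactly one rung, and only when every applicable check passes.
function advance(flag: StageFlag, req: AdvanceRequest, audit: AuditEntry[], now: number): boolean {
  const i = STAGES.indexOf(flag.value);
  if (i === STAGES.length - 1 || !req.gateMet) return false;
  const to = STAGES[i + 1];
  if (i + 1 >= 2 && !req.governanceApproved) return false;
  if ((to === "S4" || to === "S5") && !req.approvedBy) return false;
  audit.push({ from: flag.value, to, approvedBy: req.approvedBy, at: now });
  flag.value = to;
  return true;
}

// Rollback is a flag flip to any earlier stage, effective immediately.
function rollback(flag: StageFlag, to: Stage, audit: AuditEntry[], now: number): void {
  if (STAGES.indexOf(to) >= STAGES.indexOf(flag.value)) throw new Error(`${to} is not earlier than ${flag.value}`);
  audit.push({ from: flag.value, to, at: now });
  flag.value = to;
}
```

Advancing one rung at a time keeps each promotion individually reviewable, while rollback may skip any number of rungs because retreating to rules is always the safe direction.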

Integrating

Two-sided and low-friction:

Pull: an app asks for a ranking via `POST /v1/recommendations/rank`, the single serving contract, exposed through `@platform/sdk-intelligence` to internal machine-to-machine (M2M) callers only. The ranker behind it changes across S0→S5 (selected by `feature.intelligence.stage`, default S0); the API does not. Each call logs a Decision Record with propensity.
Learn: apps need no new producer code. The Outcome Join consumes domain events apps already emit (`order.placed.v1`, `clinical.biomarker.added.v1`, …). Emitting your normal events is all it takes for the loop to learn from your surface.
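The Learn side can be sketched as an idempotent handler that attributes an incoming domain event back to pending decisions. The event envelope, the attribution rule (same subject, inside the window, and the order contains a recommended item), and the `OutcomeJoin` class are assumptions. Idempotency by event id matters because EventBridge delivers events at least once, so the same event can arrive twice.

```typescript
// Hypothetical event envelope; the real order.placed.v1 payload may differ.
interface DomainEvent {
  id: string;         // unique event id, used for idempotency
  type: string;       // e.g. "order.placed.v1"
  subjectId: string;  // member the event is about
  occurredAt: number; // epoch ms
  itemIds: string[];  // e.g. SKUs on the order
}

interface PendingDecision {
  decisionId: string;
  subjectId: string;
  servedAt: number;
  recommendedItemIds: string[];
  outcomeStatus: "pending" | "joined" | "expired_unjoined";
}

interface OutcomeRecord {
  decisionId: string;
  eventId: string;
  eventType: string;
  joinedAt: number;
}

class OutcomeJoin {
  private readonly seenEventIds = new Set<string>();
  private readonly decisions: PendingDecision[];
  private readonly windowMs: number;
  readonly outcomes: OutcomeRecord[] = [];

  constructor(decisions: PendingDecision[], windowMs: number) {
    this.decisions = decisions;
    this.windowMs = windowMs;
  }

  // Returns the outcome records created by this event (empty on redelivery).
  handle(e: DomainEvent): OutcomeRecord[] {
    if (this.seenEventIds.has(e.id)) return [];
    this.seenEventIds.add(e.id);
    const joined: OutcomeRecord[] = [];
    for (const d of this.decisions) {
      const inWindow = e.occurredAt >= d.servedAt && e.occurredAt - d.servedAt < this.windowMs;
      const touchesRecommendation = d.recommendedItemIds.some((id) => e.itemIds.includes(id));
      if (d.outcomeStatus === "pending" && d.subjectId === e.subjectId && inWindow && touchesRecommendation) {
        d.outcomeStatus = "joined"; // mutates the shared decision list in this in-memory sketch
        joined.push({ decisionId: d.decisionId, eventId: e.id, eventType: e.type, joinedAt: e.occurredAt });
      }
    }
    this.outcomes.push(...joined);
    return joined;
  }
}
```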

Decision logging is internal: the serving call writes the Decision Record (with propensity); callers just pass context. Public/partner access is out of scope.
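The serving side can be sketched as one `rank` function whose behavior depends only on the stage, so callers never change. This is a hypothetical in-process stand-in for `POST /v1/recommendations/rank`, not the SDK's API: the request shape, the `Ranker` type, the FNV-1a-style hash bucketing for the S4 split, and logging the served arm's allocation share as the propensity are all assumptions. The split is fixed by `modelShare` and never adapted, in line with the exploration policy's constraint.

```typescript
type Stage = "S0" | "S1" | "S2" | "S3" | "S4" | "S5";
type Ranker = (candidates: string[]) => string[];

// Caller-supplied context only; decision logging happens inside rank().
interface RankRequest {
  surface: string;
  subjectId: string;
  candidates: string[];
}

interface DecisionRecord {
  decisionId: string;
  surface: string;
  subjectId: string;
  stage: Stage;
  arm: "rules" | "model";
  ranked: string[];
  shadowRanked?: string[]; // S3 only: model output, logged but never shown
  propensity: number;      // allocation share of the served arm
}

// FNV-1a-style 32-bit hash over UTF-16 code units; buckets subjects deterministically.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

function rank(
  req: RankRequest,
  stage: Stage,
  modelShare: number, // fixed S4 allocation in [0, 1]; never adapted at runtime
  rules: Ranker,      // curated order in S0/S1, lift-reordered rules in S2
  model: Ranker,
  log: DecisionRecord[],
  newId: () => string,
): string[] {
  let arm: "rules" | "model" = stage === "S5" ? "model" : "rules";
  let propensity = 1;
  if (stage === "S4") {
    const bucket = fnv1a(req.subjectId) / 0x100000000; // in [0, 1)
    arm = bucket < modelShare ? "model" : "rules";
    propensity = arm === "model" ? modelShare : 1 - modelShare;
  }
  const shadowRanked = stage === "S3" ? model(req.candidates) : undefined;
  const ranked = arm === "model" ? model(req.candidates) : rules(req.candidates);
  log.push({ decisionId: newId(), surface: req.surface, subjectId: req.subjectId, stage, arm, ranked, shadowRanked, propensity });
  return ranked;
}
```

Hashing the subject id keeps each member in the same arm across calls, which keeps the A/B comparison clean while the stage flag alone decides which ranker answers.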

The evidence layer on top

Once the loop is logging real decisions and outcomes, Phenome is the layer that turns those measured outcomes into publishable, verifiable evidence — a published methodology Standard, exposure/COA verification, and the consent rails that let third parties contribute. The far horizon above the uplift gate — sequential decision-making and the in-silico Helix twin — is mapped in Sequential learning → Helix.
