Intelligence layer architecture

The intelligence layer is one bounded context — the intelligence service — plus thin, well-defined touchpoints on ai, clinical, and workflow. Read the concept for the loop; this is the build.

Services + their roles

Service	Role in the loop
`intelligence`	Owns the loop: the Decision+Outcome ledger, the model/rule registry, the feature store, meta-metrics, governance, the serving contract, the guardrail gate, and the exploration policy.
`workflow`	Hosts the Loop Orchestrator Inngest workflow (`intelligence.loop.v1`) that calls `POST /v1/loop/tick` on a cadence.
`ai`	Embeddings + RAG (research-paper ingestion + retrieval) used to ground rationales. Boot-asserts a real embedding provider in staging/prod.
`clinical`	Emits `clinical.biomarker.added.v1` (the clinical feedback wire) and owns contraindication logic the guardrail gate can call.

Cross-service only via HTTP/events — never shared DB access.

The two shared objects (the unifying substrate)

Everything reads and writes versioned references to exactly two objects, which is what keeps the system from fragmenting into disconnected parts:

Decision + Outcome ledger: `decision_records` (every served rec, with provenance + propensity + `outcome_status`) + `outcome_records` (the durable join edge).
Model/rule registry: `model_rule_registry` (versioned policies: shadow | candidate | champion | retired). The registry is the source of truth; the S3 bucket holds model bytes.

The feature store (feature_vectors) is derived from the same ledger + sources. Provenance flows end-to-end; rollback never rewrites the version stamped on historical decisions.
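A minimal sketch of the rollback property described above, assuming hypothetical in-memory shapes (`RegistryEntry`, `ServedDecisionRow`, and `rollbackChampion` are illustrative names, not the service's actual types). It treats the array as the registry rows for a single (kind, key, brand) and shows that changing the champion touches registry status only, never the version stamped on a decision:

```typescript
// Hypothetical shapes: only the four statuses and the "stamped version"
// idea come from the text; every field name here is an assumption.
type RegistryStatus = "shadow" | "candidate" | "champion" | "retired";

interface RegistryEntry {
  version: number;
  status: RegistryStatus;
}

interface ServedDecisionRow {
  id: string;
  policyVersion: number; // stamped at serve time
  propensity: number;
}

// Roll back by re-promoting an earlier version and retiring the current
// champion. Returns a new array; decisions are never passed in, so their
// stamped policyVersion cannot change.
function rollbackChampion(
  registry: readonly RegistryEntry[],
  toVersion: number,
): RegistryEntry[] {
  if (!registry.some((e) => e.version === toVersion)) {
    throw new Error(`unknown registry version: ${toVersion}`);
  }
  return registry.map((e): RegistryEntry => {
    if (e.version === toVersion) return { ...e, status: "champion" };
    if (e.status === "champion") return { ...e, status: "retired" };
    return e;
  });
}
```

A decision served by version 2 still reads `policyVersion: 2` after a rollback to version 1, which is what keeps historical provenance intact.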

The data plane (`intelligence.*` schema)

Table	Purpose	PHI-safe view
`decision_records`	served recs + provenance + propensity + `outcome_status`	`decision_records_safe` (omits features/ranked payload)
`outcome_records`	attributed outcomes	`outcome_records_safe` (omits realized payload)
`model_rule_registry`	versioned policies; one champion per (kind,key,brand)	—
`feature_vectors`	point-in-time entity features (no label leakage)	`feature_vectors_safe` (omits features payload)
`meta_metrics`	the loop’s self-perception	—
`loop_proposals`	proposals + governance review lifecycle	—
`audit_log`	every state change (append-only, ADR-0039)	—
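The "point-in-time, no label leakage" property of `feature_vectors` can be illustrated with a small lookup. This is a sketch under assumptions: the `entityId` / `asOf` / `features` fields and the `featuresAsOf` helper are hypothetical, and the real table's columns may differ. The idea is only that a decision made at time t may see the latest vector computed at or before t, never a later one:

```typescript
// Hypothetical row shape for a point-in-time feature vector.
interface FeatureVectorRow {
  entityId: string;
  asOf: Date; // when the features were computed
  features: Record<string, number>;
}

// Return the most recent vector for the entity that existed at decision
// time. Vectors computed after the decision are skipped, because they
// could encode the outcome the model is trying to predict.
function featuresAsOf(
  rows: readonly FeatureVectorRow[],
  entityId: string,
  decisionTime: Date,
): FeatureVectorRow | undefined {
  let best: FeatureVectorRow | undefined;
  for (const row of rows) {
    if (row.entityId !== entityId) continue;
    if (row.asOf.getTime() > decisionTime.getTime()) continue;
    if (best === undefined || row.asOf.getTime() > best.asOf.getTime()) {
      best = row;
    }
  }
  return best;
}
```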

PHI: the ledger + feature store hold PHI links once the clinical track is live, so they get safe views per ADR-0046, an audit entry on every write, and erasure: `user.erasure_requested.v1` deletes the user's decisions (outcomes cascade) and feature vectors.

Serving + the gates

/v1/recommendations/rank → stage flag picks the ranker → guardrail gate (drops unsafe recs) → log a Decision Record → return. At S2+ rank changes pass governance; at S3+ the exploration policy (Thompson sampler) adds bounded within-arm exploration. Guardrails + governance + exploration are all in the intelligence service.
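The serving order above can be sketched as a single function. This is an illustration, not the service's handler: `RankItem`, `RankDeps`, `ServedDecision`, and `rankOnce` are hypothetical names, the stage is modeled as a plain number (0 for S0, 3 for S3), and governance and exploration are left out to keep the gate order visible:

```typescript
interface RankItem {
  itemId: string;
  score: number;
}

type Ranker = (candidates: readonly RankItem[]) => RankItem[];

// Hypothetical log payload; the real Decision Record carries provenance
// and propensity as well.
interface ServedDecision {
  ranker: "rules" | "model";
  stage: number;
  served: RankItem[];
}

interface RankDeps {
  stage: number;
  rules: Ranker;
  model?: Ranker; // present only once a champion is promoted
  isSafe: (item: RankItem) => boolean; // the guardrail gate
  logDecision: (record: ServedDecision) => void;
}

function rankOnce(candidates: readonly RankItem[], deps: RankDeps): RankItem[] {
  // Stage flag picks the ranker: the model only at S3+, otherwise rules.
  const model = deps.stage >= 3 ? deps.model : undefined;
  const ranked = (model ?? deps.rules)(candidates);
  // Guardrail gate drops unsafe recs before anything is logged or returned.
  const served = ranked.filter((item) => deps.isSafe(item));
  // Log what was actually served, then return it.
  deps.logDecision({ ranker: model ? "model" : "rules", stage: deps.stage, served });
  return served;
}
```

Logging after the gate means the ledger records what the member actually saw, which is what the Outcome Join later attributes against.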

SST topology (per ADR-0094)

Reuses the shared VPC / Cluster / Aurora / Bus — net-new primitives only:

sst.aws.Service (intelligence, Fargate, internal) — serving + ONNX inference in-process (onnxruntime-node), TS only.
sst.aws.Task (Python image) — batch training on demand via task.run(), triggered by the orchestrator (not a calendar). Runs dbt + sklearn/lightgbm/skl2onnx. ~$0.02/hr.
sst.aws.Bucket — ONNX model artifacts; the registry rows point at the S3 key + version.
EventBridge subscriptions (intelligence-on-*) for the Outcome Join + erasure.
OPENAI secret (ai service); boot-asserts non-mock in staging/prod.

The intelligence service registers in sst.config.ts (as of SH-1) but is dark by default — stage flag S0, nothing in the product path calls /rank yet.

Train → register → serve (A-2)

The model lifecycle is split across the two runtimes (D2):

Train (Python, batch): services/intelligence/training/train.py runs as the intelligence-training Task — reads the feature store + dbt label tables (A-1), trains a baseline, exports ONNX, uploads the bytes to the artifacts bucket, and registers a shadow entry in the model/rule registry. It never auto-promotes.
Govern: governance (SH-2) promotes shadow → candidate → champion on real eval metrics.
Serve (TS): the model loader (onnxruntime-node, S3 bytes → cached session) + scorer align serving features to the training feature order stamped in the registry entry (eval_metrics.feature_names) — no train/serve skew. RegistryService.getChampionModel finds the artifact to load; if none is promoted, serving falls back to the rules ranker (advancing the stage never serves a worse rec).
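The feature-order alignment in the Serve step can be sketched as follows. `alignFeatures` is a hypothetical helper, and throwing on a missing feature is an assumption (the real scorer might impute a default instead). The point is that the input vector is built by walking the training-time name list, so column order at serve time always matches what the model was fitted on:

```typescript
// Build the model input in the exact order recorded at training time
// (the list the text describes as eval_metrics.feature_names).
function alignFeatures(
  featureNames: readonly string[],
  serving: Readonly<Record<string, number>>,
): Float32Array {
  const input = new Float32Array(featureNames.length);
  featureNames.forEach((name, i) => {
    const value = serving[name];
    // Assumption: fail loudly rather than silently feed a default.
    if (value === undefined) {
      throw new Error(`missing serving feature: ${name}`);
    }
    input[i] = value;
  });
  // Extra keys in `serving` are ignored: the model never saw them.
  return input;
}
```

A `Float32Array` is used here because float32 tensors are the usual input for exported tree and linear models; the actual dtype depends on how the model was exported.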

Dark + gated: nothing trains until a feature store is materialized, and the rank hot path serves the model only once a champion is promoted at S3+.

Learning from the loop (the uplift gate)

The trainer has two modes (TRAIN_MODE):

observational (bootstrap): fits a static dbt label (`reordered_within_90d`). A correlational predictor, used before the loop has data.
counterfactual (the gate): reads `intelligence.training_decisions` (the loop’s own joined decisions + realized reward + selection propensity) and fits with IPS weights (1/propensity, with a propensity floor that is mirrored in the TS off-policy.ts). The model now learns from its own choices, debiased for exploration, rather than from a static label.
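The IPS weighting in counterfactual mode can be sketched as below. This is not the contents of off-policy.ts: the `JoinedDecision` shape, the helper names, and the floor value of 0.05 are all assumptions for illustration. The floor caps the weight of rarely chosen actions so a single low-propensity row cannot dominate the fit:

```typescript
// Hypothetical row from intelligence.training_decisions.
interface JoinedDecision {
  reward: number;
  propensity: number; // probability the logging policy chose this action
}

// Assumption: illustrative floor; the real value lives with the trainer.
const PROPENSITY_FLOOR = 0.05;

// Inverse-propensity weight, clipped by flooring the propensity.
function ipsWeight(propensity: number, floor: number = PROPENSITY_FLOOR): number {
  if (!(propensity > 0 && propensity <= 1)) {
    throw new Error(`propensity out of range: ${propensity}`);
  }
  return 1 / Math.max(propensity, floor);
}

// Weighted mean reward using the IPS weights (self-normalized form).
function weightedMeanReward(
  rows: readonly JoinedDecision[],
  floor: number = PROPENSITY_FLOOR,
): number {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const row of rows) {
    const w = ipsWeight(row.propensity, floor);
    weightedSum += w * row.reward;
    weightTotal += w;
  }
  if (weightTotal === 0) throw new Error("no rows to average");
  return weightedSum / weightTotal;
}
```

In training, the same per-row weights would be passed as sample weights to the fit; the weighted mean here is only a compact way to see their effect.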

This is the step that turns the system from “a predictor watching the loop” into “a policy that improves from its decisions.” The ladder above it runs on these same rails, with only the trainer/policy swapped: counterfactual value → uplift / CATE (incremental effect per person; the clinical “did the intervention move the biomarker” estimator) → sequential decision-making / optimal dynamic treatment regimes (optimize the member’s whole trajectory, not the next outcome) → a mechanistic “digital twin” + active experimental design. Each level is gated on the one below in three ways: technically (each level generalizes the one below), for data (the lower level generates the trajectories), and for safety (off-policy evaluation validates a policy before it serves).

Tooling (per ADR-0093 / ADR-0094)

dbt-core (feature pipeline) · Python batch Task → ONNX (training/serving) · onnxruntime-node (TS inference) · TS Thompson sampler (exploration) · Evidently + PostHog (drift/A-B) · Postgres/Drizzle (ledger + registry). Deferred: MLflow, Vowpal Wabbit, EconML/DoWhy. Avoided: Feast, Kubeflow, Ray, Spark, Airflow.
