System overview

What this is: the entire Loop Platform on one page — services, data plane, event bus, identity, and how a request flows through.

Who it’s for: new engineers on day 1, senior reviewers auditing the architecture, anyone asking “where does X live?”

What to read next: Events, Auth model, Brands and multi-tenancy.

The platform at a glance

The five planes

The platform has five conceptual planes: edge, identity, service, data, and event. Knowing which plane you’re in tells you which patterns apply.

Edge plane

Cloudflare resolves *.platform.loop.health to a single AWS Application Load Balancer per stage. The ALB routes by subdomain to one of the per-service ECS Fargate services. TLS terminates at the ALB. No service is publicly reachable except through this path.

Identity plane

services/identity is the only service that talks to WorkOS and Clerk. Every other service treats it as the source of truth for “who is this caller?”. It supports two grant types:

  • authorization_code + PKCE — user-facing apps (first-party and third-party). See Auth model.
  • client_credentials — service-to-service (cron jobs, internal workflows). M2M tokens are admin-scoped and never carry user-facing scopes.

Access tokens are opaque: services validate a token by calling /v1/oauth/introspect (with the result cached briefly in Redis) instead of decoding a JWT. This lets us revoke a token instantly without waiting for a JWT to expire.
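A minimal sketch of introspection behind a short-lived cache may help here. Everything in it is an assumption made for illustration: the Introspection shape, the TtlCache class (an in-memory stand-in for Redis), and the cachedIntrospect helper are hypothetical names, and the real TTL and response fields are not given on this page.

```typescript
// Hypothetical response shape; the real /v1/oauth/introspect payload is not shown here.
type Introspection = { active: boolean; sub?: string; scopes: string[] };
type Introspect = (token: string) => Promise<Introspection>;

// In-memory stand-in for the Redis cache so the sketch runs on its own.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  constructor(private ttlMs: number, private now: () => number = Date.now) {}

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Wraps an upstream introspection call with the cache.
// A real cache key would likely be a hash of the token rather than the token itself.
function cachedIntrospect(introspect: Introspect, cache: TtlCache<Introspection>): Introspect {
  return async (token) => {
    const cached = cache.get(token);
    if (cached) return cached;
    const fresh = await introspect(token);
    // Only active results are cached, so an inactive token is always re-checked upstream.
    if (fresh.active) cache.set(token, fresh);
    return fresh;
  };
}
```

In this sketch the TTL bounds how long a cached "active" answer can outlive a revocation, which is one reason to keep the cache window short.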

Service plane

The service plane consists of 27 domain services, each running as one or more Fargate tasks. Every service follows the same rules, enforced by scripts/check-conventions.ts. Each service:

  • Owns its own Postgres schema. No cross-schema reads or writes.
  • Defines its HTTP API in openapi.yaml (drift-checked in CI).
  • Declares published + consumed events in service.yaml (cross-checked against EVENT_SCHEMAS).
  • Writes audit rows for every mutation.
  • Enforces OAuth scopes on every route via requireScope(...).
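The scope rule in the list above could look roughly like the following. This page names requireScope(...) but does not show its signature, so the wrapper shape, the Req and Res types, and the "biomarkers:read" scope string are all assumptions for illustration, not the platform's actual API.

```typescript
// Hypothetical request/response shapes, independent of any HTTP framework.
type Caller = { sub: string; scopes: string[] };
type Req = { caller: Caller };
type Res = { status: number; body: unknown };
type Handler = (req: Req) => Res;

// Returns a wrapper that rejects the request unless the caller holds the given scope.
function requireScope(scope: string) {
  return (handler: Handler): Handler => (req) => {
    if (!req.caller.scopes.includes(scope)) {
      return { status: 403, body: { error: "insufficient_scope", required: scope } };
    }
    return handler(req);
  };
}

// Usage: the route handler only runs when the scope check passes.
const readBiomarkers = requireScope("biomarkers:read")(() => ({
  status: 200,
  body: { items: [] },
}));
```

Wrapping the handler, rather than checking inside it, keeps the required scope visible at the route definition, which is the kind of thing a convention script can verify statically.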

When one service needs data from another, it calls the other’s HTTP API with an M2M token. It never reaches into the other’s database.

Data plane

  • Aurora Postgres is the system of record. One cluster, one database, schemas isolated per service.
  • Redis holds short-lived state: token introspection cache, distributed locks, rate-limit counters.
  • S3 stores artifacts: CMS media, openapi snapshots, audit-log exports.

Event plane

Asynchronous communication uses a dual-bus pattern:

  • NATS for in-cluster fan-out (sub-second latency, no durability beyond the cluster).
  • EventBridge for durable cross-service contracts and replay.

Producers write events to a per-service outbox table in the same transaction as their state change. A drainer Lambda forwards outbox rows to both buses. Subscribers receive via NATS (fast path) and reconcile via EventBridge (durable path). Details: Events.
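The outbox flow described above can be sketched in memory as follows. The names ServiceDb, recordBiomarker, drain, and makeSubscriber, the event type string, and the row shape are all hypothetical; the real outbox table, drainer Lambda, and bus clients are not shown on this page, and deduplicating by event id is an assumption about how a subscriber might cope with receiving the same event on both paths.

```typescript
// Illustrative row and bus shapes only.
type OutboxRow = { id: string; type: string; payload: unknown; sentAt: number | null };
type Bus = { publish(event: OutboxRow): void };
type Tx = { state: Map<string, unknown>; outbox: OutboxRow[] };

class ServiceDb {
  state = new Map<string, unknown>();
  outbox: OutboxRow[] = [];

  // Stand-in for a database transaction: both writes land, or neither does.
  transact(work: (tx: Tx) => void): void {
    const tx: Tx = { state: new Map(this.state), outbox: [...this.outbox] };
    work(tx); // a throw here leaves this.state and this.outbox untouched
    this.state = tx.state;
    this.outbox = tx.outbox;
  }
}

// Producer: the state change and the outbox row are written together.
function recordBiomarker(db: ServiceDb, eventId: string, userId: string, value: number): void {
  db.transact((tx) => {
    tx.state.set(userId, value);
    tx.outbox.push({ id: eventId, type: "biomarker.recorded", payload: { userId, value }, sentAt: null });
  });
}

// Drainer: forwards each unsent row to both buses, then marks it sent.
function drain(db: ServiceDb, nats: Bus, eventBridge: Bus, now: () => number = Date.now): number {
  let forwarded = 0;
  for (const row of db.outbox) {
    if (row.sentAt !== null) continue;
    nats.publish(row);
    eventBridge.publish(row);
    row.sentAt = now();
    forwarded++;
  }
  return forwarded;
}

// Subscriber: the same event can arrive on both paths, so handling is keyed on event id.
function makeSubscriber(handle: (event: OutboxRow) => void) {
  const seen = new Set<string>();
  return (event: OutboxRow): void => {
    if (seen.has(event.id)) return;
    seen.add(event.id);
    handle(event);
  };
}
```

Because the outbox row is committed with the state change, a crash between the commit and the publish loses nothing: the row is still unsent and the next drain picks it up.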

How a request flows

A typical request through the platform: a third-party app called “PartnerGym” wants to read a user’s biomarkers.

Five mandatory checkpoints in every request:

  1. Token introspection — proves the caller is who they claim to be.
  2. Scope check — proves the caller is allowed to perform this action.
  3. Brand scoping — every row read includes a brand_id filter so a user from brand A can never see brand B data.
  4. PHI safe-view — sensitive fields go through redaction before they hit logs.
  5. Audit log — every state change writes an audit row in the same transaction.

If any of these is missing, the convention check rejects the PR in CI.
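Checkpoints 3 and 4 above are the least self-explanatory, so here is a small sketch of both. The BiomarkerRow shape, the choice of which fields count as PHI, and the readForBrand and safeView helpers are assumptions made for illustration; the platform's actual safe-view mechanism is described in ADR-0046 and is not reproduced here.

```typescript
// Hypothetical row shape and field names.
type BiomarkerRow = { id: string; brand_id: string; user_id: string; marker: string; value: number };

// Checkpoint 3: every read is filtered by the caller's brand as well as the user.
function readForBrand(rows: BiomarkerRow[], brandId: string, userId: string): BiomarkerRow[] {
  return rows.filter((r) => r.brand_id === brandId && r.user_id === userId);
}

// Checkpoint 4: a safe view replaces sensitive fields before a row reaches the logs.
const PHI_FIELDS = ["marker", "value"] as const;

function safeView(row: BiomarkerRow): Record<string, unknown> {
  const view: Record<string, unknown> = { ...row };
  for (const field of PHI_FIELDS) view[field] = "[redacted]";
  return view;
}

// Usage: log the safe view, never the raw row.
const rows: BiomarkerRow[] = [
  { id: "r1", brand_id: "brand-a", user_id: "u1", marker: "ferritin", value: 82 },
  { id: "r2", brand_id: "brand-b", user_id: "u1", marker: "ferritin", value: 79 },
];
const visible = readForBrand(rows, "brand-a", "u1");
console.log(JSON.stringify(visible.map(safeView)));
```

Here the same user id exists under two brands, and the brand filter is what keeps the brand-b row out of a brand-a response.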

What the platform does NOT do

  • No service reads or writes another service’s database. All cross-service data flows through HTTP or events.
  • No JWTs for access tokens. Access tokens are opaque (lph_at_*) so we can revoke them instantly.
  • No raw vendor calls from arbitrary services. Stripe, BigCommerce, Twilio, and Postmark each have one canonical service that owns the integration. Other services consume that service via its SDK.
  • No public reachability except through the ALB. Services aren’t on public subnets.
  • No shared code via copy-paste. Shared concerns live in @platform/* packages.

Deployment topology

Each stage has its own Aurora cluster, Redis instance, secrets, and observability sink. Deploys are driven by SST. Per-PR preview environments stand up only the services touched by the PR.

Where things live

| You’re looking for | It lives in |
| --- | --- |
| The contract a service exposes | services/<svc>/openapi.yaml |
| The schema a service publishes | packages/contracts/src/events/<family>.ts |
| The data model | services/<svc>/src/db/schema.ts |
| How a service is deployed | services/<svc>/infra.ts |
| What alarms fire when | services/<svc>/RUNBOOK.md |
| Why a decision was made | docs/decisions/ (ADRs, appendix) |
| How to use a service from an app | Build section + SDK reference |

Source ADRs

If you want the historical “why” behind these decisions: ADR-0027 (monorepo), ADR-0028 (SST + AWS), ADR-0029 (stages), ADR-0036 (M2M), ADR-0037 (service shape), ADR-0038 (brands), ADR-0039 (audit logs), ADR-0040 (EventBridge bus), ADR-0046 (PHI safe-views), ADR-0052 (Connect / OAuth).