System overview
What this is: the entire Loop Platform on one page — services, data plane, event bus, identity, and how a request flows through.
Who it’s for: new engineers on day 1, senior reviewers auditing the architecture, anyone asking “where does X live?”
What to read next: Events, Auth model, Brands and multi-tenancy.
The platform at a glance
The five planes
The platform has five conceptual planes: edge, identity, service, data, and event. Knowing which plane you’re in tells you which patterns apply.
Edge plane
Cloudflare resolves *.platform.loop.health to a single AWS Application Load Balancer per stage. The ALB routes by subdomain to one of the per-service ECS Fargate services. TLS terminates at the ALB. No service is publicly reachable except through this path.
Identity plane
services/identity is the only service that talks to WorkOS and Clerk. Every other service treats it as the source of truth for “who is this caller?”. Two grant types:
- authorization_code + PKCE: user-facing apps (first-party and third-party). See Auth model.
- client_credentials: service-to-service (cron jobs, internal workflows). M2M tokens are admin-scoped and never carry user-facing scopes.
Token validation is opaque: services call /v1/oauth/introspect (cached briefly in Redis) instead of decoding a JWT. This lets us revoke instantly without waiting for JWT expiry.
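The introspect-then-cache behaviour can be sketched as follows. This is a minimal illustration, not the platform's client: the response shape, the ttlMs parameter, and the createTokenValidator name are assumptions, the Map stands in for Redis, and the remote call is modelled as a plain function.

```typescript
// Hypothetical response shape: the source names the endpoint, not its payload.
interface Introspection {
  active: boolean;
  scopes: string[];
}

// Stand-in for a call to /v1/oauth/introspect. Modelled as synchronous to keep
// the sketch short; a real client would await an HTTP request.
type IntrospectFn = (token: string) => Introspection;

interface CacheEntry {
  value: Introspection;
  expiresAt: number; // epoch milliseconds
}

// Returns a validator that caches introspection results for `ttlMs`.
// The Map is an in-memory stand-in for Redis; `ttlMs` is an assumed knob,
// since the source only says results are cached "briefly".
function createTokenValidator(
  introspect: IntrospectFn,
  ttlMs: number,
  now: () => number = () => Date.now(),
): (token: string) => Introspection {
  const cache = new Map<string, CacheEntry>();
  return (token) => {
    const hit = cache.get(token);
    if (hit !== undefined && hit.expiresAt > now()) {
      return hit.value; // cache hit: no call to the identity service
    }
    const fresh = introspect(token);
    cache.set(token, { value: fresh, expiresAt: now() + ttlMs });
    return fresh;
  };
}
```

In this sketch a cached entry is served until it expires, so a revoked token can still validate for up to ttlMs. That trade-off is presumably why the cache is kept brief.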
Service plane
There are 27 domain services, each running as one or more Fargate tasks. Every service follows these rules, which scripts/check-conventions.ts enforces:
- Owns its own Postgres schema. No cross-schema reads or writes.
- Defines its HTTP API in openapi.yaml (drift-checked in CI).
- Declares published + consumed events in service.yaml (cross-checked against EVENT_SCHEMAS).
- Writes audit rows for every mutation.
- Enforces OAuth scopes on every route via requireScope(...).
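To make the last rule concrete, here is a framework-agnostic sketch of a scope guard. The source names requireScope(...) but does not show its signature, so requireScopeSketch, the request and result types, and the biomarkers:read scope are all hypothetical.

```typescript
// Hypothetical request/result shapes; the platform's real framework types
// are not shown in the source.
interface AuthedRequest {
  grantedScopes: string[]; // assumed to be filled in after token introspection
}

interface HandlerResult {
  status: number;
  body: unknown;
}

type Handler = (req: AuthedRequest) => HandlerResult;

// Wraps a handler so it only runs when the caller holds every required scope.
function requireScopeSketch(...required: string[]): (handler: Handler) => Handler {
  return (handler) => (req) => {
    const missing = required.filter((scope) => !req.grantedScopes.includes(scope));
    if (missing.length > 0) {
      // "insufficient_scope" is the error code defined by RFC 6750.
      return { status: 403, body: { error: "insufficient_scope", missing } };
    }
    return handler(req);
  };
}

// Example route: the scope name is illustrative only.
const listBiomarkers = requireScopeSketch("biomarkers:read")(() => ({
  status: 200,
  body: [],
}));
```

Wrapping the handler instead of checking inside it keeps the scope requirement visible at the route definition, which is the kind of thing a convention script can detect statically.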
When one service needs data from another, it calls the other’s HTTP API with an M2M token. It never reaches into the other’s database.
Data plane
- Aurora Postgres is the system of record. One cluster, one database, schemas isolated per service.
- Redis holds short-lived state: token introspection cache, distributed locks, rate-limit counters.
- S3 stores artifacts: CMS media, openapi snapshots, audit-log exports.
Event plane
Asynchronous communication uses a dual-bus pattern:
- NATS for in-cluster fan-out (sub-second latency, no durability beyond the cluster).
- EventBridge for durable cross-service contracts and replay.
Producers write events to a per-service outbox table in the same transaction as their state change. A drainer Lambda forwards outbox rows to both buses. Subscribers receive via NATS (fast path) and reconcile via EventBridge (durable path). Details: Events.
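The outbox flow can be illustrated with an in-memory model. Everything named here (the Db shape, renameProfile, the profile.renamed event, the toy transaction helper) is hypothetical. Only the overall pattern comes from the description above: the state change and outbox row commit together, then a drainer forwards to both buses.

```typescript
interface OutboxRow {
  id: number;
  eventType: string;
  payload: unknown;
  forwardedAt: number | null; // null until the drainer has forwarded the row
}

interface Db {
  profiles: Map<string, string>; // hypothetical state table: userId -> name
  outbox: OutboxRow[];
}

// Toy transaction: work on a copy, commit only if the callback does not throw.
function transaction(db: Db, work: (tx: Db) => void): void {
  const tx: Db = {
    profiles: new Map(db.profiles),
    outbox: db.outbox.map((row) => ({ ...row })),
  };
  work(tx); // a throw here leaves `db` untouched
  db.profiles = tx.profiles;
  db.outbox = tx.outbox;
}

// Producer: the state change and its event are written in one transaction.
function renameProfile(db: Db, userId: string, name: string): void {
  transaction(db, (tx) => {
    tx.profiles.set(userId, name);
    tx.outbox.push({
      id: tx.outbox.length + 1,
      eventType: "profile.renamed", // hypothetical event name
      payload: { userId, name },
      forwardedAt: null,
    });
  });
}

type Publish = (row: OutboxRow) => void;

// Drainer: forwards each pending row to both buses, then marks it forwarded.
// Returns the number of rows forwarded in this pass.
function drainOutbox(db: Db, toNats: Publish, toEventBridge: Publish, now: number): number {
  let forwarded = 0;
  for (const row of db.outbox) {
    if (row.forwardedAt !== null) continue;
    toNats(row);
    toEventBridge(row);
    row.forwardedAt = now;
    forwarded += 1;
  }
  return forwarded;
}
```

In this sketch, a publish that throws leaves the row pending, so the next pass retries it. That implies at-least-once delivery, so subscribers in a design like this generally need to tolerate duplicates.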
How a request flows
The typical request through the platform: a third-party app called “PartnerGym” wants to read a user’s biomarkers.
Five mandatory checkpoints in every request:
- Token introspection: proves the caller is who they claim to be.
- Scope check: proves the caller is allowed to perform this action.
- Brand scoping: every row read includes a brand_id filter so a user from brand A can never see brand B data.
- PHI safe-view: sensitive fields go through redaction before they hit logs.
- Audit log: every state change writes an audit row in the same transaction.
If any of these is missing, the convention check rejects the PR at CI.
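The five checkpoints can be seen together in one handler. The sketch below uses a write path so that the audit step applies. All types, field names, status codes, and the biomarkers:write scope are assumptions made for illustration, and the arrays stand in for database tables and the log sink.

```typescript
interface Caller {
  active: boolean;
  subject: string;
  brandId: string; // assumed to come back from introspection
  scopes: string[];
}

interface UserRow { id: string; brandId: string; }
interface Biomarker { userId: string; brandId: string; name: string; value: number; }
interface AuditRow { actor: string; action: string; brandId: string; targetUserId: string; }

interface Store {
  users: UserRow[];
  biomarkers: Biomarker[];
  audit: AuditRow[];
  logs: string[];
}

// PHI safe-view: keep identifiers, replace the measured value before logging.
function safeView(b: Biomarker): string {
  return JSON.stringify({ userId: b.userId, name: b.name, value: "[redacted]" });
}

// Returns an HTTP-style status code.
function recordBiomarker(
  store: Store,
  introspect: (token: string) => Caller,
  token: string,
  input: { userId: string; name: string; value: number },
): number {
  // 1. Token introspection
  const caller = introspect(token);
  if (!caller.active) return 401;

  // 2. Scope check
  if (!caller.scopes.includes("biomarkers:write")) return 403;

  // 3. Brand scoping: the lookup filters on brandId, so users of other
  //    brands look as if they do not exist.
  const user = store.users.find(
    (u) => u.id === input.userId && u.brandId === caller.brandId,
  );
  if (user === undefined) return 404;

  const row: Biomarker = { ...input, brandId: caller.brandId };

  // 4. PHI safe-view: only the redacted form reaches the log.
  store.logs.push(`biomarker recorded ${safeView(row)}`);

  // 5. Audit log: written alongside the state change. A real service would
  //    put both writes in one database transaction.
  store.biomarkers.push(row);
  store.audit.push({
    actor: caller.subject,
    action: "biomarker.create",
    brandId: caller.brandId,
    targetUserId: user.id,
  });
  return 201;
}
```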
What the platform does NOT do
- No service reads or writes another service’s database. All cross-service data flows through HTTP or events.
- No JWTs for access tokens. Access tokens are opaque (lph_at_*) so we can revoke them instantly.
- No raw vendor calls from arbitrary services. Stripe / BigCommerce / Twilio / Postmark each have one canonical service that owns the integration. Other services consume that service via SDK.
- No public reachability except through the ALB. Services aren’t on public subnets.
- No shared code via copy-paste. Shared concerns live in @platform/* packages.
Deployment topology
Each stage has its own Aurora cluster, Redis instance, secrets, and observability sink. Deploys are driven by SST. Per-PR preview environments stand up only the services touched by the PR.
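One plausible way to decide which services a PR touches is to map changed file paths to service directories. The source does not say how the preview tooling does this, so the sketch below is an assumption. It relies only on the services/<svc>/ layout and ignores changes to shared packages, which real tooling would probably need to handle.

```typescript
// Returns the sorted, de-duplicated names of services whose directory
// contains at least one changed file. Paths are repo-relative.
function touchedServices(changedFiles: string[]): string[] {
  const names = new Set<string>();
  for (const file of changedFiles) {
    const match = /^services\/([^/]+)\//.exec(file);
    const name = match?.[1];
    if (name !== undefined) names.add(name);
  }
  return Array.from(names).sort();
}
```

For example, a PR that changes services/identity/openapi.yaml and packages/contracts/src/events/<family>.ts would yield only identity under this rule.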
Where things live
| You’re looking for | It lives in |
|---|---|
| The contract a service exposes | services/<svc>/openapi.yaml |
| The schema a service publishes | packages/contracts/src/events/<family>.ts |
| The data model | services/<svc>/src/db/schema.ts |
| How a service is deployed | services/<svc>/infra.ts |
| What alarms fire when | services/<svc>/RUNBOOK.md |
| Why a decision was made | docs/decisions/ (ADRs — appendix) |
| How to use a service from an app | Build section + SDK reference |
Related
- Events: dual-bus pattern, outbox, replay
- Auth model: how OAuth, M2M, Clerk, WorkOS compose
- Brands and multi-tenancy: brand_id everywhere
- Audit and PHI: what’s loggable, what isn’t
- Reliability and deployment: SLOs, alarms, rollback
- Service catalog: every service, owner, status
Source ADRs
If you want the historical “why” behind these decisions: ADR-0027 (monorepo), ADR-0028 (SST + AWS), ADR-0029 (stages), ADR-0036 (M2M), ADR-0037 (service shape), ADR-0038 (brands), ADR-0039 (audit logs), ADR-0040 (EventBridge bus), ADR-0046 (PHI safe-views), ADR-0052 (Connect / OAuth).