Full checklist
A single page that tells you exactly what “ready to ship” means. Run through this before any production deploy.
For a new platform service
Code
- Service follows services/_template/ shape (ADR-0037)
- service.yaml complete: name, owner, on-call, status, brand_scope, contracts
- openapi.yaml exists with description + example on every route
- Every route enforces a scope via requireScope(...) (the convention check fails CI otherwise)
- Every mutation writes an audit row via c.set("audit_event_type", ...)
- Every table has a brand_id column
- No raw fetch() to *.platform.loop.health; use SDKs
- No PHI in logs; wrap with safeView()
- Idempotency keys on mutations
- Vendor calls wrapped in circuit breakers
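As an illustration of the circuit-breaker item above, here is a minimal sketch of one common shape: open after a number of consecutive failures, fail fast during a cooldown, then allow a trial call. The class name, method names, and thresholds are all hypothetical and not taken from the platform codebase. A production breaker would usually also limit concurrent trial calls while half-open.

```typescript
// Hypothetical sketch: names and thresholds are illustrative assumptions.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = "closed";
  private readonly failureThreshold: number; // consecutive failures before opening
  private readonly cooldownMs: number; // time to stay open before a trial call
  private readonly now: () => number; // injectable clock, useful in tests

  constructor(
    failureThreshold: number,
    cooldownMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  // Returns false while the breaker is open and the cooldown has not elapsed.
  allowRequest(): boolean {
    if (this.state !== "open") return true;
    if (this.now() - this.openedAt < this.cooldownMs) return false;
    this.state = "half-open"; // cooldown elapsed: let a trial call through
    return true;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call, or too many consecutive failures, opens the breaker.
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  // Wraps a vendor call: fails fast when open, otherwise records the outcome.
  async call<T>(vendorCall: () => Promise<T>): Promise<T> {
    if (!this.allowRequest()) {
      throw new Error("circuit open: vendor call skipped");
    }
    try {
      const result = await vendorCall();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}
```

Usage would look like `await breaker.call(() => vendorClient.charge(input))`, where `vendorClient.charge` stands in for whatever vendor SDK call is being protected.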
Database
- Migration is expand-only or follows expand–contract
- Tested against a populated dataset (not just empty dev)
- No long-running locks during peak hours
- brand_id column required + indexed where read paths use it
Events
- Published events registered in EVENT_SCHEMAS
- Subscribed events have handler files in src/events/
- Producer writes to outbox in same tx as state change
- Consumer dedupes by event ID + handler scope
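The consumer-side dedupe rule above can be sketched as follows. This is an in-memory illustration only: the `PlatformEvent` shape, the `DedupingConsumer` class, and the event names are hypothetical. A real consumer would more likely persist the (handler scope, event ID) pair in a table with a unique constraint, written in the same transaction as the handler's side effects.

```typescript
// Hypothetical sketch: types and names are assumptions, not platform APIs.
interface PlatformEvent {
  id: string;
  type: string;
  payload: unknown;
}

class DedupingConsumer {
  // Keys already processed, in the form "<handlerScope>:<eventId>".
  private readonly seen = new Set<string>();

  // Runs the handler at most once per (handlerScope, event.id) pair.
  // Returns true if the handler ran, false if the delivery was a duplicate.
  handle(
    handlerScope: string,
    event: PlatformEvent,
    handler: (event: PlatformEvent) => void,
  ): boolean {
    const key = `${handlerScope}:${event.id}`;
    if (this.seen.has(key)) return false;
    handler(event);
    // Mark as seen only after success, so a handler that throws is retried.
    this.seen.add(key);
    return true;
  }
}
```

Keying on the handler scope as well as the event ID means two different handlers can each process the same event once, while a redelivery to the same handler is dropped.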
Tests
- Unit tests for service classes
- Integration tests for routes (real DB, real event publish)
- Conformance tests pass (audit, brand scoping, scope enforcement)
- Test coverage realistic — not 100%, but every error path has a case
Docs
- README.md exists with one-paragraph description + run-locally instructions
- RUNBOOK.md exists with alarms, dashboards, remediation
- Service detail page auto-generated and committed
- Every route description + example populated in openapi.yaml
- Service catalog entry shows correct status, owner, on-call
Observability
- OTel instrumented (logs, traces, metrics)
- OTel service.name set in resource attributes
- Sentry DSN wired per stage
- CloudWatch alarms defined for: error rate, latency p99, dead-letter queue depth
- PagerDuty routing configured (critical → page, warn → Slack)
Auth
- M2M client registered in identity if other services call this
- Scopes added to @platform/scopes if new ones introduced
- BAA gate enforced if any new scope returns PHI
- Consent screen text reviewed for new scopes
Deployment
- infra.ts complete: hostname, scaling, health check, alarms
- Service registered in sst.config.ts
- SST Secrets set for dev / staging / prod
- Cloudflare DNS configured (<service>.platform.loop.health and stage variants)
- Deploy to dev succeeds, healthz returns 200
- Smoke test against dev passes
- Deploy to staging succeeds
- Soak in staging for at least 24 hours
- Prod gate approved by reviewer
SDK
- pnpm sdk:gen runs cleanly
- @platform/sdk-<service> package built + published
- SDK reference page generated
- At least one consumer (canary, internal app) imports + calls successfully
For a new third-party-facing endpoint
- Scope chosen and documented
- BAA gate if PHI
- OAuth flow + token introspection tested end-to-end against this endpoint
- Rate limit appropriate for this endpoint’s cost profile
- Webhook(s) emitted if this is a meaningful state change
- Connect docs updated with the new capability
- Per-endpoint API reference page regenerated
For a new event
- Schema added to packages/contracts/src/events/<family>.ts as a Zod schema
- Registered in EVENT_SCHEMAS
- EVENT_NAMES regenerated (pnpm --filter @platform/contracts gen)
- Publisher’s service.yaml events_published updated
- At least one test asserts the schema accepts the producer’s payload shape
- Event reference page auto-generated
- If user-facing: scope it appropriately + add to webhook event filters
For a new background job
- Target service has an idempotent admin endpoint
- JobDefinition added to services/jobs/src/jobs/registry.ts
- Target URL SST Secret set per stage
- Job logged + alarmed for repeated failures
- Runbook entry added to services/jobs/RUNBOOK.md
- At least one cycle executed in dev and verified
For a new partner integration (third-party app)
- OAuth client registered in developer portal
- Redirect URIs configured per environment
- PKCE implemented (S256, never plain)
- Token refresh implemented and tested
- Webhook signature verification implemented
- Error handling covers all platform error codes
- Rate-limit backoff with jitter
- Logging excludes tokens + PHI
- User has revoke / disconnect UI
- BAA signed if PHI scopes
- Branding follows the Loop button kit
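For the rate-limit backoff item above, one widely used approach is exponential backoff with "full jitter": pick a random delay between zero and an exponentially growing, capped ceiling. The sketch below assumes that approach. The function name, the default base and cap, and the idea that the platform returns a Retry-After value on rate-limited responses are all assumptions for illustration.

```typescript
// Hypothetical sketch: defaults and names are illustrative assumptions.
interface BackoffOptions {
  baseMs: number; // delay ceiling for the first retry
  capMs: number; // upper bound on the delay ceiling
  random: () => number; // returns a value in [0, 1); injectable for tests
}

const DEFAULT_BACKOFF: BackoffOptions = {
  baseMs: 500,
  capMs: 30000,
  random: Math.random,
};

// attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
// retryAfterSeconds, when the server supplied one, acts as a minimum delay.
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds: number | null = null,
  options: Partial<BackoffOptions> = {},
): number {
  const { baseMs, capMs, random } = { ...DEFAULT_BACKOFF, ...options };
  // The ceiling doubles with each attempt until it hits the cap.
  const ceilingMs = Math.min(capMs, baseMs * Math.pow(2, attempt));
  // Full jitter: uniform random delay in [0, ceilingMs).
  const jitteredMs = Math.floor(random() * ceilingMs);
  const floorMs = retryAfterSeconds !== null ? retryAfterSeconds * 1000 : 0;
  return Math.max(floorMs, jitteredMs);
}
```

With the defaults, the delay ceiling grows 500 ms, 1 s, 2 s, 4 s and so on up to 30 s. The jitter spreads retries from many clients across that window, so they do not all retry at the same instant.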
For any production deploy
- Changeset committed with user-facing summary (not “fix” or “wip”)
- CI green (all 23+ convention checks, typecheck, tests, build, drift checks)
- At least one human reviewer approved
- No merge during a freeze window
- Rollback procedure tested OR documented
- Alarms acknowledged for the deploy window