Full checklist

A single page that tells you exactly what “ready to ship” means. Run through this before any production deploy.

For a new platform service

Code

  • Service follows services/_template/ shape (ADR-0037)
  • service.yaml complete: name, owner, on-call, status, brand_scope, contracts
  • openapi.yaml exists with description + example on every route
  • Every route enforces a scope via requireScope(...) (the convention check fails CI otherwise)
  • Every mutation writes an audit row via c.set("audit_event_type", ...)
  • Every table has a brand_id column
  • No raw fetch() to *.platform.loop.health — use SDKs
  • No PHI in logs — wrap with safeView()
  • Idempotency keys on mutations
  • Vendor calls wrapped in circuit breakers
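
As an illustration of the last item, here is a minimal circuit-breaker sketch. It is not the platform's implementation: the class name, thresholds, and injected clock are assumptions chosen for this example. It shows the closed, open, and half-open states that "wrapped in circuit breakers" usually implies.

```typescript
// Hypothetical circuit breaker; names and defaults are illustrative, not platform APIs.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 3, resetAfterMs = 10_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  async call<T>(vendorCall: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        // Fail fast without touching the vendor while the breaker is open.
        throw new Error("circuit open: vendor call skipped");
      }
      this.state = "half-open"; // cool-down elapsed: allow one trial call
    }
    try {
      const result = await vendorCall();
      this.failures = 0;
      this.state = "closed";
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the breaker.
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

A caller would wrap each vendor request, for example `await breaker.call(() => vendorClient.charge(order))`, where `vendorClient` stands in for whatever vendor SDK the service uses.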

Database

  • Migration is expand-only or follows expand–contract
  • Migration tested against a populated dataset (not just an empty dev database)
  • No long-running locks during peak hours
  • brand_id column required + indexed where read paths use it

Events

  • Published events registered in EVENT_SCHEMAS
  • Subscribed events have handler files in src/events/
  • Producer writes to outbox in same tx as state change
  • Consumer dedupes by event ID + handler scope
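
The last two items can be sketched with an in-memory stand-in for the database. Everything here is hypothetical (`FakeDb`, `markOrderPaid`, `handleOnce`, and the `order.paid` event name); a real service would use its own database transaction and outbox table. The point is only that the state change and the outbox row commit together, and that the consumer's dedupe key combines handler scope and event ID.

```typescript
// In-memory stand-ins for a database; all names here are hypothetical.
interface OutboxRow {
  eventId: string;
  name: string;
  payload: unknown;
}

class FakeDb {
  orders = new Map<string, string>(); // orderId -> status
  outbox: OutboxRow[] = [];
  processed = new Set<string>(); // "<handlerScope>:<eventId>"

  // Runs fn against a copy and commits only if fn does not throw.
  transaction(fn: (tx: FakeDb) => void): void {
    const tx = new FakeDb();
    tx.orders = new Map(this.orders);
    tx.outbox = [...this.outbox];
    tx.processed = new Set(this.processed);
    fn(tx);
    this.orders = tx.orders;
    this.outbox = tx.outbox;
    this.processed = tx.processed;
  }
}

// Producer: the state change and the outbox row commit (or roll back) together.
function markOrderPaid(db: FakeDb, orderId: string, eventId: string): void {
  db.transaction((tx) => {
    tx.orders.set(orderId, "paid");
    tx.outbox.push({ eventId, name: "order.paid", payload: { orderId } });
  });
}

// Consumer: returns false when this handler scope has already processed the event.
function handleOnce(
  db: FakeDb,
  handlerScope: string,
  event: OutboxRow,
  handler: (tx: FakeDb, event: OutboxRow) => void,
): boolean {
  const key = `${handlerScope}:${event.eventId}`;
  if (db.processed.has(key)) return false;
  db.transaction((tx) => {
    handler(tx, event);
    tx.processed.add(key); // recorded in the same transaction as the handler's writes
  });
  return true;
}
```

Keying on scope plus event ID lets two different handlers each process the same event once, while a redelivery to the same handler is skipped.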

Tests

  • Unit tests for service classes
  • Integration tests for routes (real DB, real event publish)
  • Conformance tests pass (audit, brand scoping, scope enforcement)
  • Test coverage is realistic: 100% is not the goal, but every error path has a test case

Docs

  • README.md exists with one-paragraph description + run-locally instructions
  • RUNBOOK.md exists with alarms, dashboards, remediation
  • Service detail page auto-generated and committed
  • Every route description + example populated in openapi.yaml
  • Service catalog entry shows correct status, owner, on-call

Observability

  • OTel instrumented (logs, traces, metrics)
  • OTel service.name set in resource attributes
  • Sentry DSN wired per stage
  • CloudWatch alarms defined for: error rate, latency p99, dead-letter queue depth
  • PagerDuty routing configured (critical → page, warn → Slack)

Auth

  • M2M client registered in identity if other services call this service
  • Scopes added to @platform/scopes if new ones introduced
  • BAA gate enforced if any new scope returns PHI
  • Consent screen text reviewed for new scopes

Deployment

  • infra.ts complete: hostname, scaling, health check, alarms
  • Service registered in sst.config.ts
  • SST Secrets set for dev / staging / prod
  • Cloudflare DNS configured (<service>.platform.loop.health and stage variants)
  • Deploy to dev succeeds, healthz returns 200
  • Smoke test against dev passes
  • Deploy to staging succeeds
  • Soak in staging for at least 24 hours
  • Prod gate approved by reviewer

SDK

  • pnpm sdk:gen runs cleanly
  • @platform/sdk-<service> package built + published
  • SDK reference page generated
  • At least one consumer (canary, internal app) imports + calls successfully

For a new third-party-facing endpoint

  • Scope chosen and documented
  • BAA gate enforced if the endpoint returns PHI
  • OAuth flow + token introspection tested end-to-end against this endpoint
  • Rate limit appropriate for this endpoint’s cost profile
  • Webhook(s) emitted if this is a meaningful state change
  • Connect docs updated with the new capability
  • Per-endpoint API reference page regenerated

For a new event

  • Schema added to packages/contracts/src/events/<family>.ts as a Zod schema
  • Registered in EVENT_SCHEMAS
  • EVENT_NAMES regenerated (pnpm --filter @platform/contracts gen)
  • Publisher’s service.yaml events_published updated
  • At least one test asserts the schema accepts the producer’s payload shape
  • Event reference page auto-generated
  • If user-facing: scope it appropriately + add to webhook event filters
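
A minimal sketch of the schema, registration, and payload-shape test items, assuming the `zod` package is installed. The event name, field names, and registry shape below are hypothetical; the real files under packages/contracts/src/events/ and the real EVENT_SCHEMAS may be structured differently.

```typescript
import { z } from "zod";

// Hypothetical event schema; field names are illustrative only.
const exampleThingCreated = z.object({
  event_id: z.string().min(1),
  brand_id: z.string().min(1),
  occurred_at: z.string().min(1),
  thing_id: z.string().min(1),
});

// Stand-in for the real registry: maps event names to their schemas.
const EVENT_SCHEMAS_SKETCH = {
  "example.thing.created": exampleThingCreated,
} as const;

type SketchEventName = keyof typeof EVENT_SCHEMAS_SKETCH;

// The kind of assertion a schema test can make against the producer's payload.
function acceptsPayload(name: SketchEventName, payload: unknown): boolean {
  return EVENT_SCHEMAS_SKETCH[name].safeParse(payload).success;
}

const producerPayload = {
  event_id: "evt-1",
  brand_id: "brand-1",
  occurred_at: "2024-01-01T00:00:00Z",
  thing_id: "thing-1",
};
```

A test would then assert that `acceptsPayload("example.thing.created", producerPayload)` is true and that a payload missing a required field such as `brand_id` is rejected.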

For a new background job

  • Target service has an idempotent admin endpoint
  • JobDefinition added to services/jobs/src/jobs/registry.ts
  • Target URL SST Secret set per stage
  • Job runs are logged, and repeated failures trigger an alarm
  • Runbook entry added to services/jobs/RUNBOOK.md
  • At least one cycle executed in dev and verified
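
To illustrate what "idempotent admin endpoint" can mean in practice, here is a small sketch that replays the stored result when the same idempotency key arrives again. The class and result shape are assumptions for this example, and the in-memory map stands in for durable storage.

```typescript
// Hypothetical handler shape; the real job runner and admin routes may differ.
interface JobResult {
  status: number;
  body: { processed: number };
}

class IdempotentEndpoint {
  runs = 0; // how many times the underlying work actually executed
  private readonly results = new Map<string, JobResult>();
  private readonly work: () => number;

  constructor(work: () => number) {
    this.work = work;
  }

  handle(idempotencyKey: string): JobResult {
    const cached = this.results.get(idempotencyKey);
    if (cached) return cached; // retry of a completed run: replay the stored result
    this.runs += 1;
    const result: JobResult = { status: 200, body: { processed: this.work() } };
    this.results.set(idempotencyKey, result);
    return result;
  }
}
```

With this shape, a job that is retried after a timeout can safely resend the same key: the work runs once and both calls see the same response.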

For a new partner integration (third-party app)

  • OAuth client registered in developer portal
  • Redirect URIs configured per environment
  • PKCE implemented (S256, never plain)
  • Token refresh implemented and tested
  • Webhook signature verification implemented
  • Error handling covers all platform error codes
  • Rate-limit backoff with jitter
  • Logging excludes tokens + PHI
  • User has revoke / disconnect UI
  • BAA signed if the app requests PHI scopes
  • Branding follows the Loop button kit
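
Three of the items above (PKCE with S256, webhook signature verification, and backoff with jitter) can be sketched with Node's built-in crypto module. The PKCE derivation follows RFC 7636. The webhook scheme shown (a hex-encoded HMAC-SHA256 of the raw body with a shared secret) is an assumption, so check the platform's webhook docs for the actual header and encoding. The backoff defaults are likewise illustrative.

```typescript
import { createHash, createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// PKCE verifier: 32 random bytes encode to 43 base64url characters.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 challenge = BASE64URL(SHA-256(ASCII(code_verifier))), per RFC 7636.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// Assumed scheme: hex HMAC-SHA256 over the raw request body.
function verifyWebhookSignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // Length check first: timingSafeEqual throws on buffers of different length.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Full-jitter backoff: a uniform delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 250,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

Verify the signature against the raw request bytes before parsing JSON, since re-serializing the body can change it and invalidate the signature.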

For any production deploy

  • Changeset committed with user-facing summary (not “fix” or “wip”)
  • CI green (all 23+ convention checks, typecheck, tests, build, drift checks)
  • At least one human reviewer approved
  • No merge during a freeze window
  • Rollback procedure tested OR documented
  • Alarms acknowledged for the deploy window