Production-Readiness Checklist

Every service must satisfy this checklist before being promoted to status: ga in its service.yaml. Use this as a gate for launch reviews — if an item is unchecked, the service is not production-ready.

⚠️ This checklist is enforced by convention checks in CI. Running pnpm check:conventions surfaces most violations automatically.

Required

All items in this section are mandatory. A service cannot ship to production without them.

Service contract

  • service.yaml fully populated — all sections including owner, dependencies, events_published, events_consumed, and sla
  • SLA defined in service.yaml with target availability and latency percentiles
  • All events_published have matching schemas in @platform/contracts
  • All events_consumed have corresponding handler files in the service
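The service-contract items above can be expressed as a small validation function. The following is a minimal sketch, not the actual convention check: the `ServiceManifest` shape, the `latency_ms` field name, and the two lookup sets are assumptions made for illustration, and the real service.yaml schema may differ.

```typescript
// Hypothetical shape of a parsed service.yaml; the real schema may differ.
interface ServiceManifest {
  status?: string;
  owner?: string;
  dependencies?: string[];
  events_published?: string[];
  events_consumed?: string[];
  sla?: { availability?: number; latency_ms?: Record<string, number> };
}

// Returns human-readable violations; an empty list means the
// service-contract items are satisfied for this manifest.
function checkServiceContract(
  manifest: ServiceManifest,
  contractSchemas: Set<string>, // event names assumed to have a schema in @platform/contracts
  handlerEvents: Set<string>,   // event names assumed to have a handler file in the service
): string[] {
  const violations: string[] = [];

  // Sections the checklist names explicitly.
  const required: (keyof ServiceManifest)[] = [
    "owner", "dependencies", "events_published", "events_consumed", "sla",
  ];
  for (const key of required) {
    if (manifest[key] === undefined) {
      violations.push(`service.yaml is missing "${key}"`);
    }
  }

  // The SLA needs both a target availability and latency percentiles.
  const sla = manifest.sla;
  if (sla !== undefined) {
    if (sla.availability === undefined) {
      violations.push("sla has no target availability");
    }
    if (sla.latency_ms === undefined || Object.keys(sla.latency_ms).length === 0) {
      violations.push("sla has no latency percentiles");
    }
  }

  // Every published event needs a schema; every consumed event needs a handler.
  for (const event of manifest.events_published ?? []) {
    if (!contractSchemas.has(event)) {
      violations.push(`published event "${event}" has no schema`);
    }
  }
  for (const event of manifest.events_consumed ?? []) {
    if (!handlerEvents.has(event)) {
      violations.push(`consumed event "${event}" has no handler`);
    }
  }
  return violations;
}
```

Returning a list of violations rather than a boolean lets a launch review see every missing item in one pass.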

Documentation

  • README.md exists with local dev instructions (see Onboarding for what to include)
  • RUNBOOK.md exists with service-specific operational content
  • On-call runbook populated at docs/runbooks/oncall.md
  • Rollback runbook populated at docs/runbooks/rollback.md

API surface

  • openapi.yaml generated and committed — run pnpm openapi:gen and verify no drift
  • Health endpoints respond: GET /healthz (liveness) and GET /readyz (readiness)
  • OAuth scopes enforced on all routes via requireScope
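As an illustration of the two health endpoints, here is a framework-independent sketch of the logic behind them. It assumes that liveness returns `{ "status": "ok" }` without checking anything else, and that readiness fails with HTTP 503 when any downstream probe fails. The `Probe` type, the 503 status code, the `"unavailable"` status string, and the `failing` field are assumptions, not the platform's documented behavior.

```typescript
// A probe resolves to true when a downstream dependency is reachable.
type Probe = () => Promise<boolean>;

interface HealthResponse {
  statusCode: number;
  body: { status: string; failing?: string[] };
}

// Liveness (GET /healthz): the process is up; no dependency checks.
function healthz(): HealthResponse {
  return { statusCode: 200, body: { status: "ok" } };
}

// Pure core of the readiness check: turn probe results into a response.
function summarizeReadiness(results: Record<string, boolean>): HealthResponse {
  const failing = Object.keys(results).filter((name) => !results[name]);
  if (failing.length === 0) {
    return { statusCode: 200, body: { status: "ok" } };
  }
  // 503 and the "failing" list are assumed conventions for this sketch.
  return { statusCode: 503, body: { status: "unavailable", failing } };
}

// Readiness (GET /readyz): run every probe concurrently.
// A probe that throws or rejects counts as failing.
async function readyz(probes: Record<string, Probe>): Promise<HealthResponse> {
  const names = Object.keys(probes);
  const outcomes = await Promise.all(
    names.map(async (name) => {
      try {
        return await probes[name]();
      } catch {
        return false;
      }
    }),
  );
  const results: Record<string, boolean> = {};
  names.forEach((name, i) => {
    results[name] = outcomes[i];
  });
  return summarizeReadiness(results);
}
```

Keeping liveness free of dependency checks means a downstream outage marks the service as not ready without triggering restarts.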

Data and audit

  • Audit log table created and wired via auditTable in createService()
  • At least one migration committed in services/<name>/migrations/

CI and deploy

  • All convention checks pass: pnpm check:conventions
  • Deployed to dev and staging stages

Recommended

The items in this section are strongly encouraged but do not block ga status. Address them as soon as practical after launch.

Testing and reliability

  • Integration tests covering critical paths (happy path + primary error cases)
  • Load tested to SLA targets (use the team’s k6 scripts or equivalent)
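To make "load tested to SLA targets" concrete, the sketch below compares latency samples from a load test against percentile targets. It uses the nearest-rank percentile definition, which may differ slightly from what k6 or other tools report. The `LatencyTargets` shape is hypothetical and is not the service.yaml format.

```typescript
// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point error in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical target shape: percentile (e.g. 95) -> maximum latency in ms.
type LatencyTargets = Record<number, number>;

// True when every targeted percentile is at or below its maximum.
function meetsLatencyTargets(samplesMs: number[], targets: LatencyTargets): boolean {
  return Object.entries(targets).every(
    ([p, maxMs]) => percentile(samplesMs, Number(p)) <= maxMs,
  );
}
```

For example, with ten samples of 10, 20, ..., 100 ms, the nearest-rank p50 is 50 ms and the p95 is 100 ms, so a target of `{ 95: 99 }` would fail.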

Observability

  • Dashboard URL added to service.yaml under observability.dashboard
  • Alerting configured in Axiom for error rate, latency, and availability thresholds

Developer experience

  • SDK published as @platform/sdk-<name> (if the service has external consumers)
  • Docs page generated and reviewed on the docs site

Badges

The service catalog displays a prod-ready badge next to services that meet the production bar. A service earns the badge when all of the following are true:

| Criterion | How it’s checked |
| --- | --- |
| status is ga in service.yaml | CI reads the YAML field |
| RUNBOOK.md exists | File-existence check |
| README.md exists | File-existence check |
| openapi.yaml exists | File-existence check |
| All convention checks pass | pnpm check:conventions exits 0 |

The badge updates automatically on every merge to main. If a previously passing service breaks a convention check, the badge is removed until the violation is fixed.
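The badge criteria amount to a conjunction of simple checks. The sketch below shows one way that logic could look. It is not the catalog's actual implementation, and the `ServiceSnapshot` type and its fields are hypothetical stand-ins for whatever CI observes.

```typescript
// Hypothetical snapshot of what CI would observe for one service.
interface ServiceSnapshot {
  status: string;              // value of `status` in service.yaml
  files: Set<string>;          // file names present in the service directory
  conventionsExitCode: number; // exit code of `pnpm check:conventions`
}

// Files whose existence the badge criteria require.
const REQUIRED_FILES = ["RUNBOOK.md", "README.md", "openapi.yaml"];

// Lists the badge criteria the service does not meet.
function unmetBadgeCriteria(snapshot: ServiceSnapshot): string[] {
  const unmet: string[] = [];
  if (snapshot.status !== "ga") {
    unmet.push("status is not ga");
  }
  for (const file of REQUIRED_FILES) {
    if (!snapshot.files.has(file)) {
      unmet.push(`${file} is missing`);
    }
  }
  if (snapshot.conventionsExitCode !== 0) {
    unmet.push("convention checks failed");
  }
  return unmet;
}

// The badge is shown only when every criterion holds.
function isProdReady(snapshot: ServiceSnapshot): boolean {
  return unmetBadgeCriteria(snapshot).length === 0;
}
```

Because the result is recomputed from the current state each time, a service that regresses on any criterion loses the badge on the next evaluation.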

Services without the badge are considered alpha or beta and are not eligible for production traffic from external consumers.


Walkthrough

Use this sequence to work through the checklist for a new service:

  1. Populate service.yaml — start from the template in services/_template/service.yaml.
  2. Wire audit logging — follow the pattern in Audit & PHI.
  3. Add health endpoints — GET /healthz returns { "status": "ok" } and GET /readyz checks downstream dependencies.
  4. Generate openapi.yaml — run pnpm openapi:gen after adding all routes.
  5. Write RUNBOOK.md — cover common failure modes, restart procedures, and escalation contacts.
  6. Run pnpm check:conventions — fix all violations.
  7. Deploy to dev and staging — verify the service starts and health checks pass.
  8. Request a launch review from the platform team.