Production-Readiness Checklist
Every service must satisfy this checklist before being promoted to `status: ga` in its `service.yaml`. Use it as a gate for launch reviews: if any item in the Required section is unchecked, the service is not production-ready.
This checklist is enforced by convention checks in CI. Running `pnpm check:conventions` surfaces most violations automatically.
Required
All items in this section are mandatory. A service cannot ship to production without them.
Service contract
- `service.yaml` fully populated: all sections including `owner`, `dependencies`, `events_published`, `events_consumed`, and `sla`
- SLA defined in `service.yaml` with target availability and latency percentiles
- All `events_published` have matching schemas in `@platform/contracts`
- All `events_consumed` have corresponding handler files in the service
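To make the service contract items concrete, here is a minimal sketch of how they could be checked once `service.yaml` has been parsed. Only the top-level section names come from this checklist; the `ServiceManifest` shape, the nested `sla` fields, and the function name are assumptions for illustration, not the actual convention-check implementation.

```typescript
// Hypothetical shape of a parsed service.yaml. Only the top-level section
// names come from the checklist; the nested sla fields are assumptions.
interface ServiceManifest {
  owner?: string;
  dependencies?: string[];
  events_published?: string[];
  events_consumed?: string[];
  sla?: { availability?: number; latency_ms?: Record<string, number> };
}

const REQUIRED_SECTIONS = [
  "owner",
  "dependencies",
  "events_published",
  "events_consumed",
  "sla",
] as const;

// Returns human-readable violations; an empty array means every
// service contract item passes.
function checkServiceContract(
  manifest: ServiceManifest,
  contractSchemas: Set<string>, // event names that have a schema in the contracts package
  handlerEvents: Set<string>, // event names that have a handler file in the service
): string[] {
  const violations: string[] = [];

  for (const section of REQUIRED_SECTIONS) {
    if (manifest[section] === undefined) {
      violations.push(`service.yaml is missing section: ${section}`);
    }
  }

  if (manifest.sla !== undefined) {
    if (manifest.sla.availability === undefined) {
      violations.push("sla has no target availability");
    }
    const latency = manifest.sla.latency_ms;
    if (latency === undefined || Object.keys(latency).length === 0) {
      violations.push("sla has no latency percentiles");
    }
  }

  for (const event of manifest.events_published ?? []) {
    if (!contractSchemas.has(event)) {
      violations.push(`no schema for published event: ${event}`);
    }
  }

  for (const event of manifest.events_consumed ?? []) {
    if (!handlerEvents.has(event)) {
      violations.push(`no handler for consumed event: ${event}`);
    }
  }

  return violations;
}
```

Returning a list of violations rather than a boolean lets a launch reviewer see every gap in one pass instead of fixing them one at a time.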
Documentation
- `README.md` exists with local dev instructions (see Onboarding for what to include)
- `RUNBOOK.md` exists with service-specific operational content
- On-call runbook populated at `docs/runbooks/oncall.md`
- Rollback runbook populated at `docs/runbooks/rollback.md`
API surface
- `openapi.yaml` generated and committed: run `pnpm openapi:gen` and verify no drift
- Health endpoints respond: `GET /healthz` (liveness) and `GET /readyz` (readiness)
- OAuth scopes enforced on all routes via `requireScope`
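As an illustration of what scope enforcement on a route involves, the sketch below wraps a handler so that it only runs when the caller's token carries the required scope. It is a stand-in written for this page: `requireScopeSketch`, `AuthedRequest`, and `ForbiddenError` are hypothetical names, and the signature of the platform's real `requireScope` helper may differ.

```typescript
// Hypothetical request shape: assumes upstream auth has already validated
// the token and attached its granted scopes.
interface AuthedRequest {
  scopes: string[];
}

type Handler<R> = (req: AuthedRequest) => R;

// Error carrying the HTTP status a router would translate into a response.
class ForbiddenError extends Error {
  readonly status = 403;

  constructor(scope: string) {
    super(`missing required scope: ${scope}`);
    this.name = "ForbiddenError";
  }
}

// Illustrative stand-in for requireScope, not the platform helper itself.
// Rejects the request before the handler runs if the scope is absent.
function requireScopeSketch<R>(scope: string, handler: Handler<R>): Handler<R> {
  return (req) => {
    if (!req.scopes.includes(scope)) {
      throw new ForbiddenError(scope);
    }
    return handler(req);
  };
}
```

Wrapping every handler this way keeps the scope requirement next to the route it protects, which makes an unguarded route easy to spot in review.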
Data and audit
- Audit log table created and wired via `auditTable` in `createService()`
- At least one migration committed in `services/<name>/migrations/`
CI and deploy
- All convention checks pass: `pnpm check:conventions`
- Deployed to `dev` and `staging` stages
Recommended
These items are strongly encouraged but not blocking for ga status. Address them as soon as practical after launch.
Testing and reliability
- Integration tests covering critical paths (happy path + primary error cases)
- Load tested to SLA targets (use the team’s k6 scripts or equivalent)
Observability
- Dashboard URL added to `service.yaml` under `observability.dashboard`
- Alerting configured in Axiom for error rate, latency, and availability thresholds
Developer experience
- SDK published as `@platform/sdk-<name>` (if the service has external consumers)
- Docs page generated and reviewed on the docs site
Badges
The service catalog displays a prod-ready badge next to services that meet the production bar. A service earns the badge when all of the following are true:
| Criterion | How it’s checked |
|---|---|
| `status` is `ga` in `service.yaml` | CI reads the YAML field |
| `RUNBOOK.md` exists | File-existence check |
| `README.md` exists | File-existence check |
| `openapi.yaml` exists | File-existence check |
| All convention checks pass | `pnpm check:conventions` exits 0 |
The badge updates automatically on every merge to main. If a previously
passing service breaks a convention check, the badge is removed until the
violation is fixed.
Services without the badge are considered alpha or beta and are not eligible for production traffic from external consumers.
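The badge criteria reduce to a single conjunction, which the sketch below spells out. The `BadgeInputs` shape and the function name are assumptions made for illustration; how the catalog actually gathers these inputs is not described here.

```typescript
// Hypothetical inputs a badge job might collect for one service.
interface BadgeInputs {
  status: string; // the status field read from service.yaml
  existingFiles: Set<string>; // file names present in the service directory
  conventionsExitCode: number; // exit code of pnpm check:conventions
}

// Files whose existence the badge criteria require.
const BADGE_REQUIRED_FILES = ["RUNBOOK.md", "README.md", "openapi.yaml"];

// True only when every criterion in the badge table holds.
function earnsProdReadyBadge(inputs: BadgeInputs): boolean {
  return (
    inputs.status === "ga" &&
    BADGE_REQUIRED_FILES.every((file) => inputs.existingFiles.has(file)) &&
    inputs.conventionsExitCode === 0
  );
}
```

Because the result is recomputed from scratch on each run, a service that later fails a convention check loses the badge without any extra bookkeeping.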
Walkthrough
Use this sequence to work through the checklist for a new service:
- Populate `service.yaml`: start from the template in `services/_template/service.yaml`.
- Wire audit logging: follow the pattern in Audit & PHI.
- Add health endpoints: `GET /healthz` returns `{ "status": "ok" }` and `GET /readyz` checks downstream dependencies.
- Generate `openapi.yaml`: run `pnpm openapi:gen` after adding all routes.
- Write `RUNBOOK.md`: cover common failure modes, restart procedures, and escalation contacts.
- Run `pnpm check:conventions`: fix all violations.
- Deploy to `dev` and `staging`: verify the service starts and health checks pass.
- Request a launch review from the platform team.
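For the health endpoints step, the following sketch shows one way to separate liveness from readiness. The `{ "status": "ok" }` liveness body comes from the walkthrough; the 503 status, the `"unavailable"` body, and all function and type names are assumptions, and the functions are framework-agnostic rather than tied to whatever routing `createService()` provides.

```typescript
interface HealthResponse {
  statusCode: number;
  body: Record<string, unknown>;
}

// Liveness: the process is up. No dependency checks belong here.
function livenessResponse(): HealthResponse {
  return { statusCode: 200, body: { status: "ok" } };
}

// Readiness: 200 only when every downstream dependency check passed.
// The 503 status and failure body are assumptions for this sketch.
function readinessResponse(checks: Record<string, boolean>): HealthResponse {
  const failing = Object.keys(checks).filter((name) => !checks[name]);
  return failing.length === 0
    ? { statusCode: 200, body: { status: "ok" } }
    : { statusCode: 503, body: { status: "unavailable", failing } };
}

// Runs each dependency probe concurrently; a probe that throws or rejects
// counts as failed.
async function runChecks(
  probes: Record<string, () => Promise<void>>,
): Promise<Record<string, boolean>> {
  const entries = await Promise.all(
    Object.entries(probes).map(
      async ([name, probe]): Promise<[string, boolean]> => {
        try {
          await probe();
          return [name, true];
        } catch {
          return [name, false];
        }
      },
    ),
  );
  return Object.fromEntries(entries);
}
```

A `GET /readyz` handler would then call `readinessResponse(await runChecks(probes))`, while `GET /healthz` returns `livenessResponse()` directly. Keeping liveness free of dependency checks means a downstream outage takes the service out of rotation without triggering restarts.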
Next Steps
- Onboarding — Ramp-up path for new engineers.
- Style Guide — Documentation standards for your service’s docs.
- Audit & PHI — How audit logging works and why it’s non-negotiable.
- Reliability & Deployment — Deploy stages and rollback procedures.
- Events & EventBridge — Event schema contracts.