Build a background job

What this is: how to ship a recurring background job that runs on the platform.

Who it’s for: anyone writing a cron task, batch processor, periodic reconciler, or async workflow.

Time: 30 minutes for a job that hits an existing admin endpoint; longer if the target endpoint doesn’t exist yet.

Two patterns

Pattern A: Job calls an admin endpoint on a service. Recommended. The service owns the logic; the job just invokes it on a schedule.

Pattern B: Job runs its own logic. Use only when the work is purely about scheduling (e.g., “every 6h, ping each integration’s health”). Avoid duplicating service logic.

We prefer A. The job is a thin scheduler; services keep all the domain knowledge.

The job runtime

services/jobs is a Fargate service that:

  • Maintains a registry of JobDefinition entries in src/jobs/registry.ts
  • Renders each entry as an SST Cron in infra.ts
  • Runs a generic executor (src/jobs/executor.ts) from each Cron’s Lambda. The executor:
    1. Acquires an M2M access token from services/identity
    2. Invokes the target endpoint with that token
    3. Logs the result + metrics
    4. Retries on transient failures (safe because the target endpoint is idempotent)
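
The four executor steps above can be sketched as follows. This is a minimal illustration, not the code in src/jobs/executor.ts: the JobTarget and ExecutorDeps shapes, the runJob name, the maxAttempts default, and the rule that 429 and 5xx count as transient are all assumptions. The real executor may also leave retries to Lambda rather than looping in-process.

```typescript
// Hypothetical shapes for illustration; the real executor may differ.
interface JobTarget {
  id: string;
  method: "GET" | "POST";
  url: string;
  audience: string;
  scopes: string[];
}

interface TargetResponse {
  status: number;
}

// Dependencies are injected so the flow can run without a network.
interface ExecutorDeps {
  getToken(audience: string, scopes: string[]): Promise<string>;
  callTarget(method: string, url: string, token: string): Promise<TargetResponse>;
  log(entry: Record<string, unknown>): void;
}

// Assumption: 429 and 5xx responses count as transient.
function isTransient(status: number): boolean {
  return status === 429 || status >= 500;
}

async function runJob(
  job: JobTarget,
  deps: ExecutorDeps,
  maxAttempts = 3,
): Promise<TargetResponse> {
  // 1. Acquire an M2M access token.
  const token = await deps.getToken(job.audience, job.scopes);

  let attempt = 0;
  while (true) {
    attempt += 1;
    const startedAt = Date.now();
    // 2. Invoke the target endpoint with that token.
    const response = await deps.callTarget(job.method, job.url, token);
    // 3. Log the result using the field names from the logging section.
    deps.log({
      job_id: job.id,
      attempt,
      duration_ms: Date.now() - startedAt,
      target_status: response.status,
    });
    // 4. Retry only transient failures, up to maxAttempts.
    if (!isTransient(response.status) || attempt >= maxAttempts) {
      return response;
    }
  }
}
```

The sketch retries immediately; a production loop would normally add backoff between attempts. Because step 2 can run more than once per invocation, the target endpoint has to tolerate repeats.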

Step 1: Decide the target

Where should the logic live? Almost always: in the service that owns the domain. If the job is “release expired commission locks,” the logic lives in services/membership. The job’s only job is the schedule.

If the target endpoint doesn’t exist yet, add it to the target service first:

// services/membership/src/routes/admin/commission-locks.routes.ts
app.use("/v1/admin/commission-locks/*", requireScope(SCOPES.ADMIN_MEMBERSHIP));
 
app.openapi(
  createRoute({ method: "post", path: "/v1/admin/commission-locks/release-expired", ... }),
  async (c) => {
    const result = await commissionLockService.releaseExpired();
    return c.json({ released: result.count }, 200);
  },
);

This endpoint MUST be idempotent: running it twice in a row must leave the system in the same state as running it once. See Idempotency and retries.
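
As an illustration of what idempotent means here, the sketch below releases expired locks from an in-memory array. The CommissionLock shape and this releaseExpired signature are hypothetical; the real commissionLockService works against the database. The point is the guard: only locks that are expired and not yet released are touched, so a second run changes nothing.

```typescript
// Hypothetical lock shape for illustration only.
interface CommissionLock {
  id: string;
  expiresAt: Date;
  releasedAt: Date | null;
}

// Releases every lock that has expired and is not already released.
// Re-running with the same `now` is a no-op because released locks are skipped.
function releaseExpired(locks: CommissionLock[], now: Date): { count: number } {
  let count = 0;
  for (const lock of locks) {
    const expired = lock.expiresAt.getTime() <= now.getTime();
    if (expired && lock.releasedAt === null) {
      lock.releasedAt = now;
      count += 1;
    }
  }
  return { count };
}
```

Note that the response body may differ between runs (the second run reports zero released); what must not differ is the resulting state.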

Step 2: Add the job definition

services/jobs/src/jobs/registry.ts:

export const jobs: JobDefinition[] = [
  // ... existing entries
  {
    id: "release-commission-locks",
    name: "Release Expired Commission Locks",
    description: "Hourly job that releases expired 4-day membership commission locks.",
    schedule: "cron(0 * * * ? *)",          // EventBridge cron syntax — hourly
    targetUrlEnv: "MEMBERSHIP_URL",
    method: "POST",
    path: "/v1/admin/commission-locks/release-expired",
    audience: "membership",
    scopes: ["admin:membership"],
  },
];

That’s it. The cron entry in services/jobs/infra.ts is generated from the registry, so the schedule itself needs no infra edits.

Step 3: Add the target URL secret

services/jobs/infra.ts — add an SST Secret if your target service URL isn’t already in the map:

const targetUrlSecrets: Record<string, sst.Secret> = {
  ACCOUNTING_URL: new sst.Secret("JOBS_ACCOUNTING_URL", ""),
  MEMBERSHIP_URL: new sst.Secret("JOBS_MEMBERSHIP_URL", ""),
  // ... add new URL secrets here
};

Then set the value per stage:

pnpm sst secret set JOBS_MEMBERSHIP_URL https://membership.dev.platform.loop.health --stage dev
pnpm sst secret set JOBS_MEMBERSHIP_URL https://membership.staging.platform.loop.health --stage staging
# prod is set manually via the deploy gate
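
For orientation, here is a sketch of how a registry entry's targetUrlEnv and path might combine into the final request URL. It assumes the secret value reaches the executor as an environment variable named after the map key (for example MEMBERSHIP_URL), which matches the local exports in Step 5 but is not confirmed by the infra code shown here. JobTargetRef and resolveTargetUrl are hypothetical names.

```typescript
// Hypothetical subset of a registry entry: only the fields needed to build the URL.
interface JobTargetRef {
  id: string;
  targetUrlEnv: string;
  path: string;
}

// Looks up the base URL by env var name and appends the job's path.
// Assumption: the executor fails fast when the variable is missing.
function resolveTargetUrl(
  job: JobTargetRef,
  env: Record<string, string | undefined>,
): string {
  const baseUrl = env[job.targetUrlEnv];
  if (!baseUrl) {
    throw new Error(`Job "${job.id}" needs env var ${job.targetUrlEnv}, but it is not set`);
  }
  // Strip trailing slashes so the join never produces "//".
  return baseUrl.replace(/\/+$/, "") + job.path;
}
```

Failing fast on a missing variable surfaces a forgotten secret on the first run instead of producing a request to a malformed URL.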

Step 4: Verify M2M auth

The job uses M2M credentials from env: JOBS_M2M_CLIENT_ID + JOBS_M2M_CLIENT_SECRET. These point to a single M2M client registered with admin:* scopes for every service.

If the audience or scopes differ from the env defaults (rare), override them in the registry entry:

{
  id: "your-job",
  // ...
  audience: "clinical",            // overrides M2M_AUDIENCE env default
  scopes: ["admin:clinical"],      // overrides M2M_SCOPES env default
}
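
The override behaviour described in the comments above could look like the sketch below. It assumes M2M_SCOPES holds a space-separated list and that a missing audience is an error; neither detail is confirmed by this guide, and resolveAuth is a hypothetical helper name.

```typescript
// Hypothetical subset of a registry entry: both fields are optional overrides.
interface JobAuthOverrides {
  audience?: string;
  scopes?: string[];
}

interface ResolvedAuth {
  audience: string;
  scopes: string[];
}

// Per-job values win; otherwise fall back to the M2M_AUDIENCE / M2M_SCOPES env defaults.
function resolveAuth(
  job: JobAuthOverrides,
  env: Record<string, string | undefined>,
): ResolvedAuth {
  const audience = job.audience ?? env["M2M_AUDIENCE"];
  if (!audience) {
    throw new Error("No audience on the job and M2M_AUDIENCE is not set");
  }
  // Assumption: the env default is a space-separated scope list.
  const defaultScopes = (env["M2M_SCOPES"] ?? "").split(/\s+/).filter((s) => s.length > 0);
  return { audience, scopes: job.scopes ?? defaultScopes };
}
```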

Step 5: Test locally

# Set env vars to point at dev
export IDENTITY_URL=https://identity.dev.platform.loop.health
export M2M_CLIENT_ID=...
export M2M_CLIENT_SECRET=...
export MEMBERSHIP_URL=https://membership.dev.platform.loop.health
export JOB_ID=release-commission-locks
 
# Run the executor manually
pnpm --filter @services/jobs exec tsx src/jobs/executor.ts

The executor should authenticate, call the target, and log the result.

Step 6: Add the registry test entry

services/jobs/tests/registry.test.ts asserts that every registry entry has a valid shape, so new entries are validated automatically. You don’t need to write a new test; run the suite and confirm your entry passes.
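
To give a sense of what a shape check can catch, here is a sketch of a registry validator. It is not the contents of registry.test.ts; the rules (kebab-case ids, unique ids, a six-field cron or a rate expression, an absolute path, an upper-case env var name) and the validateRegistry name are assumptions chosen to match the example entry in Step 2.

```typescript
// Hypothetical subset of JobDefinition covering the fields being checked.
interface JobDefinitionLike {
  id: string;
  schedule: string;
  targetUrlEnv: string;
  method: string;
  path: string;
}

// EventBridge cron expressions have six space-separated fields.
const CRON_PATTERN = /^cron\((\S+ ){5}\S+\)$/;
// EventBridge rate expressions look like rate(5 minutes).
const RATE_PATTERN = /^rate\(\d+ (minute|minutes|hour|hours|day|days)\)$/;
const ALLOWED_METHODS = ["GET", "POST", "PUT", "PATCH", "DELETE"];

// Returns a list of human-readable problems; an empty list means the registry is valid.
function validateRegistry(jobs: JobDefinitionLike[]): string[] {
  const errors: string[] = [];
  const seenIds = new Set<string>();
  for (const job of jobs) {
    if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(job.id)) {
      errors.push(`${job.id}: id must be kebab-case`);
    }
    if (seenIds.has(job.id)) {
      errors.push(`${job.id}: duplicate id`);
    }
    seenIds.add(job.id);
    if (!CRON_PATTERN.test(job.schedule) && !RATE_PATTERN.test(job.schedule)) {
      errors.push(`${job.id}: schedule must be cron(...) with six fields or rate(...)`);
    }
    if (job.path.charAt(0) !== "/") {
      errors.push(`${job.id}: path must start with "/"`);
    }
    if (ALLOWED_METHODS.indexOf(job.method) === -1) {
      errors.push(`${job.id}: unsupported method ${job.method}`);
    }
    if (!/^[A-Z][A-Z0-9_]*$/.test(job.targetUrlEnv)) {
      errors.push(`${job.id}: targetUrlEnv must be an upper-case env var name`);
    }
  }
  return errors;
}
```

A common mistake this catches is pasting a five-field Unix cron string, which EventBridge rejects at deploy time.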

Step 7: Add the runbook section

services/jobs/RUNBOOK.md — add a section for your job:

  • What it does (1 sentence)
  • Schedule
  • Target service + endpoint
  • Failure mode (what happens if it fails)
  • Remediation (what an on-call should do)

What the platform does for you

  • Authentication — handled by the executor; you never touch tokens.
  • Retries — Lambda retries on infrastructure failures; target service’s idempotency makes them safe.
  • Logging — structured logs with job_id, run_id, duration_ms, target_status.
  • Metrics — OTel histograms for duration, success/failure counts.
  • Alarms — auto-wired CloudWatch alarms on consecutive failures (per service alarms config).
  • Dead-letter — failed runs go to a DLQ for inspection.

What you should not do

  • Don’t write business logic in the executor. The executor is a generic HTTP client. Logic lives in the target service.
  • Don’t put secrets in the registry. Use SST Secrets.
  • Don’t make non-idempotent target endpoints. Lambda retries can re-invoke them.
  • Don’t schedule conflicting jobs (two cron entries that race for the same rows). If you need exclusivity, use distributed locks via @platform/core.
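
The exclusivity idea in the last bullet can be sketched with a lease-style lock. This is not the @platform/core API, which this guide does not show; LockStore, InMemoryLockStore, and withLock are hypothetical names, and the in-memory map stands in for whatever shared store a real distributed lock would use.

```typescript
// Hypothetical lock interface; a real implementation would sit on a shared store.
interface LockStore {
  tryAcquire(key: string, owner: string, ttlMs: number, now: number): boolean;
  release(key: string, owner: string): void;
}

// In-memory stand-in, useful only for showing the semantics.
class InMemoryLockStore implements LockStore {
  private locks = new Map<string, { owner: string; expiresAt: number }>();

  tryAcquire(key: string, owner: string, ttlMs: number, now: number): boolean {
    const held = this.locks.get(key);
    // Refuse only if another owner holds a lease that has not expired yet.
    if (held && held.expiresAt > now && held.owner !== owner) {
      return false;
    }
    this.locks.set(key, { owner, expiresAt: now + ttlMs });
    return true;
  }

  release(key: string, owner: string): void {
    // Only the current owner may release, so a late release cannot drop someone else's lease.
    if (this.locks.get(key)?.owner === owner) {
      this.locks.delete(key);
    }
  }
}

// Runs `work` only if the lock is free; returns undefined when another job holds it.
function withLock<T>(
  store: LockStore,
  key: string,
  owner: string,
  ttlMs: number,
  work: () => T,
): T | undefined {
  if (!store.tryAcquire(key, owner, ttlMs, Date.now())) {
    return undefined;
  }
  try {
    return work();
  } finally {
    store.release(key, owner);
  }
}
```

The TTL matters: if a job crashes while holding the lock, the lease expires and the next run can proceed instead of being blocked indefinitely.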