Durable workflows
What this is: how the platform orchestrates multi-step business processes that span minutes, hours, or days — onboarding sequences, Certificate of Analysis (COA) validation, fulfillment retries, win-back ladders, General Data Protection Regulation (GDPR) erasure fan-out. The workflow engine owns persistence, timers, retries, replay, and the operator UI; service code stays stateless.
Who it’s for: anyone writing a multi-step flow that needs to survive a deploy, retry intelligently, sleep for hours, or fan out across services. For a single API call, use a regular service route and an idempotency key instead.
What to read next: Events, Idempotency and retries, Reliability and deployment.
Source ADR: 0056 — Durable workflow engine.
The choice: Inngest, not Temporal, not in-process
LOO-1925 evaluated three options for durable orchestration:
- Port @loop/workflow-engine from the legacy repo. A rules interpreter, not a durable engine: no persistence, no timers, no retries, no operator visibility. Porting requires rebuilding every hard part.
- Temporal Cloud. Highest technical ceiling. Highest adoption cost: SDK shape, worker model, separate operational discipline.
- Inngest (managed). HTTP-first step functions. The free tier covers early volume, and retry, replay, and an operator UI are built in.
The platform adopted Inngest (Architecture Decision Record 0056). It is the smallest step up from “no engine at all” that still provides durable guarantees, and it composes cleanly with the HTTP-only service model — workflows are HTTP handlers that the Inngest runner invokes.
How a workflow composes
┌──────────────────────────────────────────────────────────┐
│ services/<owner>: the service that owns the flow         │
│                                                          │
│ src/workflows/onboarding.workflow.ts                     │
│   inngest.createFunction(                                │
│     { id: "onboarding" },                                │
│     { event: "identity.user.created.v1" },               │
│     async ({ event, step }) => {                         │
│       await step.run("provision-brand", () => ...)       │
│       await step.sleep("wait-day-2", "2d")               │
│       await step.run("send-day-2-email", () => ...)      │
│     },                                                   │
│   )                                                      │
└──────────────────────┬───────────────────────────────────┘
                       │ HTTP step calls
┌──────────────────────▼───────────────────────────────────┐
│ Inngest (managed): owns state, timers, retries           │
└──────────────────────────────────────────────────────────┘

Each step.run call is replay-safe: Inngest persists the step's result, and when the handler is invoked again it returns that saved result instead of re-executing the step. Each step.sleep call survives deploys: when the timer fires, Inngest calls the same HTTP handler back and execution resumes at the correct step.
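The replay behavior can be modeled without the Inngest SDK. The sketch below is a hypothetical, synchronous toy, not Inngest's implementation: each invocation re-runs the handler from the top, returns saved results for completed steps, and stops at the first new step so the "engine" can persist it and call the handler again. The names StepInterrupt, makeStep, and runToCompletion are illustrative only, and the toy treats a sleep timer as firing immediately.

```typescript
// Hypothetical toy model of durable step execution; not Inngest's implementation.
// Synchronous for brevity: real step.run and step.sleep calls are awaited.

type StepState = Map<string, unknown>;

// Thrown by a step that has not completed yet, so the engine can persist its result.
class StepInterrupt {
  id: string;
  result: unknown;
  constructor(id: string, result: unknown) {
    this.id = id;
    this.result = result;
  }
}

interface Step {
  run<T>(id: string, fn: () => T): T;
  sleep(id: string, duration: string): void;
}

function makeStep(state: StepState): Step {
  return {
    run<T>(id: string, fn: () => T): T {
      if (state.has(id)) return state.get(id) as T; // completed earlier: reuse the saved result
      throw new StepInterrupt(id, fn()); // first time: execute once and hand the result over
    },
    sleep(id: string, duration: string): void {
      if (state.has(id)) return; // timer already fired on an earlier invocation
      // A real engine would schedule a timer here; the toy treats it as fired immediately.
      throw new StepInterrupt(id, `slept ${duration}`);
    },
  };
}

// Stands in for the engine: persist each new step result, then invoke the handler again.
function runToCompletion(handler: (step: Step) => string): { output: string; invocations: number } {
  const state: StepState = new Map();
  let invocations = 0;
  while (true) {
    invocations++;
    try {
      return { output: handler(makeStep(state)), invocations };
    } catch (e) {
      if (!(e instanceof StepInterrupt)) throw e; // a genuine failure, not a step boundary
      state.set(e.id, e.result);
    }
  }
}

// Usage, mirroring the onboarding flow in the diagram.
const executions: string[] = [];
const result = runToCompletion((step) => {
  const brandId = step.run("provision-brand", () => {
    executions.push("provision-brand");
    return "brand_123";
  });
  step.sleep("wait-day-2", "2d");
  step.run("send-day-2-email", () => {
    executions.push("send-day-2-email");
    return true;
  });
  return `onboarded ${brandId}`;
});
```

In this toy, the handler is invoked four times, yet each step.run callback executes exactly once. That property is what lets a durable engine resume or retry a handler without repeating side effects.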
What belongs in a workflow vs. a route
| Use a workflow | Use a regular service route |
|---|---|
| Sleeps, waits, or human approval gates | Synchronous request → response |
| Multi-step fan-out across services | Single bounded-context operation |
| Needs to survive a deploy / restart | Completes in one request |
| Hour-plus retry windows | Sub-second / sub-minute retry |
| Operator visibility on per-step failures | Per-request logs + audit are enough |
If the code reaches for setTimeout, hand-rolls a state column, or stashes “next step” rows in a database for a cron to pick up later, the right primitive is a workflow.
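The table can double as a review checklist. A minimal sketch, assuming a hypothetical FlowTraits shape whose field names are illustrative and not part of any platform API, and assuming that any single trait from the workflow column is enough to move the flow into Inngest:

```typescript
// Hypothetical review checklist encoding the decision table; field names are illustrative.
interface FlowTraits {
  sleepsOrAwaitsApproval: boolean; // sleeps, waits, or human approval gates
  fansOutAcrossServices: boolean; // multi-step fan-out across services
  mustSurviveDeploy: boolean; // cannot finish inside one request
  retryWindowMinutes: number; // how long failures may keep retrying
  needsPerStepVisibility: boolean; // operators must see which step failed
}

type Host = "workflow" | "service-route";

function chooseHost(t: FlowTraits): Host {
  const needsWorkflow =
    t.sleepsOrAwaitsApproval ||
    t.fansOutAcrossServices ||
    t.mustSurviveDeploy ||
    t.retryWindowMinutes >= 60 || // hour-plus retry windows
    t.needsPerStepVisibility;
  return needsWorkflow ? "workflow" : "service-route";
}

// Usage: a synchronous lookup stays a route; a win-back ladder that sleeps for days does not.
const lookup = chooseHost({
  sleepsOrAwaitsApproval: false,
  fansOutAcrossServices: false,
  mustSurviveDeploy: false,
  retryWindowMinutes: 0,
  needsPerStepVisibility: false,
});
const winBackLadder = chooseHost({
  sleepsOrAwaitsApproval: true,
  fansOutAcrossServices: false,
  mustSurviveDeploy: true,
  retryWindowMinutes: 24 * 60,
  needsPerStepVisibility: true,
});
```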
Event handlers versus workflows
Both subscribe to EventBridge or NATS. They differ in their unit of work:
- Event handler — one event in, side effects out, idempotent. Lives in the service. On failure, the bus retries the entire handler.
- Workflow — one event triggers a sequence of steps with shared state. Lives in Inngest. On failure, Inngest retries only the failed step.
A handler that grows multiple steps with persisted intermediate state is a workflow in disguise; extract it.
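The difference in retry granularity is easiest to see side by side. The sketch below is a hypothetical simulation, not bus or Inngest code: the same two-step body runs under a whole-handler retry and under a per-step retry that keeps completed step results, and the second step fails once before succeeding. The step names are illustrative.

```typescript
// Hypothetical simulation of retry granularity; neither function is bus or Inngest code.
type Run = <T>(id: string, fn: () => T) => T;
type Body = (run: Run) => void;

// Event-handler style: the bus re-delivers the event and the whole handler runs again.
function retryWholeHandler(body: Body, maxAttempts: number): void {
  const runDirectly: Run = <T>(_id: string, fn: () => T): T => fn();
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      body(runDirectly);
      return;
    } catch (err) {
      if (attempt === maxAttempts) throw err;
    }
  }
}

// Workflow style: completed step results are kept, so only the failed step runs again.
function retryFailedStepOnly(body: Body, maxAttempts: number): void {
  const completed = new Map<string, unknown>();
  const runMemoized: Run = <T>(id: string, fn: () => T): T => {
    if (completed.has(id)) return completed.get(id) as T;
    const out = fn();
    completed.set(id, out);
    return out;
  };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      body(runMemoized);
      return;
    } catch (err) {
      if (attempt === maxAttempts) throw err;
    }
  }
}

// Two steps; the second fails on its first attempt, then succeeds.
function makeBody(log: string[]): Body {
  let failuresLeft = 1;
  return (run) => {
    run("reserve-inventory", () => log.push("reserve-inventory"));
    run("send-confirmation", () => {
      if (failuresLeft > 0) {
        failuresLeft--;
        throw new Error("provider timeout");
      }
      log.push("send-confirmation");
    });
  };
}

const handlerLog: string[] = [];
retryWholeHandler(makeBody(handlerLog), 3);
const workflowLog: string[] = [];
retryFailedStepOnly(makeBody(workflowLog), 3);
```

The whole-handler retry runs reserve-inventory twice, which is why event handlers must be idempotent; the per-step retry runs it once.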
Common mistakes
- Hand-rolled retry loop inside a workflow body. Configure retries through Inngest instead (for example, the function's retries option), so the engine retries only the failed step.
- Reading mutable database state inside a step.run and assuming determinism. Capture the value once and pass it forward; replays may otherwise hit a different row.
- Placing protected health information (PHI) in event payloads to share state between steps. Use step inputs and outputs for opaque ids, then pull PHI inside the step that needs it and audit the read.
- Single 30-step monolith. Smaller, composable functions chained via events are easier to reason about than one giant workflow.
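The capture-once pattern can be sketched with a hypothetical in-memory table and a memoizing step runner like the one Inngest provides; the names db, winBack, load-plan, and send-offer are illustrative. The plan is read once inside a step, persisted as that step's result, and passed forward alongside only the opaque customer id, so a retried later step does not re-read a row that may have changed.

```typescript
// Hypothetical in-memory table standing in for a service database.
const db = new Map<string, { plan: string }>([["cus_1", { plan: "pro" }]]);

type Run = <T>(id: string, fn: () => T) => T;

// Memoizing runner: a step that already completed returns its saved result.
function makeRun(completed: Map<string, unknown>): Run {
  return <T>(id: string, fn: () => T): T => {
    if (completed.has(id)) return completed.get(id) as T;
    const out = fn();
    completed.set(id, out);
    return out;
  };
}

let offerFailuresLeft = 1; // send-offer fails once, then succeeds

function winBack(run: Run, customerId: string): string {
  // Read mutable state once; the value is persisted as this step's result.
  const plan = run("load-plan", () => {
    const row = db.get(customerId);
    if (!row) throw new Error(`unknown customer ${customerId}`);
    return row.plan;
  });
  // Later steps use the captured value plus the opaque id, never a fresh read.
  return run("send-offer", () => {
    if (offerFailuresLeft > 0) {
      offerFailuresLeft--;
      throw new Error("mail provider timeout");
    }
    return `offer for ${customerId} on ${plan}`;
  });
}

// Usage: load-plan succeeds, send-offer fails, and the row changes before the retry.
const completed = new Map<string, unknown>();
const run = makeRun(completed);
try {
  winBack(run, "cus_1");
} catch {
  db.set("cus_1", { plan: "basic" });
}
const offer = winBack(run, "cus_1");
```

The retried send-offer still sees "pro", the value captured by load-plan, even though the row now says "basic".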
See also
- ADR: 0056 — Durable workflow engine — adopt Inngest
- Related concepts: Events, Idempotency and retries, Reliability and deployment
- Extraction brief: docs/architecture/extractions/workflow-engine.md
- Modular plan: LOO-1925 (Durable workflow engine)