Idempotency and retries

What this is: how we make every retry safe. Callers can re-send the same request and get the same outcome; consumers can re-process the same event without double-counting.

Who it’s for: anyone writing a POST/PUT/DELETE endpoint, anyone writing an event handler, anyone debugging “why did this customer get charged twice?”

What to read next: Events, Rate limits and circuit breakers, services/accounting.

The rule

If retrying any request produces a different outcome than the first attempt, that’s a bug.

This applies to:

  • HTTP POST/PUT/DELETE endpoints (caller might retry on timeout)
  • Event handlers (NATS + EventBridge may deliver twice)
  • Outbound vendor calls (Stripe, BigCommerce, Postmark — each can drop a connection)
  • Cron jobs (executor Lambda may be retried on Lambda-level failures)

The pattern: idempotency table

Each service has an idempotency table:

CREATE TABLE <service>.idempotency_keys (
  key            TEXT NOT NULL,
  scope          TEXT NOT NULL,           -- 'route:POST_/v1/payments' or 'event:order.placed.v1'
  request_hash   TEXT NOT NULL,           -- SHA256 of canonical payload
  response       JSONB NOT NULL,
  expires_at     TIMESTAMPTZ NOT NULL,
  created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
  PRIMARY KEY (scope, key)                -- lookups are by (scope, key)
);
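
The request_hash column depends on a stable serialization: two payloads that differ only in key order should hash identically. A minimal sketch of one way to do that, assuming JSON payloads and Node's built-in crypto module; canonicalize and requestHash are hypothetical names, not the platform's actual helpers:

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical helper: serialize with object keys sorted, so that
// semantically equal payloads produce the same string.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  const entries = Object.entries(value)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
  return `{${entries.join(",")}}`;
}

// SHA-256 hex digest of the canonical form (64 hex characters).
function requestHash(payload: Json): string {
  return createHash("sha256").update(canonicalize(payload)).digest("hex");
}
```

Sorting keys is the important design choice here: without it, a client that serializes the same object in a different field order would trigger a false payload mismatch.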

On a state-changing request:

  1. Caller provides Idempotency-Key header (any 1–128 char string).
  2. Service computes a SHA-256 hash of the canonicalized request body.
  3. Lookup (scope, key):
    • Hit, same hash → return stored response (idempotent replay).
    • Hit, different hash → 409 idempotency_key_payload_mismatch (caller misused the key).
    • Miss → execute, write the row, return.

The lookup and the write run in the same transaction as the side effect. Keys expire after 24 hours by default; high-stakes operations use a longer window (payments: 7 days).
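
The three lookup outcomes can be sketched with an in-memory map standing in for the idempotency table. This is an illustration under assumptions, not the service code: handleIdempotent, the Outcome type, and the treatment of an expired row as a miss are all hypothetical, and a real implementation would run the lookup, side effect, and write inside one database transaction.

```typescript
type StoredRow = { hash: string; response: unknown; expiresAt: number };

type Outcome =
  | { kind: "replay"; response: unknown }
  | { kind: "mismatch"; status: 409; error: "idempotency_key_payload_mismatch" }
  | { kind: "executed"; response: unknown };

// In-memory stand-in for <service>.idempotency_keys, keyed by scope + key.
const rows = new Map<string, StoredRow>();

const DAY_MS = 24 * 60 * 60 * 1000;

function handleIdempotent(
  scope: string,
  key: string,
  hash: string,
  execute: () => unknown,
  now: number = Date.now(),
  ttlMs: number = DAY_MS,
): Outcome {
  const id = `${scope}\u0000${key}`;
  const existing = rows.get(id);

  // Assumption: an expired row counts as a miss.
  if (existing && existing.expiresAt > now) {
    if (existing.hash === hash) {
      return { kind: "replay", response: existing.response };
    }
    return { kind: "mismatch", status: 409, error: "idempotency_key_payload_mismatch" };
  }

  // Miss: run the side effect once and remember its response.
  const response = execute();
  rows.set(id, { hash, response, expiresAt: now + ttlMs });
  return { kind: "executed", response };
}
```

Calling it twice with the same scope, key, and hash runs execute once and returns the stored response the second time; changing only the hash yields the 409 outcome without running execute.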

HTTP routes

Every POST/PUT/DELETE endpoint declares that it requires an Idempotency-Key:

import { idempotent } from "@platform/hono";
 
app.openapi(createPaymentRoute, idempotent({ scope: "POST_/v1/payments" }, async (c) => {
  const body = c.req.valid("json");
  // ... business logic
  return c.json({ payment_id: paymentId }, 201);
}));

Without the header, a mutating route returns 400 missing_idempotency_key. GETs don’t require it because they don’t change state.
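
The header check itself is small. The sketch below shows the shape of that validation, assuming the 1–128 character rule from the pattern section. checkIdempotencyKey is a hypothetical function, and the invalid_idempotency_key error code for over-long keys is an assumption; only missing_idempotency_key comes from the platform.

```typescript
type HeaderCheck =
  | { ok: true; key: string | null } // null: this method does not need a key
  | { ok: false; status: 400; error: "missing_idempotency_key" | "invalid_idempotency_key" };

const MUTATING_METHODS = new Set(["POST", "PUT", "DELETE"]);
const MAX_KEY_LENGTH = 128;

function checkIdempotencyKey(method: string, headerValue: string | undefined): HeaderCheck {
  // Reads are exempt: nothing to deduplicate.
  if (!MUTATING_METHODS.has(method.toUpperCase())) {
    return { ok: true, key: null };
  }
  // An absent or empty header fails the 1-character minimum.
  if (headerValue === undefined || headerValue.length === 0) {
    return { ok: false, status: 400, error: "missing_idempotency_key" };
  }
  // Assumption: over-long keys are rejected with a separate (hypothetical) code.
  if (headerValue.length > MAX_KEY_LENGTH) {
    return { ok: false, status: 400, error: "invalid_idempotency_key" };
  }
  return { ok: true, key: headerValue };
}
```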

A cleanup job runs nightly to purge expired idempotency rows.

Event handlers

Event handlers dedupe on a handler-specific key built from an identifier in the event payload (here, the order ID):

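export const handler = createEventHandler({
  event: EVENT_NAMES.ORDER_PLACED_V1,
  async handle(payload, ctx) {
    // Handler-specific key: one commission per order, however often the event arrives.
    const key = `commission:${payload.order_id}`;
    const scope = "event:order.placed.v1";

    // Fast path: a redelivered event that was already processed is a no-op.
    const existing = await ctx.idempotency.find(key, scope);
    if (existing) return;

    // The side effect and the idempotency row commit together. If a concurrent
    // delivery also passed the check above, its insert hits the key conflict and
    // its whole transaction rolls back (assuming postCommission writes only via tx).
    await ctx.db.transaction(async (tx) => {
      const commission = await ctx.affiliates.postCommission(payload, tx);
      await tx.insert(idempotency).values({
        key,
        scope,
        request_hash: hash(payload),
        response: { commission_id: commission.id },
        expires_at: addDays(30),
      });
    });
  },
});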

The handler-specific key matters. If two different handlers consume the same event for different side effects, each has its own key (commission:<order_id> vs analytics:<order_id>).
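
To see why the prefix matters, here is a small sketch with a Set standing in for the idempotency table. handlerKey and processOnce are hypothetical helpers used only for illustration:

```typescript
// Hypothetical helper: prefix the entity ID with the handler's name.
function handlerKey(handler: string, orderId: string): string {
  return `${handler}:${orderId}`;
}

// Stand-in for the idempotency table.
const processed = new Set<string>();

// Returns true if the side effect ran, false if this was a duplicate delivery.
function processOnce(handler: string, orderId: string, sideEffect: () => void): boolean {
  const key = handlerKey(handler, orderId);
  if (processed.has(key)) return false;
  sideEffect();
  processed.add(key);
  return true;
}
```

With a shared key such as the bare order ID, whichever handler ran first would cause the other to skip its own side effect. With prefixed keys, processOnce("commission", "ord_1", ...) and processOnce("analytics", "ord_1", ...) each run once, and a redelivery to either is skipped.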

Outbound vendor calls

Vendor calls go through @platform/core’s circuit breaker + retry helper, which respects the vendor’s idempotency protocol:

  • Stripe — pass Idempotency-Key header; Stripe handles dedup.
  • Postmark — message ID is the dedup key; we generate it client-side.
  • BigCommerce — POST without a native idempotency primitive; we wrap calls with a local idempotency key that prevents double-submit on retry.

Retry policy (also in @platform/core):

  • Backoff schedule with increasing delays: 500ms, 2s, 5s, 30s, then dead-letter.
  • Network errors retry; 4xx responses (except 429) do not retry.
  • 429 honors Retry-After.
  • 5xx retries up to the cap.
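
The policy above can be read as a pure decision function. The sketch below is one interpretation, not the @platform/core implementation: nextStep and its types are hypothetical, and it assumes that a 429 counts toward the retry cap and that a missing Retry-After falls back to the normal schedule.

```typescript
// Delay before retry 1, 2, 3, 4. After the fourth retry fails, dead-letter.
const BACKOFF_MS: readonly number[] = [500, 2000, 5000, 30000];

type Failure = { status?: number; networkError?: boolean; retryAfterSeconds?: number };

type Decision =
  | { action: "retry"; delayMs: number }
  | { action: "dead-letter" }
  | { action: "fail" };

// retriesSoFar: how many retries have already been attempted (0 after the first failure).
function nextStep(retriesSoFar: number, failure: Failure): Decision {
  const { status } = failure;
  const retryable =
    failure.networkError === true ||
    status === 429 ||
    (status !== undefined && status >= 500);

  // Other 4xx responses are the caller's fault; retrying cannot help.
  if (!retryable) return { action: "fail" };

  const backoff = BACKOFF_MS[retriesSoFar];
  if (backoff === undefined) return { action: "dead-letter" };

  // 429: the vendor's Retry-After takes precedence over our schedule.
  if (status === 429 && failure.retryAfterSeconds !== undefined) {
    return { action: "retry", delayMs: failure.retryAfterSeconds * 1000 };
  }
  return { action: "retry", delayMs: backoff };
}
```

Keeping the decision separate from the sleeping and the HTTP call makes the policy easy to unit-test without timers.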

Cron jobs

Each cron job is registered in services/jobs/src/jobs/registry.ts; the entry names the target endpoint, which is responsible for enforcing idempotency:

{
  id: "release-commission-locks",
  schedule: "cron(0 * * * ? *)",
  targetUrlEnv: "MEMBERSHIP_URL",
  path: "/v1/admin/commission-locks/release-expired",
  scopes: ["membership:admin"],
}

The executor Lambda includes a run-id in its admin call, and the target service uses run-id as the idempotency key. If the Lambda retries (Lambda-level failure), the second attempt with the same run-id is a no-op.
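
On the target service, the run-id handling might look like the following sketch. It uses an in-memory map in place of the idempotency table; releaseExpired, RunResult, and the replayed flag are hypothetical names chosen for illustration.

```typescript
type RunResult = { released: number };

// Stand-in for the idempotency table, keyed by run-id.
const completedRuns = new Map<string, RunResult>();

// Runs the job at most once per run-id. A retried invocation gets the
// stored result back and does no work.
function releaseExpired(runId: string, release: () => number): RunResult & { replayed: boolean } {
  const previous = completedRuns.get(runId);
  if (previous) return { ...previous, replayed: true };

  const result: RunResult = { released: release() };
  completedRuns.set(runId, result);
  return { ...result, replayed: false };
}
```

The run-id has to be stable across Lambda retries of the same scheduled invocation and different for each new scheduled run; otherwise the next hourly run would be treated as a replay.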

What this guards against

  • Caller retries on timeout — gets the same response, no double-write.
  • NATS + EventBridge double-delivery — handler dedupes by event ID + scope.
  • Outbound vendor flakiness — retry honors vendor idempotency.
  • Lambda execution retries — cron jobs are safe.
  • Connection drops mid-write — the transaction either commits cleanly or rolls back; the outbox + idempotency rows are in the same transaction so we can’t get “half done.”

What it doesn’t guard against

  • A different caller submitting the same logical operation — two different requests with two different idempotency keys are NOT deduped. Business-logic-level uniqueness constraints (unique constraint on external_order_id, e.g.) cover that.
  • Replays beyond the expiry window — after 24h (or 7d for payments), the key expires and a replay would execute. That’s why expiry windows are set per scope.
  • Cross-service deduplication — if service A retries a call to service B, B dedupes locally; if A also publishes an event downstream of that call, downstream consumers dedupe via their own keys. No global dedup table.

Common mistakes

  • Forgetting the Idempotency-Key on mutations. The idempotent() middleware rejects the request with 400 missing_idempotency_key; add a test that asserts this for each new route.
  • Reusing a key for a different operation. The hash check catches this with a 409, but it’s a sign the caller’s key-generation logic is wrong.
  • Running the side effect outside the transaction. Idempotency lookup + side effect + idempotency write MUST be in one transaction.
  • Not setting Retry-After on 429. Callers need it to back off correctly.

Source ADRs

ADR-0030 (idempotency keys), ADR-0040 (outbox + at-least-once events), ADR-0044 (vendor retry policy).