Events

What this is: how services communicate asynchronously without coupling. State changes get published as events; other services subscribe.

Who it’s for: anyone writing a service that emits an event, anyone debugging why a downstream didn’t react to one, anyone designing a new cross-service workflow.

What to read next: Event reference, Idempotency and retries, Webhooks.

Why dual-bus

The platform uses two event buses, by design.

NATS handles fast in-cluster fan-out. Sub-second latency, minimal overhead, no durability beyond the cluster. Good for “tell every running consumer right now.”

EventBridge handles durable cross-service contracts. Higher latency (1–5s), but events persist, can be replayed, and external webhook delivery hangs off it. Good for “this is part of the platform’s published contract.”

Producers don’t choose between the two — they publish once via the outbox, and the drainer fans out to both.

The outbox pattern

The fundamental problem: you can’t atomically write to your database AND publish to an external bus. If you write to the DB first and the publish then fails, the state change goes unannounced. If you publish first and the DB write then fails, you have emitted a phantom event for a change that never happened.

The fix: write the event row to a local outbox table in the same transaction as the state change. A separate process drains the outbox and publishes to the buses.

// services/membership/src/services/membership.service.ts
await db.transaction(async (tx) => {
  await tx.insert(memberships).values({ userId, tier: "gold" });
  await tx.insert(outbox).values({
    eventName: EVENT_NAMES.MEMBERSHIP_TIER_CHANGED_V1,
    payload: { user_id: userId, tier: "gold", previous_tier: "silver" },
  });
});

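If the transaction fails, both rows roll back. If it commits, the outbox row is durable and the drainer will eventually publish it. The event is recorded exactly once on the producer side, but the drainer can publish a row more than once (for example, if it crashes after publishing and before marking the row as published), so delivery to consumers is at-least-once.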

The drainer:

  • Polls the outbox table every second
  • Publishes each row to NATS + EventBridge
  • Marks the row as published (or retries on failure)
  • Garbage-collects published rows after 7 days
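
The drainer steps above can be sketched as a single in-memory pass. This is a minimal illustration, not the platform's drainer: `OutboxRow`, `Bus`, `drainOnce`, and the simulated buses are all hypothetical names, and a real drainer would read rows from the database rather than an array. The sketch shows why a failed publish leads to duplicates: when the second bus fails, the row stays unpublished and the first bus receives it again on the next pass.

```typescript
// Hypothetical in-memory model of one drainer pass; not the platform's real code.
type OutboxRow = {
  id: number;
  eventName: string;
  payload: unknown;
  publishedAt: number | null; // epoch ms; null until every bus has accepted the row
  attempts: number;
};

type Bus = { publish(eventName: string, payload: unknown): void };

const RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // published rows are kept for 7 days

function drainOnce(outbox: OutboxRow[], buses: Bus[], now: number): OutboxRow[] {
  for (const row of outbox) {
    if (row.publishedAt !== null) continue; // already published
    row.attempts += 1;
    try {
      for (const bus of buses) bus.publish(row.eventName, row.payload);
      row.publishedAt = now; // mark only after every bus accepted the event
    } catch {
      // Leave the row unpublished so the next pass retries it. A bus that
      // already accepted the event will receive it a second time.
    }
  }
  // Garbage-collect rows that were published longer ago than the retention window.
  return outbox.filter(
    (row) => row.publishedAt === null || now - row.publishedAt < RETENTION_MS,
  );
}

// Usage: EventBridge fails once, so NATS ends up seeing the event twice.
const natsLog: string[] = [];
const eventBridgeLog: string[] = [];
let simulatedFailures = 1;

const nats: Bus = { publish: (eventName) => { natsLog.push(eventName); } };
const eventBridge: Bus = {
  publish: (eventName) => {
    if (simulatedFailures > 0) {
      simulatedFailures -= 1;
      throw new Error("simulated EventBridge outage");
    }
    eventBridgeLog.push(eventName);
  },
};

let rows: OutboxRow[] = [
  { id: 1, eventName: "order.placed.v1", payload: { order_id: "ord_1" }, publishedAt: null, attempts: 0 },
];
rows = drainOnce(rows, [nats, eventBridge], 1_000); // NATS ok, EventBridge fails
rows = drainOnce(rows, [nats, eventBridge], 2_000); // retry: both succeed
const afterRetention = drainOnce(rows, [nats, eventBridge], 2_000 + RETENTION_MS);
```

After the two passes, NATS has received the event twice and EventBridge once, which is the duplicate delivery that consumers have to tolerate.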

Event envelope

Every event has a versioned name and a typed payload. Naming: <domain>.<entity>.<verb>.<version>. Always versioned (.v1), never reused.

import { EVENT_NAMES } from "@platform/contracts";
// publishEvent is the service's publish helper; its import is omitted in this excerpt.

await publishEvent(EVENT_NAMES.CLINICAL_BIOMARKER_PARSED_V1, {
  patient_id: "pat_01HXY...",
  biomarker: "testosterone",
  value: 612,
  unit: "ng/dL",
  reference_range: { low: 264, high: 916 },
  source: "labcorp",
  taken_at: "2026-05-15T10:30:00Z",
});

The schema lives in @platform/contracts/events/<family>.ts as a Zod schema. validateEvent(name, payload) validates at publish time. Adding a new event without registering it in EVENT_SCHEMAS fails the CI convention check.
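
As an illustration of the naming rule, the sketch below parses a versioned event name and derives the next version's name. It is a hedged example, not part of `@platform/contracts`: `parseEventName` and `nextVersionName` are hypothetical helpers. Because this page also uses three-segment names such as `order.placed.v1`, the pattern assumes two or more lowercase segments followed by a `.v<N>` suffix rather than exactly four segments.

```typescript
// Hypothetical helpers; the real contracts package may validate names differently.
type ParsedEventName = { segments: string[]; version: number };

// Two or more lowercase dot-separated segments, then ".v<N>" with N >= 1 (assumed rule).
const EVENT_NAME_PATTERN = /^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+\.v([1-9][0-9]*)$/;

function parseEventName(eventName: string): ParsedEventName | null {
  const match = EVENT_NAME_PATTERN.exec(eventName);
  if (match === null) return null;
  const parts = eventName.split(".");
  return { segments: parts.slice(0, -1), version: Number(match[2]) };
}

// A changed payload shape gets a new name with the next version number;
// the existing name is left untouched.
function nextVersionName(eventName: string): string {
  const parsed = parseEventName(eventName);
  if (parsed === null) throw new Error(`not a versioned event name: ${eventName}`);
  return [...parsed.segments, `v${parsed.version + 1}`].join(".");
}

const parsedBiomarker = parseEventName("clinical.biomarker.parsed.v1");
const unversioned = parseEventName("order.placed");
const orderPlacedV2 = nextVersionName("order.placed.v1");
```

Here `parsedBiomarker` has three segments and version 1, `unversioned` is `null` because the version suffix is missing, and `orderPlacedV2` is `order.placed.v2`.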

How consumers subscribe

A service subscribes to events by declaring them in service.yaml and writing an event handler:

# services/affiliates/service.yaml
events_consumed:
  - order.placed.v1
  - payment.refunded.v1

// services/affiliates/src/events/order-placed.ts
import { EVENT_NAMES } from "@platform/contracts";
 
export const handler = createEventHandler({
  event: EVENT_NAMES.ORDER_PLACED_V1,
  async handle(payload, ctx) {
    // payload is fully typed from the Zod schema
    const commission = calculateCommission(payload);
    await ctx.affiliates.postCommission(commission);
  },
});

The convention check enforces that:

  • Every consumed event in service.yaml has a matching handler file under src/events/
  • Every handler file exports handler
  • Every consumed event exists in EVENT_SCHEMAS
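
To make the three rules concrete, here is a rough sketch of what such a check could look like as a pure function. Everything in it is an assumption for illustration: the real check's implementation is not shown on this page, and the file-naming rule (`order.placed.v1` maps to `src/events/order-placed.ts`) is inferred from the single example above. It also flags handler files for events the manifest does not declare.

```typescript
// Hypothetical model of the convention check; the real CI check may differ.
type ServiceManifest = { eventsConsumed: string[] };
type HandlerFile = { path: string; exportsHandler: boolean };

// Assumed naming rule: drop the version suffix and join the rest with hyphens.
function handlerPathFor(eventName: string): string {
  const withoutVersion = eventName.split(".").slice(0, -1);
  return `src/events/${withoutVersion.join("-")}.ts`;
}

function checkConventions(
  manifest: ServiceManifest,
  handlerFiles: HandlerFile[],
  registeredEvents: ReadonlySet<string>,
): string[] {
  const violations: string[] = [];
  const existingPaths = new Set(handlerFiles.map((file) => file.path));
  const declaredPaths = new Set(manifest.eventsConsumed.map((eventName) => handlerPathFor(eventName)));

  for (const eventName of manifest.eventsConsumed) {
    if (!registeredEvents.has(eventName)) {
      violations.push(`${eventName}: not registered in EVENT_SCHEMAS`);
    }
    if (!existingPaths.has(handlerPathFor(eventName))) {
      violations.push(`${eventName}: missing ${handlerPathFor(eventName)}`);
    }
  }
  for (const file of handlerFiles) {
    if (!file.exportsHandler) violations.push(`${file.path}: does not export handler`);
    if (!declaredPaths.has(file.path)) {
      violations.push(`${file.path}: handler for an event not declared in service.yaml`);
    }
  }
  return violations;
}

const problems = checkConventions(
  { eventsConsumed: ["order.placed.v1", "payment.refunded.v1"] },
  [
    { path: "src/events/order-placed.ts", exportsHandler: true },
    { path: "src/events/user-deleted.ts", exportsHandler: false },
  ],
  new Set(["order.placed.v1", "payment.refunded.v1"]),
);
```

In this example `problems` holds three violations: the missing `payment-refunded.ts` handler, and two for `user-deleted.ts` (no `handler` export, and no matching entry in `service.yaml`).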

At-least-once delivery + idempotency

The platform guarantees at-least-once delivery, not exactly-once. Consumers will sometimes see the same event more than once: both NATS and EventBridge can deliver it, and a retry after a failure can deliver it again.

Consumers MUST be idempotent. The standard pattern:

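export const handler = createEventHandler({
  event: EVENT_NAMES.ORDER_PLACED_V1,
  async handle(payload, ctx) {
    // Derive the key from the payload so every redelivery maps to the same key.
    const idempotencyKey = `commission:${payload.order_id}`;
    // Fast path: skip events that were already processed.
    if (await ctx.db.idempotency.exists(idempotencyKey)) return;

    // Record the effect and the key in one transaction. This relies on a unique
    // constraint on the key (assumed here): a concurrent duplicate that passed
    // the check above fails on insert and rolls back its commission.
    await ctx.db.transaction(async (tx) => {
      await ctx.affiliates.postCommission(payload, tx);
      await tx.idempotency.insert({ key: idempotencyKey, payload });
    });
  },
});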

Details: Idempotency and retries.
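
The effect of the pattern can be seen in a small in-memory simulation where the same event arrives twice, once per bus. This is a sketch under stated assumptions: `CommissionStore`, `handleOrderPlaced`, the `amount_cents` field, and the commission rate are all hypothetical, and `recordOnce` stands in for the database transaction plus unique key.

```typescript
// Hypothetical in-memory stand-in for the handler's database; not the platform API.
type OrderPlaced = { order_id: string; amount_cents: number };
type Commission = { orderId: string; cents: number };

const COMMISSION_BASIS_POINTS = 1_000; // 10%, an invented rate for the example

class CommissionStore {
  private readonly processedKeys = new Set<string>();
  readonly commissions: Commission[] = [];

  // Models "insert key + post commission" as one atomic step:
  // either both happen or, if the key already exists, neither does.
  recordOnce(key: string, commission: Commission): boolean {
    if (this.processedKeys.has(key)) return false;
    this.processedKeys.add(key);
    this.commissions.push(commission);
    return true;
  }
}

function handleOrderPlaced(store: CommissionStore, payload: OrderPlaced): "applied" | "duplicate" {
  const key = `commission:${payload.order_id}`;
  const cents = Math.floor((payload.amount_cents * COMMISSION_BASIS_POINTS) / 10_000);
  return store.recordOnce(key, { orderId: payload.order_id, cents }) ? "applied" : "duplicate";
}

// The same event delivered twice, for example once via NATS and once via EventBridge.
const store = new CommissionStore();
const orderEvent: OrderPlaced = { order_id: "ord_1", amount_cents: 12_000 };
const firstDelivery = handleOrderPlaced(store, orderEvent);
const secondDelivery = handleOrderPlaced(store, orderEvent);
```

The first delivery is applied and the second is reported as a duplicate, so the store ends up with a single commission for `ord_1`.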

Replay

Sometimes a consumer crashes mid-process or a downstream system needs to backfill. EventBridge supports replay from an archive: pick a time window + event filter, and EventBridge re-delivers matching events to the same targets.

The runbook for replay lives in runbooks/eventbridge-replay. The key constraint: only replay events whose consumers are idempotent. A replay on a non-idempotent consumer creates duplicate state.
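
The idempotency constraint on replay can be illustrated with an in-memory model. The sketch below does not use the EventBridge API; `ArchivedEvent`, `selectForReplay`, and the half-open time range are assumptions made for the example. It selects archived events by time range and name, then re-delivers them to a naive consumer and to an idempotent one that have both already processed those events once.

```typescript
// Hypothetical in-memory archive; this is not the EventBridge replay API.
type ArchivedEvent = { name: string; time: string; payload: { order_id: string } };
type ReplayRange = { start: string; end: string }; // ISO 8601, start inclusive, end exclusive (assumed)

function selectForReplay(
  archived: ArchivedEvent[],
  range: ReplayRange,
  names: ReadonlySet<string>,
): ArchivedEvent[] {
  const start = Date.parse(range.start);
  const end = Date.parse(range.end);
  return archived.filter((archivedEvent) => {
    const time = Date.parse(archivedEvent.time);
    return time >= start && time < end && names.has(archivedEvent.name);
  });
}

const archivedEvents: ArchivedEvent[] = [
  { name: "order.placed.v1", time: "2026-05-15T10:00:00Z", payload: { order_id: "ord_1" } },
  { name: "payment.refunded.v1", time: "2026-05-15T10:05:00Z", payload: { order_id: "ord_1" } },
  { name: "order.placed.v1", time: "2026-05-15T10:30:00Z", payload: { order_id: "ord_2" } },
  { name: "order.placed.v1", time: "2026-05-15T12:00:00Z", payload: { order_id: "ord_3" } },
];

const toReplay = selectForReplay(
  archivedEvents,
  { start: "2026-05-15T10:00:00Z", end: "2026-05-15T11:00:00Z" },
  new Set(["order.placed.v1"]),
);

// Both consumers already handled ord_1 and ord_2 before the replay.
const naiveCommissions: string[] = ["ord_1", "ord_2"]; // appends blindly
const idempotentCommissions = new Set<string>(["ord_1", "ord_2"]); // keyed by order_id

for (const replayed of toReplay) {
  naiveCommissions.push(replayed.payload.order_id); // duplicates state
  idempotentCommissions.add(replayed.payload.order_id); // no-op for known orders
}
```

After the replay the naive consumer holds four commissions for two orders, while the idempotent consumer still holds two.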

When to use NATS vs EventBridge directly

You don’t choose. Publishers always go through the outbox, which fans out to both. Consumers subscribe at the application level (event handler files), and the platform routes them to the right bus depending on the deployment topology.

Inside the cluster, consumers prefer NATS for the latency. Cross-AZ or cross-account consumers (the webhook dispatcher, for example) consume from EventBridge because that’s the durable + filterable path.

What you should not do

  • Don’t publish from outside a database transaction. If your write succeeds and the publish fails, you’ve created drift. Always use the outbox pattern.
  • Don’t reuse an event name. Adding a new field is a .v2 event, not a mutation of .v1. Old subscribers keep working; new subscribers opt into the new shape.
  • Don’t rely on event ordering across services. Two events from different services can arrive out of order. Encode any required sequence into the payload (timestamps, version numbers) so consumers can reconcile.
  • Don’t subscribe to events your service.yaml doesn’t list. The convention check rejects undeclared subscriptions.
  • Don’t write side effects in the handler that aren’t idempotent. This matters most for external vendor calls: pass an idempotency key to the vendor (for example, Stripe’s Idempotency-Key header).
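
For the ordering rule in the list above, one way a consumer can reconcile out-of-order events is to carry a monotonically increasing version in the payload and ignore anything that is not newer than what it has already applied. The sketch below is illustrative only: the `version` field and the `TierProjection` class are hypothetical and are not part of the membership event shown earlier.

```typescript
// Hypothetical payload with a per-user version number added for sequencing.
type TierChanged = { user_id: string; tier: string; version: number };

class TierProjection {
  private readonly state = new Map<string, { tier: string; version: number }>();

  // Applies the event only if it is newer than the stored state.
  // Returns false for stale or duplicate events, which are ignored.
  apply(change: TierChanged): boolean {
    const current = this.state.get(change.user_id);
    if (current !== undefined && current.version >= change.version) return false;
    this.state.set(change.user_id, { tier: change.tier, version: change.version });
    return true;
  }

  tierOf(userId: string): string | undefined {
    return this.state.get(userId)?.tier;
  }
}

// Version 2 arrives before version 1; the late event must not overwrite newer state.
const projection = new TierProjection();
const appliedNewer = projection.apply({ user_id: "usr_1", tier: "gold", version: 2 });
const appliedStale = projection.apply({ user_id: "usr_1", tier: "silver", version: 1 });
const finalTier = projection.tierOf("usr_1");
```

The stale `silver` event is rejected, so the projection keeps `gold`. The same comparison also drops exact duplicates, which fits the at-least-once delivery model.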

Webhook delivery to third-party apps

Third-party apps don’t subscribe to NATS or EventBridge directly. They subscribe via webhooks: the developer portal lets them configure a webhook URL + signing secret, and the platform delivers matching events as signed HTTP POSTs.

The webhook dispatcher is a consumer of EventBridge that:

  1. Looks up the OAuth grant (does this user × client combo still exist and include the scope this event requires?)
  2. Looks up the client’s webhook filter (does this client want this event type?)
  3. POSTs to the configured URL with an HMAC signature
  4. Retries with backoff on failure, dead-letters after the schedule expires
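
Steps 3 and 4 can be sketched with Node's built-in `crypto` module. This is an assumption-heavy illustration, not the dispatcher's actual scheme: the signed string (`timestamp.body`), the choice of SHA-256, the hex encoding, and the retry delays are all invented for the example. See Webhooks for the real signature format and retry schedule.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: HMAC-SHA256 over "<timestamp>.<raw body>", hex-encoded.
function signWebhookBody(secret: string, timestamp: string, body: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

// Receiver side: recompute the signature and compare in constant time.
function verifyWebhookSignature(
  secret: string,
  timestamp: string,
  body: string,
  receivedSignature: string,
): boolean {
  const expected = Buffer.from(signWebhookBody(secret, timestamp, body), "hex");
  const received = Buffer.from(receivedSignature, "hex");
  // timingSafeEqual throws when lengths differ, so check the length first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}

// Hypothetical retry schedule: delay before the next attempt, or null to dead-letter.
const RETRY_DELAYS_MS = [1_000, 5_000, 30_000];

function nextRetryDelayMs(failedAttempts: number): number | null {
  return RETRY_DELAYS_MS[failedAttempts - 1] ?? null;
}

const webhookSecret = "whsec_example"; // placeholder secret for the example
const webhookBody = JSON.stringify({ event: "order.placed.v1", order_id: "ord_1" });
const webhookTimestamp = "1778841000";
const signature = signWebhookBody(webhookSecret, webhookTimestamp, webhookBody);

const validDelivery = verifyWebhookSignature(webhookSecret, webhookTimestamp, webhookBody, signature);
const tamperedDelivery = verifyWebhookSignature(webhookSecret, webhookTimestamp, webhookBody + " ", signature);
```

Including a timestamp in the signed string lets a receiver reject old deliveries that are replayed by an attacker. Whether the platform does this is not stated on this page.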

Details: Webhooks.

Source ADRs

ADR-0019 (events on NATS), ADR-0040 (EventBridge bus + outbox), ADR-0043 (versioned event templates), ADR-0049 (replay strategy).