Stripe Webhook Retries & Missed Events in Next.js

Stripe retries a failed webhook for up to three days, then stops [1]. If your handler is still failing when that window closes, or your endpoint is disabled during it, the event is gone from the push channel for good. Signature verification and event-ID dedup don't save you here: they only act on events that actually reach your handler, not on events you never managed to receive.

That gap has a real cost. A dropped checkout.session.completed means a customer paid and never got access, which is a silent data-integrity failure, not just an ops annoyance. Getting the signature right is the first half of a reliable webhook. This post covers the second half: what happens after you verify and dedup, how Stripe's retry behavior actually works, and how to reconcile the events that slip through. For the verification layer itself, the five reasons Stripe webhook signatures fail in Next.js is the companion piece, and the Stripe payments with Server Actions guide covers the full integration from checkout to delivery.

TL;DR:

Stripe retries failed deliveries for up to three days with exponential backoff in live mode [1], then gives up. Retries also stop entirely if your endpoint is disabled or deleted, so a long outage loses events permanently.
Return a 2xx before any complex logic [1]. On Vercel and other serverless platforms you can't "return 200 and keep working": use after() for short post-acknowledgment work (it's bounded by your route's max duration [4]), or an external queue for anything slow.
Dedup on event.id is necessary but incomplete. Stripe sometimes sends two separate Event objects for the same underlying change [1], so make the fulfillment itself idempotent (upsert keyed on the payment intent) rather than relying on event-ID uniqueness alone.
The backstop is reconciliation. Stripe's own guide for undelivered events uses the List Events API with delivery_success=false, but that API only returns events from the last 30 days [2][3]. Run a reconciliation job on a schedule well inside 30 days.
Events arrive out of order with no ordering guarantee [1]. Fulfill from the object's current state, not the transition the event name implies.

Why don't signature verification and dedup guarantee delivery?
How long does Stripe retry a failed webhook?
How should a Next.js webhook acknowledge fast and process later?
Why isn't dedup on event.id enough?
How do you handle out-of-order events?
How do you reconcile events Stripe never delivered?
What does a reliable Next.js webhook look like end to end?

Why don't signature verification and dedup guarantee delivery?

Verification proves an event is authentic. Dedup stops you from processing the same event twice. Neither one guarantees you process every event exactly once, because both only act on events that actually reached your handler. The events that hurt are the ones that never arrived, or arrived while your handler was throwing 500s.

There are three independent reliability problems, and they need three different defenses:

Duplicates: the same event reaches you more than once. Solved by idempotency (dedup).
Ordering: events arrive in a different order than they happened. Solved by state-based fulfillment.
Loss: an event never reaches you, or fails every retry. Solved by reconciliation.

A verified signature tells you nothing about which of these you're facing. As the duplicate-event and replay idempotency pattern explains, a valid signature does not make an event unique, and it certainly doesn't make it present. The verification deep-dive handles authenticity; the rest of this post handles delivery.

How long does Stripe retry a failed webhook?

Stripe attempts delivery for up to three days with exponential backoff in live mode [1]. Sandbox events are retried only "three times over the course of a few hours" [1]. Crucially, if your destination "has been disabled or deleted" when Stripe attempts a retry, Stripe "prevents future retries of that event" [1]. After the window closes, the event is gone from the push channel.
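The two windows combine into a simple age check. A minimal sketch, assuming live mode and an endpoint that stays enabled (a disabled endpoint stops push retries regardless of age); `recoveryChannel` and its labels are hypothetical names, and `createdSeconds` is an event's `created` Unix timestamp. It returns the earliest channel that can still deliver the event:

```typescript
// Windows described in this post: push retries last up to three days in live
// mode, and the List Events API returns events from the last 30 days.
const DAY_SECONDS = 24 * 60 * 60
const PUSH_RETRY_WINDOW_SECONDS = 3 * DAY_SECONDS
const EVENTS_API_WINDOW_SECONDS = 30 * DAY_SECONDS

// Hypothetical labels for the earliest channel that can still recover an event.
type RecoveryChannel = 'push-retries' | 'events-api' | 'object-reconciliation'

// createdSeconds: the event's `created` field (Unix seconds).
// nowSeconds: the current time in Unix seconds.
// Assumes live mode and an endpoint that was never disabled.
function recoveryChannel(createdSeconds: number, nowSeconds: number): RecoveryChannel {
  const ageSeconds = nowSeconds - createdSeconds
  if (ageSeconds < PUSH_RETRY_WINDOW_SECONDS) return 'push-retries'
  if (ageSeconds < EVENTS_API_WINDOW_SECONDS) return 'events-api'
  // Older than the Events API window: only the underlying objects remain.
  return 'object-reconciliation'
}
```

An event from ten days ago, for example, is past Stripe's push retries but still reachable through List Events; one from 45 days ago is only recoverable by comparing Stripe objects against your own tables.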

This is the failure mode most indie launches never plan for. Your endpoint works in testing, so you assume delivery is guaranteed. Then a bad deploy returns 500 for an hour, or the endpoint gets disabled after a run of failures, or your database is mid-migration when the payment lands. Stripe keeps retrying for three days, but if the outage outlasts the window (or the endpoint is disabled), the checkout.session.completed that confirms a real payment quietly disappears. The customer paid; your system never recorded it.

Two takeaways follow directly. First, your handler should fail in a way that lets Stripe's retries help you (return a non-2xx so the event is retried, not a 200 that tells Stripe "handled, stop"). Second, three days of retries is not a safety net you can lean on alone, because outages and disabled endpoints defeat it. You need a second channel that doesn't depend on the push delivery succeeding at all, which is reconciliation, covered below.
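Concretely, the first takeaway is a mapping from handler outcome to status code. A minimal sketch; the `WebhookOutcome` labels are hypothetical, and the mapping mirrors the end-to-end handler later in this post, where only outcomes that are safe to stop on return a 2xx:

```typescript
// Hypothetical outcomes a webhook handler can reach.
type WebhookOutcome =
  | 'processed'         // fulfillment succeeded
  | 'duplicate'         // event.id already claimed, handled earlier
  | 'ignored'           // an event type this app does not act on
  | 'invalid-signature' // verification failed
  | 'processing-failed' // database or other fulfillment error

// Only outcomes where stopping is correct get a 2xx ("handled, stop").
const STATUS_BY_OUTCOME: Record<WebhookOutcome, number> = {
  processed: 200,
  duplicate: 200,
  ignored: 200,
  'invalid-signature': 400, // reject payloads that aren't authentic
  'processing-failed': 500, // non-2xx so Stripe retries the event
}

function statusFor(outcome: WebhookOutcome): number {
  return STATUS_BY_OUTCOME[outcome]
}
```

The mistake this guards against is a blanket `catch` that returns 200: it turns a real processing failure into "handled", and Stripe never retries.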

How should a Next.js webhook acknowledge fast and process later?

Stripe is explicit: "quickly return a successful status code (2xx) prior to any complex logic that could cause a timeout" [1], and "configure your handler to process incoming events with an asynchronous queue" [1]. The current webhooks documentation doesn't publish a fixed timeout in seconds; the rule is simply to acknowledge before doing slow work, then process out of band.

The catch on serverless is that "return 200 and keep working" doesn't behave the way it does on a long-lived Node server. Once the function sends its response, the platform is free to freeze or recycle the invocation, and your trailing work may never finish. Next.js gives you after() for this: it "schedule[s] work to be executed after a response is finished" and is meant for "side effects that should not block the response, such as logging and analytics" [4]. But it isn't a durable queue. Its callback "will run for the platform's default or configured max duration of your route" [4], so it's bounded by the same time budget as the request and tied to the same invocation.

That gives you a clear decision rule:

Fast, idempotent fulfillment (a few quick database writes): process inline before returning 2xx. This is what the handler below does, and it's the right call when the work is milliseconds, not seconds.
Short non-critical side effects (fire an email, enqueue a job): wrap them in after() [4] so a slow email provider can't blow your response budget, while keeping the work on the same request.
Slow or multi-step processing (provisioning, third-party calls, fan-out): hand the event to a real queue or a database-backed outbox and acknowledge immediately. Drain it with a separate worker or a scheduled job; after() is not durable enough for this tier.

Picking the wrong tier is how you get timeouts that Stripe reads as failures, which triggers retries, which (without idempotency) triggers duplicate fulfillment. The tiers above keep the acknowledgment fast without pretending serverless functions live forever.

Why isn't dedup on event.id enough?

Because event.id uniqueness has a documented exception. Stripe's guidance is to guard against duplicates "by logging the event IDs you've processed," but it adds: "in some cases, two separate Event objects are generated and sent. To identify these duplicates, use the ID of the object in data.object along with the event.type" [1]. Two different event.id values can describe the same underlying change, so an event-ID-only check lets one of them through.
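A minimal sketch of that secondary check, keyed on `data.object.id` plus `event.type` as Stripe's guidance describes. `MinimalEvent` is a simplified stand-in for `Stripe.Event`, and the in-memory `Set`s stand in for unique-constrained database columns; both are assumptions for illustration. The object-level key is limited to types that should fire once per object, because a type like `customer.subscription.updated` legitimately fires many times for the same subscription:

```typescript
// Simplified stand-in for the fields of Stripe.Event this check needs.
interface MinimalEvent {
  id: string
  type: string
  data: { object: { id: string } }
}

// Types expected to fire once per object. Types such as
// customer.subscription.updated repeat for one object, so they are excluded.
const ONCE_PER_OBJECT_TYPES = new Set<string>(['checkout.session.completed'])

// In-memory stand-ins for unique-constrained columns in a real table.
const seenEventIds = new Set<string>()
const seenObjectKeys = new Set<string>()

// Returns true only the first time an event, or its object-level twin, is seen.
function shouldProcess(event: MinimalEvent): boolean {
  if (seenEventIds.has(event.id)) return false // ordinary retry of the same event
  seenEventIds.add(event.id)

  if (ONCE_PER_OBJECT_TYPES.has(event.type)) {
    const key = `${event.type}:${event.data.object.id}`
    if (seenObjectKeys.has(key)) return false // second Event object, same change
    seenObjectKeys.add(key)
  }
  return true
}
```

Even with both checks in place, the idempotent write below remains the real guarantee; this check only saves redundant work.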

The robust answer is to make fulfillment itself idempotent rather than leaning entirely on the dedup key. The handler claims the event.id first (so ordinary retries short-circuit), but the actual write is an upsert keyed on the payment intent, which means even a second Event object for the same payment can't create a duplicate purchase:

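case 'checkout.session.completed': {
  const session = event.data.object as Stripe.Checkout.Session
  if (session.mode === 'payment') {
    // Keyed on the payment intent, not event.id: a replay, a retry, or a
    // second Event object for the same payment all resolve to one row.
    // payment_intent is an ID string unless expanded, and may be null.
    const paymentIntentId =
      typeof session.payment_intent === 'string'
        ? session.payment_intent
        : session.payment_intent?.id
    const { error } = await admin.from('purchases').upsert({
      id: paymentIntentId ?? session.id,
      user_id: session.metadata?.user_id ?? '',
      product_id: session.metadata?.product_id || 'securestartkit_template',
      amount: session.amount_total || 0,
      status: 'completed',
    })
    // supabase-js returns errors instead of throwing; rethrow so the webhook
    // fails with a non-2xx and Stripe retries.
    if (error) throw error
  }
  break
}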

The principle generalizes: dedup on event.id to avoid redundant work, but make the side effect idempotent on a business key (payment intent, order ID, subscription ID) so correctness never depends on the dedup check being perfect. The billing architecture deep-dive covers how this scales when one customer generates dozens of events a year instead of a single one-time payment.

How do you handle out-of-order events?

You design fulfillment so order doesn't matter. Stripe "doesn't guarantee the delivery of events in the order that they're generated" and tells you to "make sure that your event destination isn't dependent on receiving events in a specific order" [1]. So you can't treat customer.subscription.updated as "the step after created" or assume a refund event lands after the charge it refunds.

The fix is to fulfill from the object's current state, not the transition the event name implies. When an event arrives, read the relevant field on event.data.object (the subscription's status, the session's payment_status) and set your record to match, rather than applying a delta like "increment" or "flip the flag." A duplicate or reordered event then rewrites the same value instead of compounding a change. The remaining gap is a late event carrying an older snapshot, which could roll your record back; when that matters, fetch the authoritative current value instead. Stripe notes you "can use the API to retrieve any missing objects" [1], which is the same capability the reconciliation job below relies on.
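One way to apply snapshots without letting a late event roll state back is to remember the `created` timestamp of the event you last applied and skip older ones. A minimal sketch under that assumption: the staleness guard is an addition for illustration, the `Map` stands in for your subscriptions table, and `SubscriptionSnapshot` is a simplified stand-in for `Stripe.Subscription`:

```typescript
// Simplified subscription snapshot as carried in event.data.object.
interface SubscriptionSnapshot {
  id: string
  status: string // e.g. 'active', 'past_due', 'canceled'
}

// Hypothetical local record: the last status written, plus the `created`
// timestamp (Unix seconds) of the event that wrote it.
interface SubscriptionRecord {
  status: string
  appliedEventCreated: number
}

// In-memory stand-in for a subscriptions table keyed by subscription ID.
const subscriptions = new Map<string, SubscriptionRecord>()

// Overwrite with the snapshot's status (state, not a delta), but ignore
// snapshots older than the one already applied.
function applySnapshot(snapshot: SubscriptionSnapshot, eventCreated: number): void {
  const current = subscriptions.get(snapshot.id)
  if (current && eventCreated < current.appliedEventCreated) return // stale
  subscriptions.set(snapshot.id, {
    status: snapshot.status,
    appliedEventCreated: eventCreated,
  })
}
```

Delivered newest-first or oldest-first, the record ends on the newest status. Because `created` has one-second resolution, two events in the same second can still tie; that is the case where re-fetching the object from the API is the safer source of truth.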

How do you reconcile events Stripe never delivered?

You poll the List Events API on a schedule and process anything your database is missing. This is Stripe's own documented recovery path. Its "process undelivered events" guide explains that when an endpoint "temporarily can't process events," you can speed up recovery by calling the List Events API with delivery_success=false and a types[] filter [2]. The List Events endpoint is GET /v1/events, and it returns events "going back up to 30 days" [3].

That 30-day retention is the load-bearing number. Stripe's push retries last three days; the Events API window is 30. So a reconciliation job that runs well inside 30 days catches every event the push channel dropped, even an outage that outlasted the three-day retry window. Run it daily or hourly, never monthly. A minimal version as a scheduled Route Handler:

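import { getStripe } from '@/lib/stripe/client'
import { createAdminClient } from '@/lib/supabase/server'

// Hit this from a cron (e.g. a scheduled job) on a cadence well under 30 days.
export async function GET(request: Request) {
  // Only your scheduler should trigger this. Vercel Cron, for example, sends
  // "Authorization: Bearer <CRON_SECRET>" when CRON_SECRET is set.
  const secret = process.env.CRON_SECRET
  if (!secret || request.headers.get('authorization') !== `Bearer ${secret}`) {
    return new Response('Unauthorized', { status: 401 })
  }

  const stripe = getStripe()
  const admin = createAdminClient()
  let reconciled = 0

  // Ask Stripe only for the events it could not deliver successfully.
  // `for await` auto-paginates, so a backlog over 100 events isn't truncated.
  for await (const event of stripe.events.list({
    types: ['checkout.session.completed', 'invoice.payment_failed'],
    delivery_success: false,
    limit: 100,
  })) {
    // Same idempotency claim the live handler uses: skip anything already done.
    const { error: claimError } = await admin
      .from('stripe_events')
      .insert({ id: event.id, type: event.type })

    if (claimError?.code === '23505') continue // already processed
    if (claimError) continue // record failed; let the next run retry

    try {
      await handleEvent(event) // the same processing the webhook route runs
      reconciled++
    } catch (error) {
      // Release the claim, as the live handler does, so the next run retries it.
      await admin.from('stripe_events').delete().eq('id', event.id)
      console.error('Reconciliation failed for', event.id, error)
    }
  }

  return Response.json({ reconciled })
}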

Stripe's guide pairs this with a status-tracked table: functions to check whether an event "is_processing_or_processed," to "mark_as_processing," and to "mark_as_processed," plus the rule that "when your webhook endpoint receives an already processed event, ignore the event and return a successful response to stop future retries" [2]. The dedup table the live handler writes to is a simplified version of that status record, where a row's presence means "claimed," which is why the reconciliation job can reuse it: the live handler and the cron share one source of truth about what's been processed.

One limit to plan for: beyond 30 days, the Events API can't help, because the events are gone. For that tail you reconcile against the underlying objects instead, listing Checkout Sessions or PaymentIntents directly and comparing them to your purchases table. In practice a job that runs at least weekly never reaches that case.
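The comparison step of that object-level pass can be sketched as a pure function. It assumes you have already paged through completed Checkout Sessions from Stripe and loaded the IDs in your purchases table; `CompletedSession` is a simplified stand-in for `Stripe.Checkout.Session`, and the key mirrors the webhook's upsert (payment intent, falling back to the session ID):

```typescript
// Simplified view of a completed Checkout Session listed from Stripe.
interface CompletedSession {
  id: string
  payment_intent: string | null
}

// Same key the webhook upserts on: payment intent, else the session ID.
function purchaseKey(session: CompletedSession): string {
  return session.payment_intent ?? session.id
}

// Sessions Stripe reports as completed that have no matching purchases row.
function findMissingPurchases(
  sessions: CompletedSession[],
  purchaseIds: Set<string>,
): CompletedSession[] {
  return sessions.filter((session) => !purchaseIds.has(purchaseKey(session)))
}
```

Each session it returns can then go through the same idempotent upsert the webhook uses, so re-running the pass is harmless.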

What does a reliable Next.js webhook look like end to end?

A reliable handler verifies the signature, claims the event before doing work, makes the side effect idempotent on a business key, isolates non-critical work so it can't trigger false retries, and releases its claim when processing genuinely fails so Stripe can retry. A reconciliation cron sits behind all of it as the delivery backstop. Here is the shape of the SecureStartKit handler, trimmed to the one-time payment path:

import { headers } from 'next/headers'
import { NextResponse } from 'next/server'
import type Stripe from 'stripe'
import { getStripe } from '@/lib/stripe/client'
import { createAdminClient } from '@/lib/supabase/server'
// sendPurchaseDeliveryEmail is the kit's email helper; its import is omitted in this excerpt.

// Trimmed to the one-time payment path; list every event type you act on.
const relevantEvents = new Set<string>(['checkout.session.completed'])

export async function POST(request: Request) {
  const body = await request.text()
  const headersList = await headers()
  const sig = headersList.get('stripe-signature')
  if (!sig) return NextResponse.json({ error: 'No signature' }, { status: 400 })

  let event: Stripe.Event
  try {
    event = getStripe().webhooks.constructEvent(
      body, sig, process.env.STRIPE_WEBHOOK_SECRET!
    )
  } catch {
    return NextResponse.json({ error: 'Invalid signature' }, { status: 400 })
  }

  // Acknowledge-and-ignore events you don't act on, fast.
  if (!relevantEvents.has(event.type)) {
    return NextResponse.json({ received: true })
  }

  const admin = createAdminClient()

  // Claim the event before any work. A duplicate is acknowledged with 200 so
  // Stripe stops retrying; a record failure returns 500 so Stripe keeps trying.
  const { error: claimError } = await admin
    .from('stripe_events')
    .insert({ id: event.id, type: event.type })
  if (claimError?.code === '23505') {
    return NextResponse.json({ received: true })
  }
  if (claimError) {
    return NextResponse.json({ error: 'Failed to record event' }, { status: 500 })
  }

  try {
    if (event.type === 'checkout.session.completed') {
      const session = event.data.object as Stripe.Checkout.Session
      if (session.mode === 'payment') {
        // Idempotent on the payment intent: retries can't double-write.
        const { error: upsertError } = await admin
          .from('purchases')
          .upsert({ /* ...keyed on payment_intent */ })
        // supabase-js returns errors instead of throwing; rethrow so the catch
        // below releases the claim and Stripe retries.
        if (upsertError) throw upsertError

        // Best-effort: an email provider failure must NOT fail the webhook and
        // trigger a retry, because the purchase is already recorded above.
        try {
          await sendPurchaseDeliveryEmail(/* ... */)
        } catch (emailError) {
          console.error('Purchase email failed:', emailError)
        }
      }
    }
    return NextResponse.json({ received: true })
  } catch (error) {
    console.error('Webhook handler failed:', error)
    // Release the claim so Stripe's retry can re-process this event.
    await admin.from('stripe_events').delete().eq('id', event.id)
    return NextResponse.json({ error: 'Webhook handler failed' }, { status: 500 })
  }
}

Three details carry the reliability load. The claim-before-work insert turns ordinary duplicates into a cheap 200. The best-effort side effect is the subtle one: the delivery email is wrapped in its own try/catch so a Resend outage logs an error but doesn't fail the webhook, because the purchase is already saved and a retry would only resend the email. And the release-on-failure delete in the catch block is what lets Stripe's three-day retry actually help: if real processing throws, the claim is removed and the next retry re-runs cleanly, instead of a dead claim row swallowing the event forever.
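The claim and release steps can be checked in isolation. A minimal synchronous sketch: an in-memory `Set` stands in for the `stripe_events` primary key, and the hypothetical `fulfill` fails once to simulate a transient database error. The point is the sequence: a failed attempt releases its claim, the retry fulfills, and a later duplicate is acknowledged without running fulfillment again:

```typescript
// In-memory stand-in for the primary key on stripe_events.id.
const claimed = new Set<string>()
let fulfillments = 0

// Hypothetical fulfillment that fails on its first call, like a transient DB error.
let failNextCall = true
function fulfill(): void {
  if (failNextCall) {
    failNextCall = false
    throw new Error('transient failure')
  }
  fulfillments++
}

// Returns the status code the route would send for one delivery attempt.
function deliver(eventId: string): number {
  if (claimed.has(eventId)) return 200 // duplicate: acknowledge so Stripe stops
  claimed.add(eventId) // claim before doing any work
  try {
    fulfill()
    return 200
  } catch {
    claimed.delete(eventId) // release so the next retry runs cleanly
    return 500
  }
}
```

Three deliveries of the same event ID return 500, 200, 200, and fulfillment runs exactly once. Drop the `claimed.delete` line and the second delivery is wrongly treated as a duplicate, which is the dead-claim failure described above.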

That release-on-failure pattern is the simple, correct choice when fulfillment is fast and idempotent. If your processing grows slow or multi-step, graduate to the status-column model from Stripe's undelivered-events guide (received, then processed) and an async queue, so a partial failure leaves a received row the cron can finish later. SecureStartKit ships the inline version because one-time fulfillment is a single idempotent upsert; the upgrade path is there for when your processing outgrows that.
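A minimal sketch of that received-then-processed model, with an in-memory `Map` standing in for a status column on `stripe_events` (the function names are hypothetical). It omits the intermediate "processing" state from Stripe's guide, which you'd want once more than one worker drains rows concurrently:

```typescript
// Status values for the received-then-processed model.
type EventStatus = 'received' | 'processed'

// In-memory stand-in for a stripe_events table with a status column.
const eventStatus = new Map<string, EventStatus>()

// Webhook route: record the event and acknowledge; no inline fulfillment.
// Returns false when the event was already recorded (a duplicate delivery).
function recordReceived(eventId: string): boolean {
  if (eventStatus.has(eventId)) return false
  eventStatus.set(eventId, 'received')
  return true
}

// Worker or cron: finish every row still marked 'received'. A row whose
// handler throws stays 'received' and is picked up on the next drain.
function drainReceived(handle: (eventId: string) => void): string[] {
  const done: string[] = []
  for (const [eventId, status] of eventStatus) {
    if (status !== 'received') continue
    try {
      handle(eventId)
      eventStatus.set(eventId, 'processed')
      done.push(eventId)
    } catch {
      // Leave the row as 'received' for the next run.
    }
  }
  return done
}
```

Unlike release-on-failure, nothing is deleted here: a failure simply leaves work visible in the table, which is what lets a scheduled drain pick it up without waiting on Stripe's retries.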

Reliable payment webhooks are an integrity guarantee, not a nice-to-have, which is why the pre-launch security audit treats signed, idempotent, reconciled webhooks as one checklist item rather than three. Before you flip to live keys, run a known-bad signature through the Stripe Webhook Verifier to confirm the verification half, then add the reconciliation cron so the delivery half can survive a bad deploy. The handler that records a purchase runs on the same service-role admin client as the rest of the backend, never the browser, so the payments path stays on the server where it belongs.
