5 Production Rate-Limit Failure Modes in Next.js [2026]

Production rate limiting on Next.js Server Actions fails in five mechanically distinct ways that most implementation guides don't name. The shipped limiter passes code review, holds up in local testing, and silently lets traffic through anyway: IP extraction that works on Vercel and breaks the moment you self-host, fixed-window counters that allow 2x the limit across a 1-second window edge, per-IP keys defeated by distributed scrapers, ratelimit.limit() calls without await that become no-ops, and an Upstash bill that grows with every public-endpoint hit because each check costs 3 to 5 Redis commands.

This is the production-failure layer the rate limiting Server Actions guide sets the foundation for. That guide covers the in-memory cold-start problem and the shared-vs-caller-key choice; this post covers the five failure modes that show up on the upgrade path from "we have a rate limiter" to "the rate limiter actually limits." If you have not added one at all yet, the starting point is why an unthrottled Server Action is a vulnerability.

TL;DR:

x-forwarded-for parsing is deployment-surface-specific. Vercel overwrites the header to prevent spoofing [1]; self-hosted Next.js (Docker, Kubernetes, bare Node behind nginx) trusts whatever the client sends unless you configure a proxy chain. The same .split(',')[0] code that's safe on Vercel hands attackers a controlled identifier on self-host.
Fixed-window limiters let through 2x the limit across the window boundary. Upstash's own algorithm docs name this verbatim: "Can cause high bursts at the window boundaries to leak through" [2]. The sliding-window upgrade costs ~67% more Redis commands per call (5 vs 3) [3]; token bucket sits between them at 4. Pick deliberately, not by default.
Per-IP rate limits are nearly free for distributed scrapers to defeat. Residential proxy networks rent millions of unique IPs by the day; IPv6 /64 allocations give one attacker effectively unlimited source addresses. A per-IP limit at 5/min becomes meaningless when each request comes from a different IP. The bot protection layer classifies clients before they reach the rate limiter.
ratelimit.limit() without await returns a Promise that nobody waits on. The check never runs to completion before the Server Action proceeds. The @typescript-eslint/no-floating-promises ESLint rule flags the discarded call at lint time; without it, a bare unawaited call compiles, runs, and silently lets everything through.
The rate limiter itself bills per call. Sliding window: 5 Upstash commands per fresh call. Fixed window: 3. Token bucket: 4 [3]. A public endpoint at 100 requests per second on sliding window means 500 Redis commands per second. The ephemeral cache short-circuits to 0 commands for already-blocked identifiers but never covers fresh ones.

What's actually wrong with a typical Server Action rate limiter?
Why does x-forwarded-for parsing break when you leave Vercel?
How does a fixed-window limiter let through 2x the limit in one second?
Why don't per-IP limits stop a distributed scraper?
What happens when you forget to await ratelimit.limit()?
How much does the rate limiter itself cost on a public endpoint?
What does a hardened Server Action rate limiter look like end-to-end?
What does SecureStartKit ship today, and what should you add?

What's actually wrong with a typical Server Action rate limiter?

A typical Next.js rate limiter passes code review because each component looks correct in isolation. The Upstash client is initialized once at module scope. await ratelimit.limit() runs before the business logic. The key includes the action name and an identifier. The failure modes show up on the seams between those components, not in any one of them.

Here's the shape most production limiters take after a teammate reads that guide and ships the upgrade:

// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.fixedWindow(5, '60 s'),
  prefix: 'rl',
})

// actions/auth.ts
import { headers } from 'next/headers'
import { ratelimit } from '@/lib/rate-limit'

export async function login(data: LoginInput) {
  const headerList = await headers()
  const ip = (headerList.get('x-forwarded-for') ?? '127.0.0.1').split(',')[0]
  const { success } = await ratelimit.limit(`login:${ip}`)
  if (!success) return { error: 'Too many attempts.' }
  // ...
}

This code is in the Next.js docs and the Upstash quickstart. It works on Vercel and in local testing with a single browser. Five things go wrong with it in production:

The (headerList.get('x-forwarded-for') ?? '127.0.0.1').split(',')[0] line assumes Vercel's deployment surface. On self-hosted Next.js the same line trusts client-controlled bytes.
Ratelimit.fixedWindow(5, '60 s') allows 10 requests in a 2-second window if the attacker times the boundary. Upstash documents the behavior; the docs example uses sliding window for this exact reason.
The per-IP key is useless against a scraper that rotates through 500 IPs. Each request gets its own bucket of 5, so the effective limit is 5 * 500 = 2500 requests per minute.
await is doing real work in this code, and removing it accidentally is a one-word mistake that turns the limiter into a no-op.
The Upstash call costs 3 Redis commands per fresh hit on fixed window, 5 on sliding window. On a high-traffic public endpoint, the rate limiter becomes a meaningful line item on the Upstash invoice.

The next sections cover each failure in depth with the primary-source evidence and the concrete fix.

Why does x-forwarded-for parsing break when you leave Vercel?

The (headerList.get('x-forwarded-for') ?? '127.0.0.1').split(',')[0] pattern is safe on Vercel because Vercel overwrites the header before your function sees it. On self-hosted Next.js the same code reads whatever the client sent, because most reverse-proxy configs forward client-supplied headers untouched. The deployment surface determines whether x-forwarded-for is an authenticated identifier or a controlled input.

Vercel's documentation is explicit about the overwrite. From the request headers reference, verbatim: "If you are trying to use Vercel behind a proxy, we currently overwrite the X-Forwarded-For header and do not forward external IPs. This restriction is in place to prevent IP spoofing" [1]. That single behavior is what makes the .split(',')[0] pattern work on Vercel: the first element of the chain is the actual client IP because Vercel set it from the TCP connection, not because the client sent something honest.

When you self-host, that guarantee disappears. The classic case: a Next.js app deployed on a VPS with nginx in front. If nginx.conf includes proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; and nothing else, nginx appends the connection IP to whatever the client sent. A request with X-Forwarded-For: 1.2.3.4 becomes X-Forwarded-For: 1.2.3.4, <client-ip> in your Next.js handler. Your .split(',')[0] returns 1.2.3.4, the value the attacker chose. Every rate-limit bucket the attacker wants to drain, they drain.
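To make the append behavior concrete, here is a minimal simulation. `appendForwardedFor` is a hypothetical stand-in for what `$proxy_add_x_forwarded_for` does (it is not nginx code), and the addresses are documentation examples.

```typescript
// Hypothetical stand-in for nginx's $proxy_add_x_forwarded_for:
// keep whatever the client sent and append the real connection IP.
function appendForwardedFor(clientSent: string | null, connectionIp: string): string {
  return clientSent ? `${clientSent}, ${connectionIp}` : connectionIp
}

// The extraction pattern from the typical limiter.
function firstHop(xff: string): string {
  return (xff.split(',')[0] ?? '').trim()
}

const honestHeader = appendForwardedFor(null, '203.0.113.7')
const spoofedHeader = appendForwardedFor('1.2.3.4', '203.0.113.7')

console.log(firstHop(honestHeader))  // 203.0.113.7 (the real connection IP)
console.log(firstHop(spoofedHeader)) // 1.2.3.4 (the value the attacker chose)
```

The real connection IP is still in the header; it is just not in the position the limiter reads.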

This is CWE-290, Authentication Bypass by Spoofing [6]. The CWE definition: "This attack-focused weakness is caused by improperly implemented authentication schemes that are subject to spoofing attacks." Rate-limit identifiers are an authentication-adjacent control: they answer "who is making this call" before any auth check runs. Spoof the identifier and you defeat the control.

The fix depends on your deployment surface, not your code:

On Vercel directly: headerList.get('x-forwarded-for') is trustworthy. The first comma-separated element is the client IP. The current pattern is correct.
On Vercel behind Cloudflare or another proxy: trust the right hop. cf-connecting-ip is Cloudflare's authoritative client-IP header; prefer it over x-forwarded-for when both are present. Vercel's docs flag this case: "Enterprise customers can purchase and enable a trusted proxy to allow your custom X-Forwarded-For IP" [1]. For non-Enterprise, the safer pattern is to skip XFF entirely and use the Cloudflare header.
On self-hosted Next.js (Docker, Kubernetes, bare Node behind nginx): configure the proxy to strip incoming X-Forwarded-For and write only the connection IP. In nginx: proxy_set_header X-Forwarded-For $remote_addr; (the $proxy_add_x_forwarded_for variant appends and trusts). Then your Next.js code can trust the header again, because the proxy guaranteed it.

The deployment-surface-aware extractor:

// lib/get-client-ip.ts
import { headers } from 'next/headers'

export async function getClientIp(): Promise<string> {
  const h = await headers()

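  // Cloudflare in front: trust cf-connecting-ip, ignore x-forwarded-for.
  // Only safe when traffic can reach the app solely through Cloudflare;
  // otherwise a client can send this header itself.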
  const cf = h.get('cf-connecting-ip')
  if (cf) return cf

  // Vercel direct: x-forwarded-for is overwritten, first element is client
  if (process.env.VERCEL) {
    const xff = h.get('x-forwarded-for')
    if (xff) return xff.split(',')[0].trim()
  }

  // Self-hosted: the first XFF element is only trustworthy if the proxy
  // overwrote incoming values. If the header is missing entirely, return a
  // fixed identifier so the misconfiguration shows up loudly in the logs.
  return h.get('x-forwarded-for')?.split(',')[0].trim() ?? '0.0.0.0'
}

The 0.0.0.0 fallback is intentionally loud. If your rate-limit logs show every request from 0.0.0.0, the extractor is misconfigured for the deployment surface. Silent fallback to 127.0.0.1 like the main guide's example would let everything in under one bucket without surfacing the misconfiguration.
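If you cannot change the proxy config and it keeps appending, one alternative is to read the header from the right instead of the left, since the right-most entries are the ones your own proxies wrote. The sketch below assumes you know how many appending proxies you operate; `clientIpFromRight` and `trustedHops` are hypothetical names, not part of Next.js or nginx.

```typescript
// Pick the client IP by counting hops from the right of X-Forwarded-For.
// trustedHops = number of proxies you operate that append to the header
// (an assumption you must set per deployment; 1 = a single nginx).
export function clientIpFromRight(xff: string | null, trustedHops = 1): string | null {
  if (!xff) return null
  const hops = xff
    .split(',')
    .map((hop) => hop.trim())
    .filter((hop) => hop.length > 0)
  // The entry appended by your outermost trusted proxy is the client address.
  // Everything to its left was supplied by the client and is ignored.
  const index = hops.length - trustedHops
  return index >= 0 ? hops[index] ?? null : null
}

console.log(clientIpFromRight('1.2.3.4, 203.0.113.7')) // 203.0.113.7
```

Setting `trustedHops` too high reintroduces the spoofing problem, so treat the value as deployment configuration rather than a default.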

How does a fixed-window limiter let through 2x the limit in one second?

A fixed-window limiter allows the configured maximum twice in rapid succession when an attacker times requests around the window boundary. With fixedWindow(5, '60 s'), an attacker sends 5 requests at second 59 of one window and 5 more at second 1 of the next window. From the limiter's point of view, each window is under-limit. From your endpoint's point of view, 10 requests landed in roughly 2 seconds. The fix is the algorithm choice, and the algorithm choice has a cost.
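The boundary burst is easy to reproduce with an in-memory counter. This is an illustrative sketch of the fixed-window idea, not Upstash's implementation; `createFixedWindow` is a hypothetical helper and timestamps are passed in so the run is deterministic.

```typescript
// Minimal in-memory fixed-window counter (illustrative only).
function createFixedWindow(limit: number, windowMs: number) {
  const counts = new Map<number, number>() // window index -> requests seen
  return (nowMs: number): boolean => {
    const windowIndex = Math.floor(nowMs / windowMs)
    const used = counts.get(windowIndex) ?? 0
    if (used >= limit) return false
    counts.set(windowIndex, used + 1)
    return true
  }
}

const fixedAllow = createFixedWindow(5, 60_000)

// 5 requests at second 59 of window 0, then 5 more at second 1 of window 1.
const fixedBurstTimes: number[] = [
  ...new Array<number>(5).fill(59_000),
  ...new Array<number>(5).fill(61_000),
]
const fixedAccepted = fixedBurstTimes.filter((t) => fixedAllow(t)).length

console.log(fixedAccepted) // 10 accepted within about 2 seconds
```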

Upstash's own algorithm documentation names the failure verbatim. Under "Fixed Window," the cons section reads: "Can cause high bursts at the window boundaries to leak through" and "Causes request stampedes if many users are trying to access your server, whenever a new window begins" [2]. The mechanism is the algorithm itself, not a bug. Fixed-window counters reset to zero on a fixed schedule; the schedule is exactly the attack surface.

The sliding-window algorithm fixes this by computing the limit over a rolling period weighted across the previous and current windows. Upstash documents the trade-off, also verbatim: "More expensive in terms of storage and computation" and "Is only an approximation, because it assumes a uniform request flow in the previous window" [2]. The approximation is fine for rate limiting; the cost is the real consideration.
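A sketch of the weighted estimate, run against the same boundary burst. It follows the textbook approximation described in the quote (the previous window's count is scaled by how much of it still overlaps the rolling period); it is an assumption-level illustration, not Upstash's code, and `createSlidingWindow` is a hypothetical helper.

```typescript
// Minimal in-memory sliding-window approximation (illustrative only).
function createSlidingWindow(limit: number, windowMs: number) {
  const counts = new Map<number, number>() // window index -> requests seen
  return (nowMs: number): boolean => {
    const current = Math.floor(nowMs / windowMs)
    const elapsedFraction = (nowMs % windowMs) / windowMs
    const prevCount = counts.get(current - 1) ?? 0
    const currCount = counts.get(current) ?? 0
    // Weight the previous window by the share of it still inside the rolling period.
    const estimated = prevCount * (1 - elapsedFraction) + currCount
    if (estimated >= limit) return false
    counts.set(current, currCount + 1)
    return true
  }
}

const slidingAllow = createSlidingWindow(5, 60_000)
const slidingBurstTimes: number[] = [
  ...new Array<number>(5).fill(59_000),
  ...new Array<number>(5).fill(61_000),
]
const slidingAccepted = slidingBurstTimes.filter((t) => slidingAllow(t)).length

console.log(slidingAccepted) // 6: the burst that fixed window let through as 10
```

One request still slips through just after the boundary because the previous window's weight has dropped slightly below the limit, which is the approximation the docs describe.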

Token bucket is the third option. From the docs: "Bursts of requests are smoothed out and you can process them at a constant rate" with the trade-off "Expensive in terms of computation" [2]. The semantics differ: token bucket allows controlled bursts up to the bucket capacity, then refills at a steady rate. Useful for APIs where a client legitimately makes a burst of requests at session start and steady traffic thereafter.
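The burst-then-refill semantics can be sketched in a few lines. This version refills in whole intervals, which is a simplifying assumption of the sketch; `createTokenBucket` is a hypothetical helper, and its three parameters are chosen to read like refill rate, interval, and capacity.

```typescript
// Minimal in-memory token bucket (illustrative only).
function createTokenBucket(refillPerInterval: number, intervalMs: number, capacity: number) {
  let tokens = capacity // start full so a session-start burst is allowed
  let lastRefillMs = 0
  return (nowMs: number): boolean => {
    const intervals = Math.floor((nowMs - lastRefillMs) / intervalMs)
    if (intervals > 0) {
      tokens = Math.min(capacity, tokens + intervals * refillPerInterval)
      lastRefillMs += intervals * intervalMs
    }
    if (tokens < 1) return false
    tokens -= 1
    return true
  }
}

const takeToken = createTokenBucket(10, 60_000, 20)

// 25 requests in the same second: only the bucket capacity gets through.
const bucketBurstAccepted = Array.from({ length: 25 }, () => takeToken(1_000)).filter(Boolean).length
const beforeRefill = takeToken(30_000) // bucket is empty, no interval has passed
const afterRefill = takeToken(61_000)  // one interval passed, 10 tokens refilled

console.log(bucketBurstAccepted, beforeRefill, afterRefill) // 20 false true
```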

The cost difference is concrete. Upstash's costs documentation publishes the per-call Redis-command counts [3]:

Algorithm	First call	Intermediate hit/miss	Rate-limited (no cache)	Rate-limited (cache hit)
Fixed window	3 (EVAL, INCR, PEXPIRE)	2 (EVAL, INCR)	2 (EVAL, INCR)	0 (cached)
Sliding window	5 (EVAL, GET, GET, INCR, PEXPIRE)	4 (EVAL, GET, GET, INCR)	3 (EVAL, GET, GET)	0 (cached)
Token bucket	4 (EVAL, HMGET, HSET, PEXPIRE)	4 (EVAL, HMGET, HSET, PEXPIRE)	2 (EVAL, HMGET)	0 (cached)

Switching from fixed window to sliding window is roughly a 67% increase in commands per call on first hits and a 100% increase on intermediate calls (4 vs 2). On a Server Action that runs 10 times per second across your active users, that's the difference between 20 commands/s and 40 commands/s of constant Upstash load. Whether that matters depends on your traffic; the point is to choose deliberately.

The pragmatic default: sliding window for auth actions (the burst-at-boundary attack is real and the limits are small enough that the command-cost increase is rounding error). Token bucket for actions where users legitimately burst (the "save 20 items at once" pattern). Fixed window stays in scope only for internal admin endpoints where the boundary attack is operationally implausible.

The configuration change is one line:

// before
limiter: Ratelimit.fixedWindow(5, '60 s'),

// after, for auth
limiter: Ratelimit.slidingWindow(5, '60 s'),

// after, for bursty user actions
limiter: Ratelimit.tokenBucket(10, '60 s', 20), // refill 10/min, capacity 20

The change is invisible to your action code; the ratelimit.limit() API is the same. The semantics are different and the bill is different.

Why don't per-IP limits stop a distributed scraper?

Per-IP rate limits are a defense against repeated abuse from a single identifier. They are not a defense against unknown identifiers. A modern scraper rotates through residential proxy networks or IPv6 allocations to make every request appear to come from a different IP. From the limiter's perspective each request is the first request from that IP, which is exactly the case the limiter is built to allow.

The economics make this trivial. Residential proxy services rent pools of millions of real consumer IPs that scrapers route through; pricing runs at $5 to $15 per gigabyte of egress, well within reach of any motivated scraper. For IPv6, a single ISP-assigned /64 allocation contains 18.4 quintillion addresses (2^64); an attacker on a residential IPv6 connection can effectively cycle through unlimited source IPs without paying anyone. The per-IP bucket the limiter creates for each new IP defeats itself.
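One partial mitigation for the IPv6 half of this problem is to key the limiter on the /64 prefix rather than the full address, so a single allocation shares one bucket. The sketch below is a hand-rolled illustration under the assumption that the input is already a valid IP taken from a trusted header; `ipBucketKey` is a hypothetical helper, and it does nothing against residential proxy pools.

```typescript
// Collapse an IPv6 address to its /64 prefix so one allocation shares a
// single rate-limit bucket. IPv4 addresses pass through unchanged.
// Assumes `ip` is a syntactically valid address (no validation here).
export function ipBucketKey(ip: string): string {
  if (!ip.includes(':')) return ip // IPv4: use as-is

  const addr = (ip.split('%')[0] ?? '').toLowerCase() // drop a zone id like %eth0

  // IPv4-mapped IPv6 (::ffff:203.0.113.7): bucket by the embedded IPv4 address.
  const mapped = addr.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/)
  if (mapped) return mapped[1] ?? ip

  const [head = '', tail = ''] = addr.split('::')
  const headGroups = head === '' ? [] : head.split(':')
  const tailGroups = tail === '' ? [] : tail.split(':')

  // "::" stands for however many zero groups are needed to reach 8 in total.
  const missing = addr.includes('::') ? Math.max(8 - headGroups.length - tailGroups.length, 0) : 0
  const groups = [...headGroups, ...new Array<string>(missing).fill('0'), ...tailGroups]

  // The first four 16-bit groups are the /64 network prefix.
  const prefix = groups.slice(0, 4).map((group) => parseInt(group, 16).toString(16))
  return `${prefix.join(':')}::/64`
}

console.log(ipBucketKey('2001:db8:abcd:12:1:2:3:4'))  // 2001:db8:abcd:12::/64
console.log(ipBucketKey('2001:0db8:abcd:0012::ffff')) // 2001:db8:abcd:12::/64
```

Both addresses land in the same bucket, so cycling through the lower 64 bits no longer produces a fresh limit per request.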

The bot protection and DDoS mitigation guide calls out this exact failure: "Rate limiting is a defense against repeated abuse from a known identifier; it is not a defense against unknown identifiers." That post covers the upstream layer (Vercel BotID, WAF rules, Cloudflare classification) that classifies automated clients before the rate limiter ever sees them.

There's also a layer of mitigation inside the rate limiter itself, for cases where the bot-protection layer is overkill (small project) or unavailable (the route is one you cannot put behind BotID's enforcement, like a public RSS feed or /sitemap.xml). Three options that change which dimension the limiter measures:

Per-route global ceiling. Add a second limiter that fires on the action name without an identifier. If total login requests across all callers exceed 1000 per minute, return rate-limited globally. Legitimate traffic at small scale stays well under; a distributed scraper hammering the action hits the ceiling regardless of IP rotation. The risk is genuine spike traffic (a launch tweet) hitting the ceiling and locking out real users. Use this for actions where a total-volume cap is acceptable as a backstop, not as the only limit.

const perIpLimit = await ratelimit.limit(`login:${ip}`)
const globalLimit = await globalRatelimit.limit('login:global')
if (!perIpLimit.success || !globalLimit.success) {
  return { error: 'Too many attempts. Please try again later.' }
}

Per-session token. For multi-step flows (signup wizard, checkout), issue a server-signed token on step one and validate it on subsequent steps. Bind the rate limit to the token instead of the IP. A scraper rotating through IPs gets a new token per IP rotation, and the token's TTL (5 minutes) caps how many fresh tokens any IP can mint per window.
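A minimal sketch of the server-signed token, assuming an HMAC over a random id and an expiry timestamp. `issueFlowToken`, `verifyFlowToken`, and the token layout are hypothetical; the secret would come from your own configuration.

```typescript
import { Buffer } from 'node:buffer'
import { createHmac, randomUUID, timingSafeEqual } from 'node:crypto'

const FLOW_TOKEN_TTL_MS = 5 * 60 * 1000 // the 5-minute TTL from the text

function sign(payload: string, secret: string): string {
  return createHmac('sha256', secret).update(payload).digest('hex')
}

// Step one of the flow: mint "<id>.<expiresAtMs>.<signature>".
export function issueFlowToken(secret: string, nowMs: number = Date.now()): string {
  const payload = `${randomUUID()}.${nowMs + FLOW_TOKEN_TTL_MS}`
  return `${payload}.${sign(payload, secret)}`
}

// Later steps: return the token id (the rate-limit key) or null if invalid.
export function verifyFlowToken(token: string, secret: string, nowMs: number = Date.now()): string | null {
  const parts = token.split('.')
  if (parts.length !== 3) return null
  const [id, expires, signature] = parts as [string, string, string]

  const expected = Buffer.from(sign(`${id}.${expires}`, secret))
  const received = Buffer.from(signature)
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (received.length !== expected.length || !timingSafeEqual(received, expected)) return null

  if (!(Number(expires) > nowMs)) return null // expired or malformed expiry
  return id
}
```

The returned id can then stand in for the IP in the limiter key, for example `signup:${id}`; minting new tokens still needs its own limit, which is where the per-IP or global ceiling comes back in.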

ASN clustering. Residential proxies typically resolve to a small set of Autonomous System Numbers (the ISP-level identifier). The MaxMind GeoLite2-ASN database is free and gives you the ASN for any IP. If 500 "different" IPs all belong to ASN AS62000 (a known residential proxy network), treat them as one bucket. The risk is false positives on shared corporate ASNs; weight ASN as a signal, not a hard block.

None of these are sufficient on their own against a sophisticated attacker. They raise the cost and the detection surface enough that the unsophisticated attacker moves on and the sophisticated one shows up in your logs.

What happens when you forget to await ratelimit.limit()?

Forgetting the await on ratelimit.limit() breaks the limiter without raising an error at runtime. The function returns a Promise; without await, the calling code proceeds immediately to the business logic and reads success from the pending Promise object, where it is undefined. Depending on how the code reads that value, the check either rejects every request or allows every request. The Upstash request still hits Redis (eventually) but the result never gates the action.

The code that produces the bug:

// broken: no await, no runtime error
const { success } = ratelimit.limit(`login:${ip}`) // a Promise, not the result
if (!success) return { error: 'Too many attempts.' }
// A Promise has no `success` property, so `success` is undefined and
// `!success` is true: this destructuring variant *blocks all requests*.
// A variant that reads `?.success` and only blocks on an explicit `false`
// never matches, so it *allows all requests*.

In practice, two variants ship to production. The destructuring variant blocks every request (the field doesn't exist on the Promise object). The optional-chaining variant allows every request. Either way the limiter is broken, and the failure mode depends on which keystroke the writer dropped.
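Both variants can be reproduced without Redis. In this sketch `fakeLimit` is a hypothetical stand-in for ratelimit.limit() that always reports "over the limit", and the cast simulates code where the type checker is not in the way (plain JavaScript, `any`, or a loosely typed wrapper).

```typescript
type LimitResult = { success: boolean }

// Hypothetical stand-in for ratelimit.limit(): always over the limit.
async function fakeLimit(_id: string): Promise<LimitResult> {
  return { success: false }
}

// Missing await: `pending` is the Promise object, not the resolved result.
const pending = fakeLimit('login:203.0.113.7') as unknown as Partial<LimitResult>

// Destructuring variant: `success` is undefined, so `!success` is true.
const { success: floatingSuccess } = pending
const destructuringBlocks = !floatingSuccess

// Optional-chaining variant: undefined never equals false.
const optionalChainBlocks = pending?.success === false

console.log(destructuringBlocks)  // true: every request is rejected
console.log(optionalChainBlocks)  // false: every request is allowed
```

Neither outcome reflects the limiter's actual answer, which in this sketch is always "blocked".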

The Upstash documentation only shows the correct pattern with await:

// correct: await yields the resolved Result
const { success } = await ratelimit.limit('api')
if (!success) return { error: 'Too many attempts.' }

The Getting Started page demonstrates this exact form across every example [5]. The library has no synchronous version because the Redis call is over HTTP and unavoidably async. There is no API surface for the bug to be a feature.

The @typescript-eslint/no-floating-promises ESLint rule catches the discarded-Promise form at lint time. Without the rule enabled, a bare unawaited call compiles cleanly and ships. Add the rule to .eslintrc:

{
  "parser": "@typescript-eslint/parser",
  "parserOptions": { "project": "./tsconfig.json" },
  "plugins": ["@typescript-eslint"],
  "rules": {
    "@typescript-eslint/no-floating-promises": "error"
  }
}

The no-floating-promises rule requires the type-aware parser config (the parserOptions.project field). Lint runs become slower because the rule walks the type graph for every Promise expression, but the slowdown is the cost of the safety net.

The compile-time fallback, for codebases not ready to adopt the lint rule yet: give the wrapper an explicit Promise return type at the export site so destructuring the un-awaited call raises a TypeScript error.

// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const upstashRatelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(5, '60 s'),
  prefix: 'rl',
})

// Re-export with an explicit Promise-returning signature
export const ratelimit = {
  limit: (id: string): Promise<{ success: boolean; remaining: number }> =>
    upstashRatelimit.limit(id),
}

const { success } = ratelimit.limit('login') then errors at compile time because success doesn't exist on Promise<{ success: boolean; remaining: number }>. The TypeScript error message is "Property 'success' does not exist on type 'Promise<{ success: boolean; remaining: number; }>'", which is the prompt the writer needs to add the await.

How much does the rate limiter itself cost on a public endpoint?

The rate limiter costs 3 to 5 Redis commands per fresh call, depending on algorithm. On a public endpoint at 100 requests per second on the default sliding-window configuration, that's 500 commands per second of constant load on Upstash, billed at Upstash's per-command pricing. The ephemeral cache short-circuits to 0 commands for already-blocked identifiers but does not help with fresh ones. The rate limiter is a real line item, not a free safety net.

The per-call cost table from the Upstash docs, repeated for emphasis [3]:

Fixed window: 3 commands first call, 2 commands intermediate
Sliding window: 5 commands first call, 4 commands intermediate
Token bucket: 4 commands first call and intermediate, 2 commands when rate-limited
Rate-limited with cache hit: 0 commands (any algorithm)

The 0-commands case is the ephemeral cache. From the Upstash features docs, verbatim: "the ratelimiter will keep track of the blocked identifiers and their reset timestamps. When a request is received with some identifier ip1 before the reset time of ip1, the request will be denied without having to call Redis." With a critical caveat: "In serverless environments this is only possible if you create the cache or ratelimiter instance outside of your handler function. While the function is still hot, the ratelimiter can block requests without having to request data from Redis" [4].
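The mechanism in that quote amounts to a map of blocked identifiers checked before the network call. The sketch below illustrates the idea only; it is not the library's implementation, and `withBlockedCache` and `remoteCheck` are hypothetical names standing in for the limiter and the Redis round trip.

```typescript
type RemoteResult = { success: boolean; reset: number }

// Illustrative only: a blocked-identifier cache in front of a remote check.
function withBlockedCache(remoteCheck: (id: string, nowMs: number) => RemoteResult) {
  const blockedUntil = new Map<string, number>() // survives only while the instance is hot
  let remoteCalls = 0

  return {
    check(id: string, nowMs: number): boolean {
      const reset = blockedUntil.get(id)
      if (reset !== undefined && nowMs < reset) return false // denied locally, no remote call
      remoteCalls += 1
      const result = remoteCheck(id, nowMs)
      if (!result.success) blockedUntil.set(id, result.reset)
      return result.success
    },
    get remoteCalls(): number {
      return remoteCalls
    },
  }
}

// A remote limiter that blocks everyone until t = 60s, to keep the run short.
const cachedLimiter = withBlockedCache(() => ({ success: false, reset: 60_000 }))
cachedLimiter.check('ip1', 1_000) // remote call, ip1 gets blocked
cachedLimiter.check('ip1', 2_000) // answered from the local map
cachedLimiter.check('ip2', 2_000) // a fresh identifier always pays for a remote call

console.log(cachedLimiter.remoteCalls) // 2
```

Because the map lives in module memory, creating it inside the handler or losing the instance to a cold start resets it to empty.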

Two practical consequences:

The cache helps with attacks, not with legitimate traffic. Attackers who hit the limit are cached locally and stop calling Redis. Legitimate users with unique identifiers cost full price every call. A /contact form with 1000 unique submitters per day costs 5000 commands per day (on sliding window). The cost scales with legitimate uniqueness, not with abuse.

The cache only works while the function is hot. Serverless cold starts wipe the in-memory Map (ephemeralCache: new Map() is the default per the Upstash docs [4]). Each cold start re-discovers the same blocked identifiers and re-bills you for the first call to each. On Vercel Functions with aggressive cold starts, the ephemeral cache hit rate is meaningfully lower than the docs imply.

The cost-attribution view, applied to common patterns:

Endpoint pattern	RPS	Algo	Commands/s	Commands/day
`/login` form	0.1	sliding	0.5	43,200
`/contact` form	0.05	sliding	0.25	21,600
`/api/og` (public OG image)	50	sliding	250	21,600,000
`/blog/*` (per-IP rate limit)	100	sliding	500	43,200,000
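The arithmetic behind the table is a single multiplication, sketched here so you can plug in your own traffic. It assumes every request is a fresh call (the worst case, with no cache hits) and uses the first-call command counts from the Upstash costs table cited earlier; `commandsPerDay` is a hypothetical helper.

```typescript
// First-call Redis command counts per algorithm (from the costs table).
const COMMANDS_PER_FRESH_CALL = {
  fixedWindow: 3,
  slidingWindow: 5,
  tokenBucket: 4,
} as const

type Algorithm = keyof typeof COMMANDS_PER_FRESH_CALL

const SECONDS_PER_DAY = 86_400

// Worst-case daily command count for a sustained request rate.
export function commandsPerDay(requestsPerSecond: number, algorithm: Algorithm): number {
  return Math.round(requestsPerSecond * COMMANDS_PER_FRESH_CALL[algorithm] * SECONDS_PER_DAY)
}

console.log(commandsPerDay(0.1, 'slidingWindow')) // 43200
console.log(commandsPerDay(100, 'slidingWindow')) // 43200000
console.log(commandsPerDay(100, 'fixedWindow'))   // 25920000
```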

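The auth endpoints are small. The public, high-volume endpoints are most of the bill. Against a 10,000-commands-per-day allowance (Upstash's free tier at time of writing), the gap is stark: the /login row at a sustained 0.1 RPS already totals 43,200 commands per day, so the free tier covers auth endpoints only when real traffic is lower and burstier than that, while /blog/* per-IP limiting at 500 commands per second uses up the whole daily allowance in about 20 seconds.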

The mitigation menu:

Drop the rate limit on routes that don't need it. A static blog post doesn't need per-IP rate limiting; the bot protection layer at the WAF layer is the right tool for content scraping defense. The Vercel WAF rule with the maintained UA catalog ships the actual configuration with per-route patterns for /blog/* (allow search crawlers, deny training) and /tools/* (deny both).
Use fixed window where the burst-at-boundary risk is acceptable. Internal admin actions or low-stakes flows can save 40% of commands on first calls (3 vs 5) and 50% on intermediate calls (2 vs 4).
Coarse identifiers for cheap routes. Per-ASN or per-country rate limiting for public assets costs the same per call as per-IP but creates fewer unique buckets, increasing cache hit rates.
Eat the cost as the price of the control. For auth actions, the commands-per-day is small enough that the answer is "stop optimizing the rate limiter and ship the feature."

The cost angle isn't an argument against rate limiting; it's an argument for measuring it. Pull the Upstash dashboard's per-day command graph once a week. If a single endpoint dominates the chart, decide whether it needs the limiter at all.

What does a hardened Server Action rate limiter look like end-to-end?

A hardened Server Action rate limiter combines the deployment-surface-aware IP extractor, sliding-window algorithm with the right limit per action, both per-caller and per-action-global limits, an explicit await, and a typed wrapper that catches missing awaits at compile time. The code is roughly twice as long as the main guide's "putting it together" example, and roughly twice as resistant to silent failure.

The complete lib/rate-limit.ts:

// lib/rate-limit.ts
import 'server-only'
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = Redis.fromEnv()

// Per-caller limiter: tracks individual identifiers
const perCaller = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(5, '60 s'),
  prefix: 'rl:caller',
})

// Per-action global limiter: backstop against distributed attacks
const perAction = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(1000, '60 s'),
  prefix: 'rl:action',
})

type RateLimitResult = { success: boolean; remaining: number }

export const ratelimit = {
  /**
   * Run both per-caller and per-action checks. Both must succeed.
   * The Promise return type catches missing-await bugs at compile time.
   */
  check: async (
    action: string,
    callerId: string
  ): Promise<RateLimitResult> => {
    const [caller, global] = await Promise.all([
      perCaller.limit(`${action}:${callerId}`),
      perAction.limit(action),
    ])
    return {
      success: caller.success && global.success,
      remaining: Math.min(caller.remaining, global.remaining),
    }
  },
}

The deployment-surface-aware IP helper from earlier:

// lib/get-client-ip.ts
import 'server-only'
import { headers } from 'next/headers'

export async function getClientIp(): Promise<string> {
  const h = await headers()
  const cf = h.get('cf-connecting-ip')
  if (cf) return cf
  if (process.env.VERCEL) {
    const xff = h.get('x-forwarded-for')
    if (xff) return xff.split(',')[0].trim()
  }
  return h.get('x-forwarded-for')?.split(',')[0].trim() ?? '0.0.0.0'
}

The login Server Action with all five failure modes addressed:

// actions/auth.ts
'use server'

import { ratelimit } from '@/lib/rate-limit'
import { getClientIp } from '@/lib/get-client-ip'
import { loginSchema, type LoginInput } from '@/lib/schemas/auth'
import { createServerClientWithCookies } from '@/lib/supabase/server'
import { redirect } from 'next/navigation'

export async function login(data: LoginInput, redirectTo?: string) {
  const ip = await getClientIp()
  const { success } = await ratelimit.check('login', ip)
  if (!success) {
    return { error: 'Too many attempts. Please try again later.' }
  }

  const parsed = loginSchema.safeParse(data)
  if (!parsed.success) {
    return { error: parsed.error.issues[0].message }
  }

  const supabase = await createServerClientWithCookies()
  const { error } = await supabase.auth.signInWithPassword({
    email: parsed.data.email,
    password: parsed.data.password,
  })
  if (error) return { error: error.message }

  const next =
    redirectTo?.startsWith('/') && !redirectTo.startsWith('//')
      ? redirectTo
      : '/dashboard'
  redirect(next)
}

Three things changed from the main guide's version. The IP comes from the deployment-aware helper, not raw header parsing. The rate-limit check runs both per-caller and per-action ceilings via the check wrapper. The Promise<RateLimitResult> return type on check means a missing await errors at compile time, not silently in production.

The same ratelimit.check('signup', ip) extends to signup, password reset, and contact form. For authenticated actions, swap ip for user.id:

const user = await getUser()
if (!user) redirect('/login')
const { success } = await ratelimit.check('checkout', user.id)

The prefix: 'rl:caller' and prefix: 'rl:action' namespacing keeps the two limiters in separate Redis key spaces, which makes the Upstash dashboard's analytics view (analytics: true in the constructor; add it if you want the graphs) actually readable per concern.

What does SecureStartKit ship today, and what should you add?

SecureStartKit ships an in-memory rate limiter with global keys ('login', 'signup', 'reset') as the development-grade baseline. The file's own comment names this: "Resets on server restart, suitable for development and light production use." The five failure modes in this post are the production path layered on top, not the starting point.

The shipped lib/rate-limit.ts is a Map-based counter, exactly the in-memory implementation the rate limiting guide walks through and then explicitly upgrades from. The auth Server Actions call rateLimit('login', 5, 60) with a flat global key, which is the "Shared Keys vs Caller Keys" failure the guide names. Both choices are deliberate defaults that work for development and for low-traffic production. They're not the production-hardened state, and the file's header comment is transparent about that.

The honest assessment per failure mode against the shipped baseline:

Failure mode	Affects shipped baseline?	Notes
IP extraction breaks off Vercel	No (no IP extraction yet)	Surfaces on upgrade to per-caller keys
Fixed-window burst-at-boundary	Yes	The `Map` counter resets at fixed windows; same boundary attack
Per-IP defeated by distributed scrapers	No (global key)	Surfaces on upgrade to per-IP
Missing `await` on Upstash	No (in-memory, synchronous semantics)	Surfaces on Upstash migration
Upstash command billing	No (in-memory has no per-call billing)	Surfaces on Upstash migration

Four of five failures surface only on the upgrade path. That makes the upgrade itself the riskiest single change, because all four appear simultaneously and silently. The recommendation: when you upgrade from the in-memory baseline to Upstash, ship the deployment-aware IP extractor, the sliding-window algorithm, both per-caller and per-action limits, and the typed check wrapper in one PR. Skipping any of them leaves a known failure mode in production from the moment of cutover.

What the template ships that you don't have to add: rate-limit-gated auth Server Actions (login, signup, password reset), Zod validation running after the rate-limit check (right ordering: reject abuse before spending compute on parsing), and the architectural pattern of backend-only data access that keeps the rate limiter in the part of the stack where it can actually enforce.

What to add on the production push:

The lib/get-client-ip.ts helper above.
The Upstash migration with the typed check wrapper.
ESLint's @typescript-eslint/no-floating-promises rule, type-aware.
A weekly five-minute check on the Upstash command-per-day graph. If a single endpoint dominates, decide whether the limiter belongs there or whether the bot-protection layer should catch it earlier.

For the broader hardening pass that pairs with rate limiting, the Next.js security hardening checklist covers the surrounding 11 controls, and the pre-launch security audit is the verification gate that catches the cases where one of the five failure modes slipped through. For the upstream layer that classifies clients before they reach the limiter, the bot protection and DDoS mitigation guide covers the BotID, WAF, and Cloudflare decisions per route.

SecureStartKit ships the secure default and documents the production upgrade path explicitly, including the five failure modes that show up when traffic warrants the move from in-memory to Upstash. The template is the floor; this post is the ceiling.

TL;DR:

x-forwarded-for parsing is deployment-surface-specific. Vercel overwrites the header to prevent spoofing [1]; self-hosted Next.js (Docker, Kubernetes, bare Node behind nginx) trusts whatever the client sends unless you configure a proxy chain. The same .split(',')[0] code that's safe on Vercel hands attackers a controlled identifier on self-host.
Fixed-window limiters let through 2x the limit across the window boundary. Upstash's own algorithm docs name this verbatim: "Can cause high bursts at the window boundaries to leak through" [2]. The sliding-window upgrade costs ~67% more Redis commands per call (5 vs 3) [3]; token bucket sits between them at 4. Pick deliberately, not by default.
Per-IP rate limits are nearly free for distributed scrapers to defeat. Residential proxy networks rent millions of unique IPs by the day; IPv6 /64 allocations give one attacker effectively unlimited source addresses. A per-IP limit at 5/min becomes meaningless when each request comes from a different IP. The bot protection layer classifies clients before they reach the rate limiter.
ratelimit.limit() without await returns a Promise that nobody waits on. The check never runs to completion before the Server Action proceeds. The discarded call is flagged only when the type-aware @typescript-eslint/no-floating-promises rule is enabled in ESLint; without it, loosely typed code compiles, runs, and silently lets everything through.
The rate limiter itself bills per call. Sliding window: 5 Upstash commands per fresh call. Fixed window: 3. Token bucket: 4 [3]. A public endpoint at 100 requests per second on sliding window means 500 Redis commands per second. The ephemeral cache short-circuits to 0 commands for already-blocked identifiers but never covers fresh ones.

What's actually wrong with a typical Server Action rate limiter?
Why does x-forwarded-for parsing break when you leave Vercel?
How does a fixed-window limiter let through 2x the limit in one second?
Why don't per-IP limits stop a distributed scraper?
What happens when you forget to await ratelimit.limit()?
How much does the rate limiter itself cost on a public endpoint?
What does a hardened Server Action rate limiter look like end-to-end?
What does SecureStartKit ship today, and what should you add?

What's actually wrong with a typical Server Action rate limiter?

Here's the shape most production limiters take after a teammate reads that guide and ships the upgrade:

// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

export const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.fixedWindow(5, '60 s'),
  prefix: 'rl',
})

// actions/auth.ts
import { headers } from 'next/headers'
import { ratelimit } from '@/lib/rate-limit'

export async function login(data: LoginInput) {
  const headerList = await headers()
  const ip = (headerList.get('x-forwarded-for') ?? '127.0.0.1').split(',')[0]
  const { success } = await ratelimit.limit(`login:${ip}`)
  if (!success) return { error: 'Too many attempts.' }
  // ...
}

This code is in the Next.js docs and the Upstash quickstart. It works on Vercel, and it works in local testing with a single browser. Five things go wrong with it in production:

The (headerList.get('x-forwarded-for') ?? '127.0.0.1').split(',')[0] line assumes Vercel's deployment surface. On self-hosted Next.js the same line trusts client-controlled bytes.
Ratelimit.fixedWindow(5, '60 s') allows 10 requests in a 2-second window if the attacker times the boundary. Upstash documents the behavior; the docs example uses sliding window for this exact reason.
The per-IP key is useless against a scraper that rotates through 500 IPs. Each request gets its own bucket of 5, so the effective limit is 5 * 500 = 2500 requests per minute.
await is doing real work in this code, and dropping it accidentally is a one-keyword mistake that turns the limiter into a no-op.
The Upstash call costs 3 Redis commands per fresh hit on fixed window, 5 on sliding window. On a high-traffic public endpoint, the rate limiter becomes a meaningful line item on the Upstash invoice.

The next sections cover each failure in depth with the primary-source evidence and the concrete fix.

Why does x-forwarded-for parsing break when you leave Vercel?

The fix depends on your deployment surface, not your code:

On Vercel directly: headerList.get('x-forwarded-for') is trustworthy. The first comma-separated element is the client IP. The current pattern is correct.
On Vercel behind Cloudflare or another proxy: trust the right hop. cf-connecting-ip is Cloudflare's authoritative client-IP header; prefer it over x-forwarded-for when both are present. Vercel's docs flag this case: "Enterprise customers can purchase and enable a trusted proxy to allow your custom X-Forwarded-For IP" [1]. For non-Enterprise, the safer pattern is to skip XFF entirely and use the Cloudflare header.
On self-hosted Next.js (Docker, Kubernetes, bare Node behind nginx): configure the proxy to strip incoming X-Forwarded-For and write only the connection IP. In nginx: proxy_set_header X-Forwarded-For $remote_addr; (the $proxy_add_x_forwarded_for variant appends and trusts). Then your Next.js code can trust the header again, because the proxy guaranteed it.

The deployment-surface-aware extractor:

// lib/get-client-ip.ts
import { headers } from 'next/headers'

export async function getClientIp(): Promise<string> {
  const h = await headers()

  // Cloudflare in front: trust cf-connecting-ip, ignore x-forwarded-for.
  // Only safe when the origin accepts traffic exclusively from Cloudflare;
  // otherwise a direct client can send this header itself.
  const cf = h.get('cf-connecting-ip')
  if (cf) return cf

  // Vercel direct: x-forwarded-for is overwritten, first element is client
  if (process.env.VERCEL) {
    const xff = h.get('x-forwarded-for')
    if (xff) return xff.split(',')[0].trim()
  }

  // Self-hosted: XFF is trustworthy only if the proxy strips incoming values.
  // With no header at all, fall back to a fixed identifier so every caller
  // shares one bucket and the misconfiguration shows up loudly, not silently.
  return h.get('x-forwarded-for')?.split(',')[0].trim() ?? '0.0.0.0'
}
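
To make the spoofing risk concrete, here is a minimal sketch of what happens when a self-hosted proxy appends to the header (the $proxy_add_x_forwarded_for behavior) instead of overwriting it. The helper name clientIpFromXff and its trustedHops parameter are hypothetical, not part of Next.js or nginx. It swaps in a different technique from the extractor above: counting entries from the right, on the assumption that each proxy you control appends exactly one address.

```typescript
// Hypothetical helper: pick the client IP from an X-Forwarded-For chain,
// assuming `trustedHops` proxies you control each appended one entry.
function clientIpFromXff(xff: string, trustedHops: number): string | null {
  const parts = xff
    .split(',')
    .map((p) => p.trim())
    .filter(Boolean)
  // Entries appended by trusted proxies sit at the right end of the list.
  // The client address is the one the outermost trusted proxy appended.
  const index = parts.length - trustedHops
  if (trustedHops < 1 || index < 0) return null
  return parts[index]
}

// A client sends "X-Forwarded-For: 1.2.3.4"; one nginx hop appends the
// connection address it actually observed.
const header = '1.2.3.4, 203.0.113.7'
const naive = header.split(',')[0].trim() // attacker-chosen value
const safe = clientIpFromXff(header, 1) // address nginx observed
```

In this sketch the left-most entry is whatever the client typed, while the entry counted from the right is the address the proxy saw. If the proxy overwrites the header instead, both approaches return the same value.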

How does a fixed-window limiter let through 2x the limit in one second?

The cost difference is concrete. Upstash's costs documentation publishes the per-call Redis-command counts [3]:

Algorithm	First call	Intermediate hit/miss	Rate-limited (no cache)	Rate-limited (cache hit)
Fixed window	3 (EVAL, INCR, PEXPIRE)	2 (EVAL, INCR)	2 (EVAL, INCR)	0 (cached)
Sliding window	5 (EVAL, GET, GET, INCR, PEXPIRE)	4 (EVAL, GET, GET, INCR)	3 (EVAL, GET, GET)	0 (cached)
Token bucket	4 (EVAL, HMGET, HSET, PEXPIRE)	4 (EVAL, HMGET, HSET, PEXPIRE)	2 (EVAL, HMGET)	0 (cached)

The configuration change is one line:

// before
limiter: Ratelimit.fixedWindow(5, '60 s'),

// after, for auth
limiter: Ratelimit.slidingWindow(5, '60 s'),

// after, for bursty user actions
limiter: Ratelimit.tokenBucket(10, '60 s', 20), // refill 10/min, capacity 20

The change is invisible to your action code; the ratelimit.limit() API is the same. The semantics are different and the bill is different.
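
The boundary burst is easiest to see in a small in-memory simulation. This is a sketch under stated assumptions, not Upstash's implementation: fixedWindow is a plain aligned-window counter, and slidingLog is an exact timestamp log rather than the weighted approximation a Redis-backed sliding window typically uses. Both function names are illustrative.

```typescript
// Fixed window: one counter per aligned window of `windowMs`.
function fixedWindow(limit: number, windowMs: number) {
  const counts = new Map<number, number>()
  return (nowMs: number): boolean => {
    const bucket = Math.floor(nowMs / windowMs)
    const used = counts.get(bucket) ?? 0
    if (used >= limit) return false
    counts.set(bucket, used + 1)
    return true
  }
}

// Exact sliding log: remember accepted timestamps from the last `windowMs`.
function slidingLog(limit: number, windowMs: number) {
  let accepted: number[] = []
  return (nowMs: number): boolean => {
    accepted = accepted.filter((t) => nowMs - t < windowMs)
    if (accepted.length >= limit) return false
    accepted.push(nowMs)
    return true
  }
}

// Five requests just before the 60 s boundary, five just after:
// all ten fall inside a span shorter than 2 seconds.
const burst = [59000, 59200, 59400, 59600, 59800, 60000, 60200, 60400, 60600, 60800]

const fixed = fixedWindow(5, 60000)
const sliding = slidingLog(5, 60000)
const fixedAllowed = burst.filter((t) => fixed(t)).length
const slidingAllowed = burst.filter((t) => sliding(t)).length
```

With a limit of 5 per 60 seconds, the fixed-window counter accepts all ten requests because the counter resets at the boundary, while the sliding log accepts only the first five.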

Why don't per-IP limits stop a distributed scraper?

const perIpLimit = await ratelimit.limit(`login:${ip}`)
const globalLimit = await globalRatelimit.limit('login:global')
if (!perIpLimit.success || !globalLimit.success) {
  return { error: 'Too many attempts. Please try again later.' }
}
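
A rough simulation shows why the global check matters. It assumes a scraper rotating through 500 addresses, a per-IP limit of 5, and a hypothetical per-action ceiling of 1000, all within one window. The counterLimiter helper is an in-memory stand-in with no expiry, not the Upstash client.

```typescript
// Minimal in-memory counter keyed by identifier (single window, no expiry).
function counterLimiter(limit: number) {
  const counts = new Map<string, number>()
  return (key: string): boolean => {
    const used = counts.get(key) ?? 0
    if (used >= limit) return false
    counts.set(key, used + 1)
    return true
  }
}

const perIp = counterLimiter(5) // 5 per IP per window
const globalCap = counterLimiter(1000) // hypothetical per-action ceiling

let passedPerIpOnly = 0
let passedWithGlobal = 0
for (let ip = 0; ip < 500; ip++) {
  for (let attempt = 0; attempt < 5; attempt++) {
    const ipOk = perIp(`login:10.0.${ip >> 8}.${ip & 255}`)
    if (ipOk) passedPerIpOnly++
    // The global ceiling is consulted only for requests that passed per-IP.
    if (ipOk && globalCap('login:global')) passedWithGlobal++
  }
}
```

Under these assumptions the per-IP limiter alone admits all 2500 requests, and the global key caps the total at 1000. The global ceiling also blocks legitimate users once it is hit, so it works as a backstop rather than a replacement for classifying clients upstream.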

What happens when you forget to await ratelimit.limit()?

The code that produces the bug:

// broken: no await
const { success } = ratelimit.limit(`login:${ip}`) // a Promise, not the resolved result
if (!success) return { error: 'Too many attempts.' }
// A Promise has no `success` property, so `success` is undefined here.
// `!undefined` is true, so this variant blocks every request (fails closed),
// and typed code will not compile it. The silent variant keeps the Promise in
// a loosely typed variable and tests `result.success === false` or
// `result?.success === false`: undefined never equals false, so every
// request is allowed (fails open).

The Upstash documentation only shows the correct pattern with await:

// correct: await yields the resolved Result
const { success } = await ratelimit.limit('api')
if (!success) return { error: 'Too many attempts.' }
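
Both outcomes of a missing await can be reproduced without Redis. The sketch below uses a stand-in limiter (fakeLimiter, hypothetical) that always rejects, and uses any to mimic loosely typed code where TypeScript cannot object.

```typescript
type Result = { success: boolean }

// Stand-in limiter that always rejects, so any "allowed" outcome is a bug.
const fakeLimiter = {
  limit: async (_id: string): Promise<Result> => ({ success: false }),
}

// Fail-open variant: the Promise is kept in a loosely typed variable.
function checkWithoutAwait(id: string): 'allowed' | 'blocked' {
  const result: any = fakeLimiter.limit(id) // Promise, never awaited
  return result.success === false ? 'blocked' : 'allowed'
}

// Fail-closed variant: destructuring the Promise yields undefined.
function destructureWithoutAwait(id: string): 'allowed' | 'blocked' {
  const { success }: any = fakeLimiter.limit(id)
  return !success ? 'blocked' : 'allowed'
}

// Correct variant: await resolves the Promise before reading `success`.
async function checkWithAwait(id: string): Promise<'allowed' | 'blocked'> {
  const { success } = await fakeLimiter.limit(id)
  return success ? 'allowed' : 'blocked'
}
```

Even though the limiter rejects every call, checkWithoutAwait reports the request as allowed, destructureWithoutAwait blocks it for the wrong reason, and only checkWithAwait reflects the limiter's actual answer.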

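The type-aware ESLint rule @typescript-eslint/no-floating-promises catches the discarded-call form (a bare ratelimit.limit(id) statement) at lint time; when the Promise is assigned to a typed variable, TypeScript itself rejects the .success access. In loosely typed code with neither safeguard, the missing await compiles cleanly and ships. Add the rule to .eslintrc: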

{
  "parser": "@typescript-eslint/parser",
  "parserOptions": { "project": "./tsconfig.json" },
  "plugins": ["@typescript-eslint"],
  "rules": {
    "@typescript-eslint/no-floating-promises": "error"
  }
}

The compile-time fallback, for codebases not ready to adopt the lint rule yet: wrap the limiter in an explicit Promise-returning signature at the export site so that destructuring the un-awaited call raises a TypeScript error.

// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const upstashRatelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(5, '60 s'),
  prefix: 'rl',
})

// Re-export with an explicit Promise-returning signature
export const ratelimit = {
  limit: (id: string): Promise<{ success: boolean; remaining: number }> =>
    upstashRatelimit.limit(id),
}

How much does the rate limiter itself cost on a public endpoint?

The per-call cost table from the Upstash docs, repeated for emphasis [3]:

Fixed window: 3 commands first call, 2 commands intermediate
Sliding window: 5 commands first call, 4 commands intermediate
Token bucket: 4 commands first call and intermediate, 2 when rate-limited without a cache hit
Rate-limited with cache hit: 0 commands (any algorithm)

Two practical consequences:

The cost-attribution view, applied to common patterns:

Endpoint pattern	RPS	Algo	Commands/s	Commands/day
`/login` form	0.1	sliding	0.5	43,200
`/contact` form	0.05	sliding	0.25	21,600
`/api/og` (public OG image)	50	sliding	250	21,600,000
`/blog/*` (per-IP rate limit)	100	sliding	500	43,200,000
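
The arithmetic behind the table can be sketched as a small helper. It assumes every request pays the first-call command count from the Upstash costs table [3], which makes it an upper bound; the function and constant names are illustrative.

```typescript
type Algo = 'fixed' | 'sliding' | 'tokenBucket'

// First-call command counts per algorithm [3], treated as a per-request
// upper bound (intermediate and cached calls cost less).
const COMMANDS_PER_CALL: Record<Algo, number> = {
  fixed: 3,
  sliding: 5,
  tokenBucket: 4,
}

const SECONDS_PER_DAY = 86400

// Estimated Redis commands per day for a steady request rate.
function commandsPerDay(rps: number, algo: Algo): number {
  return Math.round(rps * COMMANDS_PER_CALL[algo] * SECONDS_PER_DAY)
}

const loginSliding = commandsPerDay(0.1, 'sliding')
const blogSliding = commandsPerDay(100, 'sliding')
const blogFixed = commandsPerDay(100, 'fixed')
```

For the /blog/* row, moving from sliding window to fixed window under the same assumption drops the estimate from 43,200,000 to 25,920,000 commands per day.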

The mitigation menu:

Drop the rate limit on routes that don't need it. A static blog post doesn't need per-IP rate limiting; bot protection at the WAF layer is the right tool for defending against content scraping. The Vercel WAF rule with the maintained UA catalog ships the actual configuration with per-route patterns for /blog/* (allow search crawlers, deny training) and /tools/* (deny both).
Use fixed window where the burst-at-boundary risk is acceptable. Internal admin actions or low-stakes flows can save 40% of commands per call.
Coarse identifiers for cheap routes. Per-ASN or per-country rate limiting for public assets costs the same per call as per-IP but creates fewer unique buckets, increasing cache hit rates.
Eat the cost as the price of the control. For auth actions, the commands-per-day is small enough that the answer is "stop optimizing the rate limiter and ship the feature."

What does a hardened Server Action rate limiter look like end-to-end?

The complete lib/rate-limit.ts:

// lib/rate-limit.ts
import 'server-only'
import { Ratelimit } from '@upstash/ratelimit'
import { Redis } from '@upstash/redis'

const redis = Redis.fromEnv()

// Per-caller limiter: tracks individual identifiers
const perCaller = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(5, '60 s'),
  prefix: 'rl:caller',
})

// Per-action global limiter: backstop against distributed attacks
const perAction = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(1000, '60 s'),
  prefix: 'rl:action',
})

type RateLimitResult = { success: boolean; remaining: number }

export const ratelimit = {
  /**
   * Run both per-caller and per-action checks. Both must succeed.
   * The Promise return type catches missing-await bugs at compile time.
   */
  check: async (
    action: string,
    callerId: string
  ): Promise<RateLimitResult> => {
    const [caller, global] = await Promise.all([
      perCaller.limit(`${action}:${callerId}`),
      perAction.limit(action),
    ])
    return {
      success: caller.success && global.success,
      remaining: Math.min(caller.remaining, global.remaining),
    }
  },
}

The deployment-surface-aware IP helper from earlier:

// lib/get-client-ip.ts
import 'server-only'
import { headers } from 'next/headers'

export async function getClientIp(): Promise<string> {
  const h = await headers()
  const cf = h.get('cf-connecting-ip')
  if (cf) return cf
  if (process.env.VERCEL) {
    const xff = h.get('x-forwarded-for')
    if (xff) return xff.split(',')[0].trim()
  }
  return h.get('x-forwarded-for')?.split(',')[0].trim() ?? '0.0.0.0'
}

The login Server Action with all five failure modes addressed:

// actions/auth.ts
'use server'

import { ratelimit } from '@/lib/rate-limit'
import { getClientIp } from '@/lib/get-client-ip'
import { loginSchema, type LoginInput } from '@/lib/schemas/auth'
import { createServerClientWithCookies } from '@/lib/supabase/server'
import { redirect } from 'next/navigation'

export async function login(data: LoginInput, redirectTo?: string) {
  const ip = await getClientIp()
  const { success } = await ratelimit.check('login', ip)
  if (!success) {
    return { error: 'Too many attempts. Please try again later.' }
  }

  const parsed = loginSchema.safeParse(data)
  if (!parsed.success) {
    return { error: parsed.error.issues[0].message }
  }

  const supabase = await createServerClientWithCookies()
  const { error } = await supabase.auth.signInWithPassword({
    email: parsed.data.email,
    password: parsed.data.password,
  })
  if (error) return { error: error.message }

  const next =
    redirectTo?.startsWith('/') && !redirectTo.startsWith('//')
      ? redirectTo
      : '/dashboard'
  redirect(next)
}

The same pattern, for example ratelimit.check('signup', ip), extends to signup, password reset, and the contact form. For authenticated actions, swap ip for user.id:

const user = await getUser()
if (!user) redirect('/login')
const { success } = await ratelimit.check('checkout', user.id)

What does SecureStartKit ship today, and what should you add?

The honest assessment per failure mode against the shipped baseline:

Failure mode	Affects shipped baseline?	Notes
IP extraction breaks off Vercel	No (no IP extraction yet)	Surfaces on upgrade to per-caller keys
Fixed-window burst-at-boundary	Yes	The `Map` counter resets at fixed windows; same boundary attack
Per-IP defeated by distributed scrapers	No (global key)	Surfaces on upgrade to per-IP
Missing `await` on Upstash	No (in-memory, synchronous semantics)	Surfaces on Upstash migration
Upstash command billing	No (in-memory has no per-call billing)	Surfaces on Upstash migration

What to add on the production push:

The lib/get-client-ip.ts helper above.
The Upstash migration with the typed check wrapper.
ESLint's @typescript-eslint/no-floating-promises rule, type-aware.
A weekly five-minute check on the Upstash command-per-day graph. If a single endpoint dominates, decide whether the limiter belongs there or whether the bot-protection layer should catch it earlier.
