What storage backend should I use for idempotency keys?

Use Redis for sub-millisecond ingress validation and PostgreSQL (or an equivalent RDBMS) as the authoritative durable store. A tiered approach, with Redis in front and a relational database behind, combines low lookup latency with durable storage.

How long should idempotency keys be retained?

TTL windows must exceed the maximum client retry ceiling: 72 hours is a common baseline for standard REST APIs; financial reconciliation systems typically require 7–30 days.

What isolation level prevents duplicate processing under concurrent retries?

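A unique index on the idempotency key is the primary defence: the database rejects the second insert at the storage layer at any isolation level, including READ COMMITTED. SERIALIZABLE additionally prevents phantom reads during concurrent key lookups; REPEATABLE READ also does so in PostgreSQL, although the SQL standard does not require it. For high-concurrency workloads, combine REPEATABLE READ with the unique index rather than relying on isolation alone.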

Backend Storage Patterns for Idempotency

Engineering Contract

This page owns one guarantee: a state-mutating request that carries an idempotency key executes its side effects exactly once, regardless of how many times the request envelope arrives. That contract matters in any system with at-least-once retry semantics—client SDKs, API gateways, and service meshes all retry on transient failure, which means every external call must be treated as potentially duplicated. The precise failure mode this prevents is the duplicate side effect: a double charge, a second inventory reservation, a conflicting resource allocation—events that corrupt application state without any network-level signal that anything went wrong.

Achieving exactly-once side effects requires choosing the right storage layer, enforcing atomic key registration, and managing key lifecycle. The sections below map each concern to a concrete implementation pattern.

Conceptual Architecture: End-to-End Request Flow

A well-structured idempotency implementation operates as a two-layer state machine. The outer layer is a low-latency cache responsible for short-circuiting duplicate requests before they consume downstream compute. The inner layer is the durable database, which acts as the authoritative arbiter when the cache cannot be trusted.

The canonical flow for a first-time request proceeds as follows:

1. The client sends a POST request with an Idempotency-Key header whose value is a client-generated unique identifier, typically a random UUIDv4 or a deterministic HMAC-derived key.
2. The API gateway or service handler claims the key in the Redis cache with an atomic SET key PENDING NX EX <ttl>. If the SET succeeds, the key is new and the request proceeds. If it fails, the key already exists: a stored final response is returned immediately, while a PENDING value means an earlier attempt is still in flight.
3. The service opens a database transaction, performs a second key check against the durable idempotency table, executes business logic, inserts the key record alongside the state mutation, and commits atomically.
4. The final response is written back to the cache with the full TTL and returned to the client.

On retry, the cache hit at step 2 terminates the flow without touching the database. If the cache has been evicted or a split-brain occurred, the database unique constraint at step 3 prevents duplicate commits.

The layered design exists because neither the cache nor the database alone is sufficient. The cache can be evicted; the database can be slow. Together they provide both the latency profile and the durability guarantee the contract requires.
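The flow above can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not a production implementation: a plain dict (`FakeCache`) stands in for Redis `SET ... NX EX`, SQLite from the standard library stands in for PostgreSQL, and the names `handle_request`, `idempotency_keys`, and `charges` are hypothetical.

```python
import json
import sqlite3

PENDING = "PENDING"


class FakeCache:
    """In-memory stand-in for Redis (assumption: no real Redis client is used)."""

    def __init__(self):
        self.data = {}

    def set_nx(self, key, value, ttl_seconds):
        # Mimics `SET key value NX EX ttl`; the TTL is accepted but not enforced here.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, ttl_seconds):
        self.data[key] = value


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response TEXT NOT NULL)")
    db.execute("CREATE TABLE charges (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
    return db


def handle_request(cache, db, key, amount, ttl_seconds=259200):
    # Step 2: try to claim the key in the cache layer.
    if not cache.set_nx(key, PENDING, ttl_seconds):
        cached = cache.get(key)
        if cached is not None and cached != PENDING:
            return json.loads(cached)  # finished earlier: replay the stored response
        # Still PENDING: fall through and let the database arbitrate.
    select_sql = "SELECT response FROM idempotency_keys WHERE key = ?"
    try:
        # Step 3: one transaction covers the check, the mutation, and key registration.
        with db:  # commits on success, rolls back on exception
            row = db.execute(select_sql, (key,)).fetchone()
            if row is not None:
                response = json.loads(row[0])
            else:
                cur = db.execute("INSERT INTO charges (amount) VALUES (?)", (amount,))
                response = {"status": 201, "charge_id": cur.lastrowid}
                db.execute(
                    "INSERT INTO idempotency_keys (key, response) VALUES (?, ?)",
                    (key, json.dumps(response)),
                )
    except sqlite3.IntegrityError:
        # Lost a race to a concurrent duplicate: return what the winner committed.
        response = json.loads(db.execute(select_sql, (key,)).fetchone()[0])
    # Step 4: publish the final response to the cache.
    cache.set(key, json.dumps(response), ttl_seconds)
    return response


cache, db = FakeCache(), make_db()
first = handle_request(cache, db, "key-1", 500)
retry = handle_request(cache, db, "key-1", 500)           # served from the cache
cache.data.clear()                                        # simulate cache eviction
after_eviction = handle_request(cache, db, "key-1", 500)  # served from the database
```

All three calls return the same response and only one row lands in `charges`. A real service would also need a policy for the PENDING case, such as returning a conflict status or waiting, which this sketch sidesteps by deferring to the database.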

Failure Boundary Map

Every layer in the request path is a potential duplicate-creation point. The following map identifies where idempotency guarantees must be enforced at each architectural boundary and the common partial-failure scenarios at each.

Layer	Failure Scenario	Required Enforcement
Load Balancer	Timeout retries route a duplicate request to a different upstream instance before the first completes	Session affinity per idempotency key, or a shared distributed cache that all instances read
API Gateway	Gateway-level retry policies re-issue requests after a 5xx without an `Idempotency-Key` passthrough header	Enforce header passthrough in gateway config; reject requests missing the key with `400 Bad Request`
Service Mesh	Sidecar proxies (Envoy, Linkerd) apply automatic retries on stream resets, bypassing application-level deduplication	Set `x-envoy-retry-on` only for idempotent HTTP methods; treat `POST`/`PATCH` as non-retryable at the mesh layer
Application Service	Request handler crashes after DB commit but before caching the response; next retry finds no cache entry, re-enters transaction, hits unique constraint	Treat unique constraint violations as “already processed”—read and return the committed response
Database	Deadlock or lock timeout rolls back the transaction after key insertion; client retries succeed but the first attempt’s key record is gone	Use `INSERT ... ON CONFLICT DO NOTHING` and validate commit success before writing to cache
Message Broker	At-least-once delivery in Kafka/RabbitMQ redelivers a message after consumer restart; consumer group offset commit fails	Embed idempotency key in message envelope; check durable store before executing consumer business logic

Understanding which layer can produce duplicates determines where each enforcement mechanism must sit. Idempotency checks cannot live only at the application layer if the load balancer or mesh is capable of retrying independently.
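For the message-broker row, a consumer-side check could look like the sketch below. It assumes a hypothetical envelope shape (`idempotency_key`, `sku`, `qty`) and hypothetical table names, and it uses SQLite (3.24 or later for `ON CONFLICT`) in place of the real durable store; no Kafka or RabbitMQ client is involved.

```python
import sqlite3


def make_store():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed_messages (idempotency_key TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE reservations (sku TEXT NOT NULL, qty INTEGER NOT NULL)")
    return db


def consume(db, message):
    """Apply the message once. Return True if newly processed, False on redelivery."""
    with db:  # key registration and business write share one transaction
        cur = db.execute(
            "INSERT INTO processed_messages (idempotency_key) VALUES (?) "
            "ON CONFLICT (idempotency_key) DO NOTHING",
            (message["idempotency_key"],),
        )
        if cur.rowcount == 0:
            return False  # key already recorded: skip the business logic
        db.execute(
            "INSERT INTO reservations (sku, qty) VALUES (?, ?)",
            (message["sku"], message["qty"]),
        )
        return True


store = make_store()
msg = {"idempotency_key": "msg-1", "sku": "A-100", "qty": 2}
outcomes = [consume(store, msg), consume(store, msg)]  # second call is a redelivery
```

Because the key insert and the reservation commit together, a consumer that crashes before its offset commit can safely receive the same message again.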

Implementation Patterns

Four storage-level patterns cover the full idempotency implementation surface. The table below links to the dedicated implementation pages and summarises the variant each one covers.

Pattern	Storage Layer	Guarantee	When to Use
Redis Cache-Based Deduplication	In-memory key-value store	Best-effort; loses guarantees on eviction or node failure	High-throughput ingress filter; latency budget < 5 ms
Database Unique Constraints & Upserts	Relational DB (PostgreSQL, MySQL)	Durable; survives cache eviction and process restarts	Authoritative deduplication for financial or audit-critical paths
Transaction Scoping & Atomic Operations	Relational DB + application logic	Serializable; prevents phantom reads and duplicate commits under concurrency	Any endpoint where the key check and the state mutation must be indivisible
Idempotency Key Storage & TTL Management	Redis TTL + DB time-series partitioning	Operational lifecycle; not a safety guarantee	Key expiry, audit log retention, compliance windows

These patterns are complementary, not alternatives. A production deployment combines all four: Redis for ingress speed, a unique constraint for durability, a transaction scope for concurrency safety, and a TTL strategy for lifecycle management.

Trade-off Matrix

Selecting a primary idempotency store requires explicit trade-offs across latency, durability, consistency, and operational complexity.

Storage Backend	p99 Lookup Latency	Durability	Consistency Under Concurrency	Operational Overhead
Redis (single node)	< 1 ms	Low — data lost on restart without AOF/RDB	Strong for single key via `SET NX`; no cross-key transactions	Low — no schema, native TTL
Redis Cluster	1–3 ms	Medium — replication lag creates brief split-brain windows	EVALSHA/Lua scripts enforce atomicity within a slot	Medium — slot routing, node failover runbooks
PostgreSQL (same-region)	5–20 ms	High — WAL-backed, ACID	`SERIALIZABLE` isolation prevents all anomalies	Medium — index maintenance, vacuum tuning
PostgreSQL (multi-region read replica)	20–80 ms	High	Replication lag means replicas may not see recently committed keys	High — lag monitoring, replica promotion runbooks
DynamoDB (conditional writes)	5–15 ms	High — multi-AZ by default	Conditional writes (`attribute_not_exists`) are linearizable per item	Low operationally, but higher cost at scale
CockroachDB / Spanner	20–100 ms	Very high — global consensus	Serializable globally; clock skew handled by TrueTime/HLC	High — distributed SQL expertise required

The most common production configuration puts Redis in front to absorb the bulk of retry lookups, with PostgreSQL as the authoritative fallback. This combination can achieve sub-5 ms response times for retries served from the cache while providing full durability for first writes and unique constraint protection against concurrent duplicates.

DynamoDB conditional writes are the correct choice when the primary data store is already DynamoDB and operational simplicity outweighs cost. Spanner-class databases are justified only when global linearizability is a hard requirement—typically in cross-border payment settlement systems.

Anti-Patterns and Pitfalls

1. Key Validation After State Mutation

The most destructive ordering mistake is registering the idempotency key only after the business logic has taken effect. If a charge is committed before the key is registered, a concurrent retry can pass the key check (key not yet inserted), execute the charge again, and only then lose the race to the unique constraint, resulting in a double charge with a constraint error on the second write. A check-then-mutate-then-register sequence in which the mutation commits separately from the key registration must never appear in production code.

Correct ordering: check key → execute business logic → register key + commit, all within a single transaction, so that a constraint violation on the key also rolls back the duplicate's mutation.
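The value of the single-transaction rule can be seen in a small sketch. It deliberately omits the key check to mimic a duplicate that raced past it, then relies on the unique constraint to roll back the duplicate's mutation. SQLite stands in for the production database, and `charge_once`, `charges`, and `idempotency_keys` are illustrative names.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY)")
db.execute("CREATE TABLE charges (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")


def charge_once(key, amount):
    """Return True if the charge committed, False if it was rejected as a duplicate."""
    try:
        with db:  # one transaction: commit on success, roll back on any exception
            db.execute("INSERT INTO charges (amount) VALUES (?)", (amount,))
            # Registration fails for a duplicate key, which undoes the charge above.
            db.execute("INSERT INTO idempotency_keys (key) VALUES (?)", (key,))
        return True
    except sqlite3.IntegrityError:
        return False


results = [charge_once("key-1", 500), charge_once("key-1", 500)]
charge_count = db.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
```

After both calls, `charges` holds a single row. The rollback only covers writes made inside the same database transaction; a call to an external system made from within the transaction would not be undone by it.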

2. Stale Cache Replay

When a cached response is stored with a TTL shorter than the maximum client retry window, the key expires while a legitimate retry is still in flight. The retry bypasses the cache, re-enters the transaction, and finds the key in the database—correct. But if the cache TTL is also shorter than the DB TTL, an intermediate state exists where the cache says “not found” and the DB says “already processed”. Services that trust only the cache and skip the DB check will reprocess the request.

Always set the Redis TTL equal to or greater than the database TTL, and never trust the cache as the sole authority: eviction under memory pressure can remove a key before its TTL elapses.

3. Over-Engineering Idempotency on Safe Endpoints

GET, HEAD, and OPTIONS are defined as safe methods in HTTP semantics, so a correctly implemented handler for them has no side effects to deduplicate. Implementing idempotency key infrastructure for read-only endpoints wastes cache memory, adds lookup latency, and creates unnecessary operational surface area. Reserve idempotency enforcement for POST, PUT, PATCH, and DELETE operations that mutate state.

The HTTP method semantics and safety page covers which verbs require which guarantees.

4. Ignoring Clock Skew in TTL Calculations

TTL windows expressed as wall-clock durations (e.g., “72 hours from now”) are computed at key insertion time. In multi-region deployments with NTP drift or leap second events, the insertion timestamp on one node may differ from another by seconds or minutes. For keys with very short TTLs (under 60 seconds), clock skew can cause premature expiry in one region while the key remains valid in another, creating asymmetric deduplication windows.

Use relative TTL offsets in seconds rather than absolute expiry timestamps. In Redis, SET key value NX EX 259200 (72 hours in seconds) is correct; computing EXPIREAT key <unix_timestamp> from an application host's wall clock is fragile.
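The arithmetic is simple enough to check directly. The sketch below derives the TTL offset from a duration and assembles the arguments of the Redis command as a plain list; it does not talk to Redis, and `ttl_seconds` and `redis_set_args` are hypothetical helper names.

```python
from datetime import timedelta


def ttl_seconds(window: timedelta) -> int:
    """Convert a retention window into a TTL offset in whole seconds."""
    return int(window.total_seconds())


def redis_set_args(key: str, value: str, window: timedelta) -> list:
    """Build the arguments for `SET key value NX EX <seconds>`."""
    return ["SET", key, value, "NX", "EX", str(ttl_seconds(window))]


standard_ttl = ttl_seconds(timedelta(hours=72))   # 259200
financial_ttl = ttl_seconds(timedelta(days=30))   # 2592000
command = redis_set_args("idem:key-1", "PENDING", timedelta(hours=72))
```

Passing a duration and letting the server apply it avoids shipping a timestamp computed on one host to be interpreted on another.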

5. Missing Observability on Key Collisions

A key collision, where the database unique constraint fires on a retry, is a success event, not an error. Services that surface constraint violations as 500-series errors, log them as failures, or trigger alerts on them will generate noise that drowns out genuine errors. Key collisions must be caught explicitly, translated to an “already processed” branch, and counted as a separate metric (idempotency.key_collision.count) so that duplicate request rates can be monitored independently of error rates.
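One way to structure that branch is sketched below. It is a minimal illustration, assuming SQLite in place of the production database and a `collections.Counter` in place of a real metrics client; the `register` function and the `idempotency.key_registered.count` metric name are hypothetical, while `idempotency.key_collision.count` is the metric named above.

```python
import sqlite3
from collections import Counter

metrics = Counter()  # stand-in for a metrics client
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response TEXT NOT NULL)")


def register(key, response):
    """Store the response for a new key, or return the stored one for a duplicate."""
    try:
        with db:
            db.execute(
                "INSERT INTO idempotency_keys (key, response) VALUES (?, ?)",
                (key, response),
            )
        metrics["idempotency.key_registered.count"] += 1
        return "NEW", response
    except sqlite3.IntegrityError:
        # A collision is the success path for a retry, so count it separately
        # from errors and return what was originally committed.
        metrics["idempotency.key_collision.count"] += 1
        stored = db.execute(
            "SELECT response FROM idempotency_keys WHERE key = ?", (key,)
        ).fetchone()[0]
        return "DUPLICATE", stored


first = register("key-1", '{"charge_id": 1}')
second = register("key-1", '{"charge_id": 2}')  # retry: original response wins
```

Only `sqlite3.IntegrityError` is caught here, so any other database failure still propagates as a genuine error.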

Production Readiness Checklist

The following numbered checklist mirrors the sequence an SRE would follow when validating an idempotency implementation before production traffic.

1. Enforce the Idempotency-Key header at the API gateway boundary. Reject POST/PUT/PATCH/DELETE requests missing the header with 400 Bad Request before they reach application logic.
2. Validate key format on ingress. Accept only UUIDv4, UUIDv7, or HMAC-derived deterministic keys of at least 128 bits (a UUIDv4 carries 122 random bits). Reject malformed keys with a descriptive 400 error body.
3. Bind key check, business logic, and key registration in a single database transaction. No state mutation outside this atomic boundary.
4. Use INSERT ... ON CONFLICT DO NOTHING (PostgreSQL) backed by a unique index, or a conditional write (DynamoDB), to enforce uniqueness. Never rely solely on application-level SELECT-then-INSERT.
5. Set Redis TTL to 72 hours minimum (259200 seconds) for standard APIs; 2592000 seconds (30 days) for financial reconciliation paths. Align the database retention window to match.
6. Treat unique constraint violations as success branches. Read the committed response and return it with 200 OK. Emit idempotency.key_collision.count as a distinct metric.
7. Implement exponential backoff with jitter in all client SDKs. Document maximum retry ceilings to validate TTL window sizing.
8. Emit structured audit log entries on every key registration and collision. Include: idempotency key, request fingerprint hash, outcome (NEW / DUPLICATE), response status, and processing latency.
9. Add SRE alert thresholds: duplicate request rate > 5% of total requests, cache hit ratio < 80% for idempotency lookups, DB constraint violation rate diverging from duplicate request rate (signals cache bypass).
10. Run chaos testing before promotion to production. Simulate network timeouts mid-transaction, concurrent duplicate payloads, cache node failures, and database failovers. Verify that exactly-once side effects hold across all scenarios.
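The key-format and audit-log items lend themselves to a short sketch. The code below is illustrative only: it covers the UUID case (HMAC-derived keys would need their own rule), and the function names, the canonical-JSON fingerprint scheme, and the audit field names are assumptions rather than a prescribed format.

```python
import hashlib
import json
import uuid

ALLOWED_UUID_VERSIONS = {4, 7}


def is_valid_key(raw) -> bool:
    """Accept only canonical, hyphenated UUIDv4 or UUIDv7 strings."""
    try:
        parsed = uuid.UUID(raw)
    except (ValueError, AttributeError, TypeError):
        return False
    return (
        str(parsed) == raw.lower()  # rejects braces, URN prefixes, missing hyphens
        and parsed.variant == uuid.RFC_4122
        and parsed.version in ALLOWED_UUID_VERSIONS
    )


def fingerprint(payload: dict) -> str:
    """Hash a canonical JSON form so that key order does not change the result."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_entry(key, payload, outcome, status, latency_ms) -> str:
    """Build one structured audit log line (field names are hypothetical)."""
    return json.dumps(
        {
            "idempotency_key": key,
            "request_fingerprint": fingerprint(payload),
            "outcome": outcome,  # "NEW" or "DUPLICATE"
            "response_status": status,
            "latency_ms": latency_ms,
        },
        sort_keys=True,
    )


v4_key = "3f2b8c1e-9d4a-4f6b-a1c2-5e7d9b0f1a23"
entry = audit_entry(v4_key, {"amount": 500, "currency": "EUR"}, "DUPLICATE", 200, 3.2)
```

Storing the fingerprint alongside the key also makes it possible to detect a client that reuses one key for two different payloads.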

Redis Cache-Based Deduplication — atomic SET NX patterns, Lua script pipelines, and Redis Cluster slot routing for high-throughput ingress deduplication
Database Unique Constraints & Upserts — PostgreSQL INSERT ... ON CONFLICT, index design, and multi-tenant key namespacing
Transaction Scoping & Atomic Operations — isolation level selection, deadlock prevention, and distributed lock patterns for concurrent retry storms
Idempotency Key Storage & TTL Management — TTL sizing formulas, partitioned table expiry strategies, and compliance retention windows
Idempotency Fundamentals & API Guarantees — the foundational guarantee model, Idempotency-Key header semantics, and retry contract definitions that underpin all storage-layer implementations
Distributed Coordination & Locking Strategies — Redlock, ZooKeeper-based fencing tokens, and consensus protocols for cross-region key coordination
Observability & Operations for Idempotent Systems — hit-rate dashboards, OpenTelemetry tracing, and chaos testing to validate the storage layer under duplicate load