What TTL should I use for a distributed lock in a payment API?

Use 3–5 s for synchronous payment calls and 30 s for saga orchestration steps. Set the renewal trigger at 50 % of the TTL so a single renewal attempt still completes before expiration.

How does a fencing token prevent stale lock holders from corrupting state?

A fencing token is a monotonically increasing integer issued with each successful lock grant. The resource server rejects writes tagged with a token lower than the last accepted value, blocking stale holders even after clock drift invalidates TTL-based reasoning.

When should I use Redlock instead of a single Redis instance?

Use Redlock when a single Redis instance is a single point of failure and your deduplication guarantee must survive the loss of one or two coordination nodes. Redlock requires N=5 independent instances and quorum (3/5) before granting the lock.

Distributed Lock Acquisition Patterns

When multiple service instances race to process the same payload — a payment webhook delivered twice, a mobile client retrying after a dropped connection, a message broker redelivering after an ACK timeout — deterministic lock acquisition is the mechanism that collapses concurrent attempts into a single authoritative execution. This page covers production-grade acquisition patterns, lease lifecycle management, fencing token semantics, and failure recovery strategies. It is part of Distributed Coordination & Locking Strategies, the authoritative reference for coordination-layer deduplication.

Problem Framing

The core failure mode is the lost-update race: two worker processes each read “no lock exists”, both proceed with processing, and both write a result. In payment systems, this is a double charge. In inventory systems, it is an oversell. In saga orchestration, it is a duplicate state transition that corrupts a compensation chain.

Naive solutions — application-level flags, non-atomic read-check-write sequences, advisory locks without TTL — fail under network partitions and process crashes. The failure surface expands with every additional AZ, replica, and asynchronous retry path. Lock acquisition must therefore be:

Atomic: a single coordination operation, not a read followed by a write.
Bounded: a TTL that caps orphan lifetime without requiring explicit release.
Fenced: a monotonic token that lets resource servers reject stale holders.
Verified: ownership confirmed at every renewal and release, never assumed.

Guarantee Model

Distributed lock acquisition provides mutual exclusion with a bounded validity window: at most one holder possesses the lock at any instant within a single coordination partition. The precise contract depends on the underlying store:

Coordination layer	Consistency class	Partition behaviour	Typical P99 acquisition
Redis single instance	Linearizable per-node	Lock unavailable during failover	< 1 ms
Redlock (5 nodes)	Probabilistic mutual exclusion	Lock unavailable if ≥ 3 nodes unreachable	2–8 ms
etcd (Raft)	Linearizable cluster-wide	Lock unavailable during leader election	5–15 ms
ZooKeeper (ZAB)	Sequential consistency	Lock unavailable during ZAB recovery	4–12 ms
PostgreSQL advisory lock	Serializable per-session	Lock unavailable during primary failover	1–3 ms

Where the guarantee breaks:

Process pauses and clock skew: TTL-based expiration relies on wall-clock time at the coordination node, not on what the holder believes. A process that is paused (hypervisor freeze or stop-the-world GC) for longer than the TTL resumes still assuming it holds a lock that its coordination node already considers expired. The stale holder then races with the new holder at the resource server.
Network partition with asymmetric visibility: A Redis primary may acknowledge a SET NX that is never replicated before a failover. The new primary issues the same lock to a second holder.
Quorum loss in consensus clusters: An etcd cluster that loses quorum blocks all writes. Safety is preserved, but the lock service becomes an availability bottleneck.

Fencing tokens (described below) are the only correct response to the stale-holder scenarios above (a paused process, a skewed clock, or a failover that re-issues the lock). Relying solely on TTL expiration is insufficient for financial-grade deduplication.

Lock Acquisition Lifecycle

Lock acquisition lifecycle: atomic SET NX issues a fencing token; a background timer renews at 50 % of TTL; the resource server validates the token before accepting writes; Lua-scripted CAS releases only if the caller still holds the token.

Core Algorithm: Atomic Acquisition and Fencing

Step-by-Step Protocol

1. Derive a deterministic lock key.

Canonicalize the request body by sorting JSON keys, stripping volatile fields (timestamp, trace_id, request_id), and hashing with SHA-256. Namespace the key to prevent cross-tenant collisions:

{service}:{env}:{tenant_id}:{resource_type}:sha256:{hex_digest}

Example: payments:prod:acme:invoice:sha256:a1b2c3d4e5f6...
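The key derivation above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names and the `VOLATILE_FIELDS` set are assumptions, and only top-level volatile fields are stripped.

```python
import hashlib
import json

# Assumed set of volatile fields; extend it to match your own payloads.
VOLATILE_FIELDS = {"timestamp", "trace_id", "request_id"}


def canonical_digest(payload: dict) -> str:
    """SHA-256 hex digest of the payload with top-level volatile fields removed."""
    stable = {k: v for k, v in payload.items() if k not in VOLATILE_FIELDS}
    # sort_keys orders keys at every nesting level; compact separators remove
    # whitespace differences between clients.
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def lock_key(service: str, env: str, tenant_id: str, resource_type: str, payload: dict) -> str:
    """Build {service}:{env}:{tenant_id}:{resource_type}:sha256:{hex_digest}."""
    digest = canonical_digest(payload)
    return f"{service}:{env}:{tenant_id}:{resource_type}:sha256:{digest}"
```

Two deliveries of the same webhook that differ only in key order or in `timestamp`/`trace_id` produce the same key, so they contend for the same lock.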

2. Acquire atomically with a fencing token.

Redis implementation using a Lua script to return a monotonically increasing token atomically:

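-- acquire.lua
-- Note: "fence:" .. key is not declared in KEYS. This works on a single instance;
-- on Redis Cluster, pass it as KEYS[2] with a shared hash tag instead.
local key    = KEYS[1]
local token  = ARGV[1]   -- unique owner ID (e.g. a UUID); checked on renew and release
local ttl_ms = tonumber(ARGV[2])

if redis.call("SET", key, token, "NX", "PX", ttl_ms) then
  -- Increment a per-key counter to produce the fencing token. The counter has
  -- no TTL, so it keeps increasing across successive holders of the same key.
  local fence = redis.call("INCR", "fence:" .. key)
  return fence
else
  return nil
end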

If nil is returned, the lock is held by another process. The caller should apply exponential backoff with jitter before retrying — base delay 50 ms, 2× multiplier, ±20 % jitter, maximum 5 attempts.
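As a rough sketch of that retry schedule, the function below computes the delay before each attempt. The name `backoff_delays` and the injectable `rng` parameter are illustrative assumptions; the defaults mirror the figures stated above.

```python
import random


def backoff_delays(base_ms=50.0, multiplier=2.0, jitter=0.20, max_attempts=5, rng=random.random):
    """Return the delay in milliseconds to wait before each retry attempt."""
    delays = []
    for attempt in range(max_attempts):
        nominal = base_ms * (multiplier ** attempt)  # 50, 100, 200, 400, 800
        # Scale by a random factor drawn from [1 - jitter, 1 + jitter].
        factor = 1.0 + jitter * (2.0 * rng() - 1.0)
        delays.append(nominal * factor)
    return delays
```

With the defaults, the nominal delays sum to 1 550 ms, so a caller that exhausts all five retries has waited between roughly 1.24 s and 1.86 s in total.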

3. Renew the lease before expiration.

Renewal must verify ownership before extending — a process that lost the lock must not silently extend a lock now held by a different holder:

-- renew.lua
local key    = KEYS[1]
local token  = ARGV[1]
local ttl_ms = tonumber(ARGV[2])

if redis.call("GET", key) == token then
  return redis.call("PEXPIRE", key, ttl_ms)
else
  return 0  -- ownership lost; caller should abort
end

Trigger renewal once 50 % of the TTL has elapsed since the last successful acquisition or renewal. For a 3 000 ms TTL, the first renewal fires at ~1 500 ms. If the renewal returns 0, the lock has already expired or passed to another holder, so the process must abort its in-flight operation: continuing risks a duplicate write alongside the new holder.
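One way to drive the renewal timer is sketched below. It assumes a hypothetical `renew(ttl_ms)` callable that wraps renew.lua and returns 1 on success and 0 when ownership is lost; `hold_lease`, `work_done`, and `poll_s` are likewise illustrative names. Elapsed time is measured with a monotonic clock so wall-clock corrections do not shift the schedule.

```python
import time


def hold_lease(renew, ttl_ms, work_done, poll_s=0.05, clock=time.monotonic, sleep=time.sleep):
    """Renew every ttl/2 until work_done() is true.

    Returns True if the work finished while the lease was still held,
    or False as soon as a renewal reports that ownership was lost.
    """
    interval_s = ttl_ms / 2 / 1000.0
    next_renewal = clock() + interval_s
    while not work_done():
        remaining = next_renewal - clock()
        if remaining > 0:
            # Sleep in short slices so completion of the work is noticed promptly.
            sleep(min(remaining, poll_s))
            continue
        if renew(ttl_ms) != 1:
            return False  # ownership lost: the caller must abort its operation
        next_renewal = clock() + interval_s
    return True
```

In a real service this loop would typically run in a background thread or task next to the protected operation, and a `False` result would cancel that operation.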

4. Pass the fencing token to downstream writes.

The resource server (database, payment gateway, external API) must persist and check last_seen_fence per lock key. Reject any write tagged with a fencing token ≤ last_seen_fence. This is the only correct defence against a paused process that resumes after its TTL expires.

-- PostgreSQL: conditional write with fencing token
UPDATE payment_requests
SET    status = 'processed', fencing_token = $1
WHERE  idempotency_key = $2
  AND  (fencing_token IS NULL OR fencing_token < $1);
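The same check can be illustrated without a database. The in-memory class below is only a sketch of the rule a resource server applies (the class and method names are hypothetical); it accepts a write when no token has been seen yet or when the incoming token is strictly greater than `last_seen_fence`, matching the SQL condition above.

```python
class FencedResource:
    """Toy resource server that tracks the highest fencing token seen per lock key."""

    def __init__(self):
        self._last_seen_fence = {}
        self._state = {}

    def write(self, lock_key: str, fence: int, value) -> bool:
        last = self._last_seen_fence.get(lock_key)
        if last is not None and fence <= last:
            return False  # stale or replayed holder: reject the write
        self._last_seen_fence[lock_key] = fence
        self._state[lock_key] = value
        return True

    def read(self, lock_key: str):
        return self._state.get(lock_key)
```

If holder A (token 33) is paused past its TTL and holder B (token 34) writes first, A's late write with token 33 is rejected and B's result stands.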

5. Release via Lua CAS.

Delete the lock only if the caller still owns it. A plain DEL would silently delete a lock now held by a different process:

-- release.lua
local key   = KEYS[1]
local token = ARGV[1]

if redis.call("GET", key) == token then
  return redis.call("DEL", key)
else
  return 0
end

Never release the lock before durable persistence of the state change. Releasing early and then failing mid-write produces a window where a second holder acquires the lock and observes inconsistent state.

Implementation Variants

Variant 1: Redis Single-Instance (Linearizable, Single-Node)

Best for: high-throughput endpoints (>50k req/s), single-region deployments, sub-millisecond acquisition budgets.

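// Go — atomic acquisition with fencing token (assumes go-redis v9)
import (
    "context"
    "errors"
    "time"

    "github.com/redis/go-redis/v9"
)

// ErrLockHeld reports that another holder currently owns the lock.
var ErrLockHeld = errors.New("lock held by another process")

// acquireScript is built once so go-redis can reuse its cached SHA (EVALSHA).
var acquireScript = redis.NewScript(`
    if redis.call("SET", KEYS[1], ARGV[1], "NX", "PX", ARGV[2]) then
        return redis.call("INCR", "fence:" .. KEYS[1])
    end
    return nil
`)

func AcquireLock(ctx context.Context, rdb *redis.Client, key, token string, ttl time.Duration) (int64, error) {
    fence, err := acquireScript.Run(ctx, rdb, []string{key}, token, ttl.Milliseconds()).Int64()
    if errors.Is(err, redis.Nil) {
        return 0, ErrLockHeld // script returned nil: the key already exists
    }
    return fence, err
}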

Failure boundary: Redis primary failover. During failover (typically 15–30 s for Redis Sentinel), SET NX calls fail with a connection or timeout error rather than ErrLockHeld, which signals only that another holder owns the key. Treat both outcomes as "lock not acquired" and route the request to a fallback queue. See lock timeout and lease management for TTL alignment guidance.

Variant 2: etcd (Raft-Backed, Cluster-Wide Linearizable)

Best for: multi-region CP deployments, consensus-grade deduplication, saga orchestration steps where a split-brain must never produce a duplicate execution.

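// Go — etcd lease with the transaction revision as fencing token
// Assumes: import clientv3 "go.etcd.io/etcd/client/v3"; ErrLockHeld defined by the caller's package.
func AcquireEtcdLock(ctx context.Context, cli *clientv3.Client, key, value string, ttlSec int64) (int64, error) {
    lease, err := cli.Grant(ctx, ttlSec)
    if err != nil {
        return 0, err
    }
    // CreateRevision == 0 means the key does not exist yet.
    resp, err := cli.Txn(ctx).
        If(clientv3.Compare(clientv3.CreateRevision(key), "=", 0)).
        Then(clientv3.OpPut(key, value, clientv3.WithLease(lease.ID))).
        Commit()
    if err != nil {
        // Best effort: revoke the lease so a half-applied put cannot linger.
        _, _ = cli.Revoke(ctx, lease.ID)
        return 0, err
    }
    if !resp.Succeeded {
        // The lock is held elsewhere; release the unused lease instead of letting it idle until TTL.
        _, _ = cli.Revoke(ctx, lease.ID)
        return 0, ErrLockHeld
    }
    // On success, the header revision is the revision at which the key was
    // created (its CreateRevision and ModRevision), which serves as the fencing token.
    return resp.Header.Revision, nil
}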

etcd’s Revision field serves as the fencing token natively — it is a monotonically increasing cluster-wide counter that increments on every write. Downstream services compare Revision against a stored high-water mark to reject stale writes.

Operational note: etcd clusters cap at roughly 10k–50k writes/s per cluster. Shard lock namespaces across multiple etcd clusters if a single payment processing service requires more throughput than this ceiling.

Variant 3: PostgreSQL Advisory Locks

Best for: services that already operate within a PostgreSQL transaction boundary; avoids a separate coordination dependency.

# Python (psycopg2) — session-level advisory lock with deterministic key
import hashlib
import struct

import psycopg2
import psycopg2.errors

def pg_lock_key(idempotency_key: str) -> int:
    digest = hashlib.sha256(idempotency_key.encode()).digest()
    # PostgreSQL advisory lock key is a 64-bit signed integer
    return struct.unpack(">q", digest[:8])[0]

def acquire_advisory_lock(conn, idempotency_key: str, timeout_ms: int = 3000) -> bool:
    lock_key = pg_lock_key(idempotency_key)
    with conn.cursor() as cur:
        # lock_timeout only bounds statements that wait, so use the blocking
        # pg_advisory_lock; pg_try_advisory_lock returns immediately and
        # never raises LockNotAvailable.
        cur.execute("SET lock_timeout = %s", (f"{timeout_ms}ms",))
        try:
            cur.execute("SELECT pg_advisory_lock(%s)", (lock_key,))
            return True
        except psycopg2.errors.LockNotAvailable:
            conn.rollback()  # clear the aborted transaction before reusing the connection
            return False

Advisory locks are released automatically on connection close, eliminating orphan risk. However, they are session-scoped: a process crash or pool recycling releases the lock immediately, which may be earlier than desired if the downstream write has not yet committed. Always pair with a database-level unique constraint on the idempotency key as a second deduplication layer — see database unique constraints and upserts.

Variant 4: Redlock (Multi-Node Fault Tolerance)

Best for: multi-region deployments where a single Redis instance is unacceptable as a single point of failure, and linearizability can be relaxed to probabilistic mutual exclusion.

Full step-by-step acquisition and validation logic is covered in Implementing Redlock for High-Availability Deduplication. The core algorithm:

1. Record t_start from a monotonic clock.
2. Send SET NX PX ttl with the same unique token to all N = 5 independent Redis instances in parallel, with a per-instance timeout of 50 ms.
3. Count successful acquisitions and compute the valid lease duration: validity = ttl - (now - t_start) - clock_drift_margin.
4. The lock is valid only if count ≥ 3 and validity > 0; otherwise treat the acquisition as failed.
5. Release by calling the Lua CAS script on all 5 instances, regardless of quorum outcome.
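The quorum and validity arithmetic in the steps above can be sketched as a pure function. The name `redlock_validity_ms` and its parameters are illustrative assumptions; the network calls to the Redis instances are deliberately left out.

```python
def redlock_validity_ms(acquired, n, ttl_ms, elapsed_ms, drift_margin_ms):
    """Return the remaining valid lease time in ms, or None if the lock is not valid.

    acquired        -- number of instances that accepted SET NX PX
    n               -- total number of independent instances (5 in the text)
    elapsed_ms      -- time spent acquiring, i.e. now - t_start
    drift_margin_ms -- allowance for clock drift between instances
    """
    quorum = n // 2 + 1  # 3 of 5
    validity = ttl_ms - elapsed_ms - drift_margin_ms
    if acquired >= quorum and validity > 0:
        return validity
    return None
```

For example, with a 3 000 ms TTL, 40 ms spent acquiring, and a 32 ms drift margin, three successful instances yield 2 928 ms of usable lease, while two successes yield no lock regardless of timing.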

Summary comparison:

Variant	Consistency	Acquisition P99	Fault tolerance	Suitable deduplication scope
Redis single	Linearizable (node)	< 1 ms	None (SPOF)	Single-region, non-financial
etcd / Raft	Linearizable (cluster)	5–15 ms	⌊(N−1)/2⌋ node failures	Multi-region, saga steps
PostgreSQL advisory	Serializable (session)	1–3 ms	None beyond DB HA	Intra-transaction deduplication
Redlock (N=5)	Probabilistic mutual exclusion	2–8 ms	2 node failures	Multi-region, high-availability

Edge Cases and Failure Scenarios

Failure Scenario	Remediation Steps	Observability Hooks
GC pause exceeds TTL (JVM) — process is paused for >3 s; lock expires; second holder acquires; first resumes and writes	Configure `-XX:MaxGCPauseMillis=200` (a soft pause-time goal for the collector, not a hard cap). Set TTL ≥ 5× the worst-case observed GC pause. The resource server rejects the stale write via the fencing token (stale token < `last_seen_fence`). Abort the first holder’s transaction when renewal returns `0`.	`jvm.gc.pause.max` metric; `lock.renewal.failed` counter; `fencing_token_rejected` log field
Redis primary failover during acquisition — `SET NX` acknowledges on primary before replication; new primary re-issues the same lock to a second holder	Issue `WAIT 1 <timeout_ms>` with a bounded timeout (for example `WAIT 1 100`) after `SET NX`, and treat a reply of `0` as a failed acquisition; a timeout of `0` blocks indefinitely when no replica is reachable. `WAIT` narrows the window but does not make Redis strongly consistent. For financial-grade workloads, prefer etcd or PostgreSQL advisory locks during Redis failover windows.	`redis.replication.lag` metric; `lock.acquisition.failover_collision` counter; Redis Sentinel `+switch-master` event
Network partition isolates lock holder — client holds a valid lock but loses connectivity to the coordination store before explicit release	TTL expiration automatically expires the orphaned lock after at most 3–5 s. Fencing token prevents the isolated holder from writing to resource servers that have issued a newer token. Implement circuit-breaker: after 3 consecutive renewal failures, the holder must self-abort.	`lock.renewal.consecutive_failures` counter; circuit-breaker state metric; `lock.orphan.ttl_expired` increment on expiration
Clock skew > TTL on VM host — hypervisor clock correction jumps the wall clock forward by > TTL; lock expires before the holder’s timer fires	Use monotonic clock for local TTL tracking (`time.Since(acquireTime)` in Go, `System.nanoTime()` in JVM). Never calculate remaining TTL from wall-clock subtraction. Deploy NTP with `tinker panic 0` to prevent large corrections from crashing `ntpd`.	`ntp.offset_ms` metric; alert at > 200 ms offset; `monotonic_ttl_remaining` gauge
Thundering herd on lock expiration — dozens of waiters simultaneously retry when a popular lock expires	Apply jitter-based exponential backoff: base 50 ms, 2× multiplier, ±20 % random jitter, 5 retries max. On exhaustion, route to a deferred processing queue rather than returning HTTP 500.	`lock.retry.attempt` histogram; `lock.retry.exhausted` counter; queue depth gauge
Namespace collision across tenants — two tenants share a lock key due to missing tenant scoping	Enforce key schema validation in middleware: reject keys that do not match `{service}:{env}:{tenant_id}:{resource_type}:sha256:{64-hex-chars}`. Emit `lock.key.schema_violation` and return HTTP 400 before the coordination call.	`lock.key.schema_violation` counter; structured log field `lock_key_raw`; MDM alert on first occurrence
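The key schema check from the last row could be implemented along these lines. This is a sketch under assumptions: the allowed character set for each segment is not specified in this document, so the classes used in `LOCK_KEY_RE` are placeholders to adapt.

```python
import re

# Assumed segment alphabets; only the overall shape and the 64-hex digest
# come from the schema described in the table.
LOCK_KEY_RE = re.compile(
    r"(?P<service>[a-z0-9_-]+):"
    r"(?P<env>[a-z0-9_-]+):"
    r"(?P<tenant_id>[A-Za-z0-9_-]+):"
    r"(?P<resource_type>[a-z0-9_-]+):"
    r"sha256:(?P<digest>[0-9a-f]{64})"
)


def is_valid_lock_key(key: str) -> bool:
    """True if the key has all five namespace segments and a full SHA-256 hex digest."""
    return LOCK_KEY_RE.fullmatch(key) is not None
```

Middleware would call `is_valid_lock_key` before contacting the coordination store and return HTTP 400 when it fails, so a key that lacks a tenant segment never reaches Redis or etcd.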

Operational Concerns

TTL Sizing

TTL must satisfy: TTL > worst_case_processing_time + clock_drift_margin + one_renewal_cycle. For synchronous payment calls processing within 500 ms, a 3 000 ms TTL with renewal at 1 500 ms leaves a 1 000 ms safety buffer. For saga orchestration steps that call external payment gateways (P99 = 8 s), use a 30 000 ms TTL with renewal at 15 000 ms.

Never set TTL below 500 ms for any production workload — GC pauses, slow DNS resolution, and kernel scheduler jitter all introduce sub-500 ms unpredictability even on well-tuned hosts.
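The sizing inequality can be checked numerically with a few lines of Python. The helper below is a sketch; its name and the `renewal_fraction` parameter are assumptions, with the 50 % renewal point taken from the text.

```python
def ttl_headroom_ms(ttl_ms, worst_case_processing_ms, clock_drift_margin_ms, renewal_fraction=0.5):
    """Headroom left after processing time, drift margin, and one renewal cycle.

    A positive result means the TTL satisfies
    TTL > worst_case_processing_time + clock_drift_margin + one_renewal_cycle.
    """
    one_renewal_cycle_ms = ttl_ms * renewal_fraction
    required_ms = worst_case_processing_ms + clock_drift_margin_ms + one_renewal_cycle_ms
    return ttl_ms - required_ms
```

With the figures above and the drift margin set to zero, `ttl_headroom_ms(3000, 500, 0)` gives 1 000 ms and `ttl_headroom_ms(30000, 8000, 0)` gives 7 000 ms; whatever drift margin you choose is subtracted from that headroom.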

Memory Budget

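Each Redis lock key consumes approximately 200–400 bytes (key string + value token + metadata). At 10k concurrent locks, this is at most ~4 MB, which is negligible. At 1M concurrent locks (e.g. a payment platform with millions of in-flight idempotency windows), budget 400 MB of Redis RAM for lock keys alone, plus the overhead of idempotency key storage and TTL management. The fence:{key} counters created by the acquisition script carry no TTL, so they accumulate per distinct lock key and need their own budget or cleanup policy.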

Set a dedicated Redis instance (or database index) for lock keys, separate from application cache and session storage. This prevents cache eviction policies (allkeys-lru) from silently evicting active locks under memory pressure.

Index and Key Scan Strategy

Never use KEYS lock:* in production — it blocks the Redis event loop. For auditing active locks, maintain a Redis Set active_locks:{service} that is updated atomically alongside each SET NX and DEL operation. Use SSCAN to iterate members without blocking.

For PostgreSQL advisory locks, query pg_locks with locktype = 'advisory' to enumerate holders. Join with pg_stat_activity to correlate lock holders with active connections.

SRE Alert Thresholds

Define these metrics and alert conditions for any deployment:

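lock.acquisition.p99_ms > 10 ms — coordination layer under load; investigate etcd leader or Redis slow log. This threshold suits Redis-backed locks; for etcd or ZooKeeper, whose typical P99 in the guarantee table already reaches 12–15 ms, set it above that baseline.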
lock.acquisition.failure_rate > 1 % — contention or coordination unavailability; check circuit-breaker state.
lock.renewal.failure_rate > 0.1 % — GC pause, network jitter, or TTL too short; check JVM GC metrics.
lock.fencing_token_rejected > 0 — stale holder active; confirm GC pause durations and clock offset.
lock.orphan.ttl_expired > 5/min — processes crashing or network partitions; check error logs for context deadline exceeded.

Distributed Coordination & Locking Strategies — parent reference covering the full coordination-layer deduplication approach.
Lock Timeout & Lease Management — TTL alignment, heartbeat renewal patterns, and safe release under crash recovery.
Preventing Race Conditions in Microservices — inter-process coordination, outbox pattern integration, and idempotency token validation workflows.
Consensus Algorithms for Deduplication — Raft and ZAB-backed coordination for financial-grade linearizability requirements.
Implementing Redlock for High-Availability Deduplication — step-by-step Redlock deployment with clock synchronization requirements and validation test suite.