What is a stale lock in a distributed system?

A stale lock occurs when a process believes it still holds a distributed lock after the underlying lease has expired — due to GC pauses, network partitions, or NTP drift — allowing a second holder to acquire the lock and creating concurrent access to a protected resource.

How do fencing tokens prevent stale lock damage?

Each lock acquisition returns a monotonically increasing generation counter (fencing token). The protected resource rejects any write whose token is lower than the highest token it has already accepted, making stale operations harmless.

What TTL should I use for distributed lock leases?

Set the lease TTL to at least 3 × the heartbeat interval and at least 2 × your measured p99 GC pause, then add a network budget of around 500 ms. A common starting point for JVM services is TTL = max(3 × heartbeat, p99 GC pause × 2) + 500 ms.

Handling Stale Locks in Distributed Systems


A stale lock is a lock whose holder believes it still owns exclusive access after the coordination layer has already expired the lease and potentially granted it to a new owner. The original holder resumes work under false ownership, producing concurrent mutations against a resource that was meant to be mutually exclusive. In payment pipelines this manifests as double-charges and ledger divergence; in microservice orchestration it triggers duplicate saga steps and cascading retry storms.

This page is a focused runbook. You need to understand lease-based locking from Lock Timeout & Lease Management and the broader guarantees provided by the Distributed Coordination & Locking Strategies patterns before implementing the steps below. All code is copy-pasteable and independently verifiable.

The core mechanism: fencing tokens

The canonical defence against stale lock damage is the fencing token — a monotonically increasing integer returned with every lock acquisition. The protected resource stores the highest token it has ever seen and rejects any operation carrying a lower one.

Lock acquisition #1 → token = 42   (node A acquires)
Node A pauses (GC, network)         (lease expires)
Lock acquisition #2 → token = 43   (node B acquires, processes, releases)
Node A resumes with token = 42      (resource rejects — 42 < 43)

The diagram below shows this lifecycle across four actors: the lock service, the protected resource, and two competing nodes.
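The resource-side rule can be sketched in a few lines of Python, assuming an in-memory store; `FencedRegister` and `StaleTokenError` are hypothetical names. The register keeps the highest token it has accepted and rejects anything lower. In this sketch an equal token is accepted, so the current holder can perform several writes within one critical section.

```python
class StaleTokenError(Exception):
    """Raised when a write carries a fencing token below the high-water mark."""


class FencedRegister:
    """Hypothetical protected resource that enforces fencing tokens on every write."""

    def __init__(self) -> None:
        self.high_water_mark = 0
        self.value: str | None = None

    def write(self, token: int, value: str) -> None:
        if token < self.high_water_mark:
            raise StaleTokenError(f"token {token} < high-water mark {self.high_water_mark}")
        self.high_water_mark = token  # remember the highest token seen so far
        self.value = value


reg = FencedRegister()
reg.write(42, "A: start")        # node A holds token 42
reg.write(43, "B: charged")      # A's lease expired; node B acquired token 43
try:
    reg.write(42, "A: charged")  # node A resumes after its pause
except StaleTokenError as exc:
    print("rejected:", exc)      # -> rejected: token 42 < high-water mark 43
```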

Problem statement and prerequisites

What you are implementing: detection of expired lock ownership at the point of resource mutation, plus safe cleanup of orphaned lock records across three coordination backends.

Prerequisites:

You understand lock timeout and lease mechanics — specifically TTL alignment and heartbeat renewal.
You are familiar with distributed lock acquisition patterns and know why a simple DEL is unsafe.
Your protected resource (database row, queue consumer, external API call) can be modified to check a fencing token before accepting a write. Without resource-side enforcement, fencing tokens do not help.

Step-by-step implementation

Step 1 — Return a fencing token on every lock acquisition

The lock service must increment and persist a generation counter atomically with the lock grant. Callers must store the returned token and attach it to every downstream operation.

Redis (Python)

import uuid

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Grant the lock and bump the fence counter in ONE atomic step: INCR runs only
# when SET NX succeeds, so token order always matches grant order.
# In Redis Cluster both keys must hash to the same slot (e.g. a {resource} hash tag).
ACQUIRE_LUA = """
if redis.call("set", KEYS[1], ARGV[1], "PX", ARGV[2], "NX") then
    return redis.call("incr", KEYS[2])
end
return false
"""
acquire_script = r.register_script(ACQUIRE_LUA)

def acquire_lock(resource: str, ttl_ms: int = 10_000) -> tuple[str, int] | None:
    """
    Returns (lock_value, fencing_token) or None if acquisition failed.
    fencing_token is a monotonically increasing integer from a Redis INCR counter.
    """
    lock_value = str(uuid.uuid4())
    token = acquire_script(keys=[f"lock:{resource}", f"fence:{resource}"],
                           args=[lock_value, ttl_ms])
    if token is None:
        # Lock is held by another owner; no token was consumed. Caller backs off and retries.
        return None
    return lock_value, int(token)

Redis (Go)

package lock

import (
    "context"
    "errors"
    "time"

    "github.com/google/uuid"
    "github.com/redis/go-redis/v9"
)

type Acquired struct {
    Value        string
    FencingToken int64
}

var ErrNotAcquired = errors.New("lock not acquired")

// acquireScript grants the lock and increments the fence counter atomically;
// INCR runs only when SET NX succeeds, so tokens follow grant order.
var acquireScript = redis.NewScript(`
if redis.call("set", KEYS[1], ARGV[1], "PX", ARGV[2], "NX") then
    return redis.call("incr", KEYS[2])
end
return false
`)

func Acquire(ctx context.Context, rdb *redis.Client, resource string, ttl time.Duration) (Acquired, error) {
    value := uuid.NewString()
    keys := []string{"lock:" + resource, "fence:" + resource}

    token, err := acquireScript.Run(ctx, rdb, keys, value, ttl.Milliseconds()).Int64()
    if errors.Is(err, redis.Nil) {
        // The script returned false: another owner holds the lock.
        return Acquired{}, ErrNotAcquired
    }
    if err != nil {
        return Acquired{}, err
    }
    return Acquired{Value: value, FencingToken: token}, nil
}

Step 2 — Validate ownership atomically before release

Never use a bare DEL. Use a Lua script so the check-then-delete is atomic. If the key has already expired (and possibly been re-acquired by a new owner), the script returns 0 and the caller knows the lock was lost before it could be released cleanly.

Lua script (all Redis clients)

-- KEYS[1] = lock key, ARGV[1] = lock value held by caller
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end

Node.js wrapper

const releaseLock = async (redisClient, resource, lockValue) => {
  const luaScript = `
    if redis.call("get", KEYS[1]) == ARGV[1] then
      return redis.call("del", KEYS[1])
    else
      return 0
    end
  `;
  const result = await redisClient.eval(luaScript, {
    keys: [`lock:${resource}`],
    arguments: [lockValue],
  });
  if (result === 0) {
    // Lock was already expired or taken by another node — log and handle
    console.warn("stale lock detected on release", { resource, lockValue });
  }
  return result === 1;
};
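The compare-and-delete semantics of the script can be modelled in Python to show what it protects against. Here a plain `dict` stands in for Redis and `release` is a hypothetical helper mirroring the Lua logic; in Redis the atomicity comes from the script, here from single-threaded execution. A stale holder's release becomes a no-op instead of deleting the new owner's lock.

```python
def release(locks: dict[str, str], lock_key: str, lock_value: str) -> int:
    """Mirror of the Lua script: delete only if the stored value matches; return 1 or 0."""
    if locks.get(lock_key) == lock_value:
        del locks[lock_key]
        return 1
    return 0


locks = {"lock:payment-123": "node-b-uuid"}  # node A's lease expired; node B now holds it
print(release(locks, "lock:payment-123", "node-a-uuid"))  # stale release -> 0
print(locks)                                               # node B's lock survives
print(release(locks, "lock:payment-123", "node-b-uuid"))  # owner release -> 1
```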

Step 3 — Enforce the fencing token at the protected resource

This is the step most implementations skip, which is why stale locks cause real damage. The resource — a Postgres row, a DynamoDB item, a Kafka consumer group offset — must reject writes whose token is lower than the last committed token.

PostgreSQL (with a generation column)

-- Add a generation column to the protected table
ALTER TABLE payments ADD COLUMN lock_generation BIGINT NOT NULL DEFAULT 0;

-- Only update if the incoming generation is at least the stored one
-- (an equal token lets the current holder write more than once)
UPDATE payments
SET    status = 'processing',
       lock_generation = $1       -- fencing token from lock acquisition
WHERE  payment_id = $2
  AND  lock_generation <= $1;     -- reject stale (lower-token) writers

-- Check rows_affected in application code; 0 means the write was rejected
-- (or that no row with this payment_id exists)

Java (Spring JDBC)

@Transactional
public boolean applyWithFencingToken(String paymentId, long fencingToken) {
    int rows = jdbcTemplate.update(
        "UPDATE payments SET status = 'processing', lock_generation = ? " +
        "WHERE payment_id = ? AND lock_generation <= ?",
        fencingToken, paymentId, fencingToken
    );
    if (rows == 0) {
        log.warn("Stale fencing token {} rejected for payment {}", fencingToken, paymentId);
    }
    return rows > 0;
}
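The conditional update can be exercised locally with Python's built-in `sqlite3` module, which stands in for PostgreSQL here. Assumptions: SQLite uses `?` placeholders instead of `$1`, the schema is reduced to the columns this step uses, and `apply_with_fencing_token` is a hypothetical helper. The predicate is `lock_generation <= ?`, which rejects only lower tokens, and `rowcount` plays the role of `rows_affected`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (payment_id TEXT PRIMARY KEY, status TEXT, "
    "lock_generation INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO payments (payment_id, status) VALUES ('pay_001', 'pending')")


def apply_with_fencing_token(payment_id: str, token: int, status: str) -> bool:
    """Write only if the token is at least the stored generation; False means rejected."""
    cur = conn.execute(
        "UPDATE payments SET status = ?, lock_generation = ? "
        "WHERE payment_id = ? AND lock_generation <= ?",
        (status, token, payment_id, token),
    )
    conn.commit()
    return cur.rowcount == 1


print(apply_with_fencing_token("pay_001", 43, "processing"))  # node B -> True
print(apply_with_fencing_token("pay_001", 42, "processing"))  # stale node A -> False
print(apply_with_fencing_token("pay_001", 43, "settled"))     # same holder again -> True
```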

Step 4 — Size the lease TTL to account for GC pauses and clock drift

A TTL that is too short causes legitimate holders to expire under GC or network hiccups; a TTL that is too long leaves orphaned locks alive for minutes. Use this formula as a starting point:

TTL = max(3 × heartbeat_interval, p99_GC_pause_ms × 2) + 500ms_network_budget

For a JVM service with a 1 s heartbeat and a measured p99 GC pause of 800 ms:

TTL = max(3000ms, 1600ms) + 500ms = 3500ms
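The formula translates directly into a helper. This is a sketch with hypothetical parameter names, using the 500 ms network budget from the formula as the default.

```python
def lease_ttl_ms(heartbeat_ms: int, p99_gc_pause_ms: int, network_budget_ms: int = 500) -> int:
    """TTL = max(3 x heartbeat, 2 x p99 GC pause) + network budget, all in milliseconds."""
    return max(3 * heartbeat_ms, 2 * p99_gc_pause_ms) + network_budget_ms


print(lease_ttl_ms(heartbeat_ms=1000, p99_gc_pause_ms=800))   # JVM example above -> 3500
print(lease_ttl_ms(heartbeat_ms=1000, p99_gc_pause_ms=2000))  # GC-dominated -> 4500
```

When the GC term dominates, as in the second call, reducing pause time shortens the TTL more than tightening the heartbeat interval does.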

Configure the idempotency key TTL on your deduplication store to match or slightly exceed this value so that an expired lock does not leave a dangling PROCESSING record. The consensus-based deduplication layer must reconcile any record still in PROCESSING state after TTL + 500ms.
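A reconciliation pass over the deduplication store might look like the following sketch. The record shape (`state`, `updated_at_ms`), the `reconcile` helper, and the choice to mark stuck records `FAILED` are assumptions; a real job would query the store instead of a list, and might re-enqueue work through an outbox rather than failing it.

```python
from dataclasses import dataclass


@dataclass
class IdempotencyRecord:
    key: str
    state: str           # "PROCESSING", "COMPLETED" or "FAILED" (assumed states)
    updated_at_ms: int


def reconcile(records: list[IdempotencyRecord], now_ms: int, ttl_ms: int,
              grace_ms: int = 500) -> list[str]:
    """Mark PROCESSING records older than TTL + grace as FAILED; return their keys."""
    cutoff = now_ms - (ttl_ms + grace_ms)
    stuck = []
    for rec in records:
        if rec.state == "PROCESSING" and rec.updated_at_ms < cutoff:
            rec.state = "FAILED"
            stuck.append(rec.key)
    return stuck


records = [
    IdempotencyRecord("pay_001", "PROCESSING", updated_at_ms=0),
    IdempotencyRecord("pay_002", "PROCESSING", updated_at_ms=9_000),
    IdempotencyRecord("pay_003", "COMPLETED", updated_at_ms=0),
]
print(reconcile(records, now_ms=10_000, ttl_ms=3500))  # -> ['pay_001']
```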

Step 5 — Implement etcd and DynamoDB variants

etcd (bash inspection + Go renewal)

# Inspect a specific lease and its attached keys
etcdctl lease timetolive <LEASE_ID> --keys

# List all active leases to find orphans during an incident
etcdctl lease list

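// Renew a lease once; if the grant has already expired, the caller must re-acquire.
// Assumes cli is a *clientv3.Client, rpctypes is go.etcd.io/etcd/api/v3/v3rpc/rpctypes,
// and ErrLeaseExpired is an error defined by the caller.
_, err := cli.KeepAliveOnce(ctx, leaseID)
if errors.Is(err, rpctypes.ErrLeaseNotFound) {
    // Lease expired: re-acquire (new fencing token) and replay the critical section
    return ErrLeaseExpired
}
if err != nil {
    return err
}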

DynamoDB (conditional write with TTL)

import boto3
from boto3.dynamodb.conditions import Attr
from botocore.exceptions import ClientError
import time

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("distributed_locks")

def acquire_dynamo_lock(resource: str, owner_id: str, ttl_seconds: int = 10) -> bool:
    expiry = int(time.time()) + ttl_seconds
    try:
        table.put_item(
            Item={
                "lock_id":   resource,
                "owner_id":  owner_id,
                "expires_at": expiry,
            },
            ConditionExpression=(
                Attr("lock_id").not_exists() |
                Attr("expires_at").lt(int(time.time()))
            ),
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False
        raise
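The `ConditionExpression` above encodes a simple rule: the put succeeds if no item exists or the existing item's `expires_at` is already in the past. The in-memory sketch below emulates that rule so the takeover behaviour can be tested without AWS; the `table` dict and `try_put_lock` are hypothetical stand-ins. Takeover is driven by the `expires_at < now` branch, not by DynamoDB's background TTL deletion, and like the boto3 version this issues no fencing token.

```python
def try_put_lock(table: dict[str, dict], resource: str, owner_id: str,
                 ttl_seconds: int, now: int) -> bool:
    """Emulates: attribute_not_exists(lock_id) OR expires_at < :now."""
    item = table.get(resource)
    if item is not None and not item["expires_at"] < now:
        return False  # DynamoDB would raise ConditionalCheckFailedException here
    table[resource] = {"lock_id": resource, "owner_id": owner_id,
                       "expires_at": now + ttl_seconds}
    return True


locks: dict[str, dict] = {}
print(try_put_lock(locks, "payment-123", "node-a", 10, now=100))  # True
print(try_put_lock(locks, "payment-123", "node-b", 10, now=105))  # False, still held
print(try_put_lock(locks, "payment-123", "node-b", 10, now=111))  # True, A's lease expired
```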

Verification and testing

Simulate GC-induced lease expiry in Redis

# Acquire a lock with a 3 s TTL, then stall for longer than the TTL,
# as node A would during a long GC pause or network partition
redis-cli SET lock:payment-123 "node-a-uuid" PX 3000 NX
sleep 5   # node A is paused for 5 s; the lease expires meanwhile
# The key will have expired; verify:
redis-cli EXISTS lock:payment-123   # should return 0

Verify fencing token rejection

# Manually set a high-water mark of 50 in the generation column
psql -c "UPDATE payments SET lock_generation = 50 WHERE payment_id = 'pay_001';"

# Attempt a write with a stale token of 42
psql -c "UPDATE payments SET status='processing', lock_generation=42 \
         WHERE payment_id='pay_001' AND lock_generation <= 42;"
# Expected: UPDATE 0 (zero rows affected — write correctly rejected)

Check orphaned etcd leases

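etcdctl lease list
# For each lease ID, check its remaining TTL and attached keys:
etcdctl lease timetolive <LEASE_ID> --keys
# Any lease with 0 keys attached and a non-zero TTL is an orphan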
# Revoke manually during incident response:
etcdctl lease revoke <LEASE_ID>

Inspect Redis lock state

redis-cli GET lock:payment-123          # should be empty if released
redis-cli GET fence:payment-123         # generation counter — should be monotonically increasing
redis-cli TTL lock:payment-123          # remaining TTL in seconds; -2 = expired/deleted

Failure scenarios and debugging

Failure Scenario	Remediation Steps	Observability Hooks
GC pause exceeds TTL; second owner acquires lock; original resumes with stale write	Fencing token check at the resource rejects the stale write. Increase TTL headroom: `TTL = max(3 × heartbeat, p99_GC × 2) + 500ms`. Enable G1GC region size tuning to reduce pause variance.	`stale_fencing_token_rejections_total` (Counter); OTel span attribute `lock.fencing_token`; JVM GC pause histogram `jvm_gc_pause_seconds`
Network partition causes split-brain; two nodes each believe they hold the lock	Fencing token monotonicity makes only the higher-token holder’s writes succeed. Implement Redlock across 5 independent Redis nodes to require a quorum of 3 for acquisition.	`lock_quorum_failures_total` (Counter); `lock_split_brain_detected` (Gauge, alert if > 0); distributed trace baggage field `lock.generation`
NTP clock skew causes renewal to arrive after coordinator considers lease expired	Add a 200ms clock-skew budget to every TTL calculation. Let the coordinator decide expiry (for example, etcd's server-side lease TTL) instead of comparing wall clocks across nodes. Monitor clock offset with `chronyc tracking` or your NTP daemon's equivalent and alert on skew > 50ms.	`lock_renewal_rejection_total` (Counter); `ntp_offset_ms` (Gauge, alert if > 50); custom `lease_ttl_remaining_ms` gauge sourced from etcd's lease time-to-live query
DynamoDB background TTL deletion races with active renewal	Maintain a separate `lease_status` attribute (`ACTIVE`/`EXPIRED`) updated in the same conditional write as the TTL. Do not rely solely on `TimeToLive` for liveness decisions. Apply exponential backoff capped at `TTL / 2` on `ConditionalCheckFailedException`.	`dynamo_lock_renewal_failures_total` (Counter); DynamoDB CloudWatch `ConditionalCheckFailedRequests` metric; structured log field `lease_status`
Idempotency record stuck in `PROCESSING` after lock expiry	Every `TTL + 500ms`, run a reconciliation job that scans for records left in `PROCESSING` longer than `TTL + 500ms`. Transition them to `FAILED` or retry via the transactional outbox pattern.	`processing_stuck_records_total` (Gauge, alert if > 0 for > 30s); structured log field `idempotency_key_state`; trace span `dedup.reconcile`

SRE / observability checklist

Emit or verify these signals before shipping stale-lock handling to production:

stale_fencing_token_rejections_total — Prometheus Counter incremented every time the resource-side check rejects a write whose token is below the high-water mark (the highest token already accepted). Alert on rate > 1/min over a 3-minute window.
lock_acquisition_latency_ms — Histogram with p50/p99/p999. A rising p99 (> 80ms for Redis, > 200ms for etcd) indicates coordinator degradation before leases start expiring unexpectedly.
stale_lock_renewal_failures_total — Counter tracking KeepAliveOnce errors. Alert when > 5 in any 2-minute window.
OTel baggage propagation — Attach lock.id, lock.generation, and lock.acquired_at_ns to every outgoing span so downstream services can correlate the fencing token with the protected operation.
Structured log fields on every lease event — node_id, resource, token, ttl_ms, event (acquired | renewed | released | expired | rejected). Index on token and resource for incident triage.
processing_stuck_records_total — Gauge counting idempotency records stuck in PROCESSING beyond TTL + 500ms. A non-zero reading means the reconciliation job is not running or the fencing token check is not wired up correctly.
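The structured-log item in the checklist can be met with a small helper. This sketch uses only the standard library and assumes JSON lines emitted through `logging`; the field names come from the checklist, while `lease_event` and `LEASE_EVENTS` are hypothetical.

```python
import json
import logging

LEASE_EVENTS = {"acquired", "renewed", "released", "expired", "rejected"}
logger = logging.getLogger("lease")


def lease_event(node_id: str, resource: str, token: int, ttl_ms: int, event: str) -> str:
    """Build and log one lease event as a JSON line; returns the serialized record."""
    if event not in LEASE_EVENTS:
        raise ValueError(f"unknown lease event: {event}")
    record = json.dumps({"node_id": node_id, "resource": resource, "token": token,
                         "ttl_ms": ttl_ms, "event": event}, sort_keys=True)
    logger.info(record)
    return record


logging.basicConfig(level=logging.INFO, format="%(message)s")
lease_event("node-a", "payment-123", 42, 3500, "rejected")
```

Keeping `token` and `resource` as top-level JSON fields is what makes them indexable for incident triage.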

Related pages

Lock Timeout & Lease Management — parent page covering TTL sizing, heartbeat renewal intervals, and lease-expiry guarantees.
Implementing Redlock for High-Availability Deduplication — how to acquire a quorum-based lock across five independent Redis nodes, reducing split-brain risk.
Mitigating Thundering Herd During Retry Storms — when expired locks trigger mass retries, these patterns prevent amplification.
Consensus Algorithms for Deduplication — Raft-backed append-only logs that provide exactly-once semantics across the retry boundary after a lock expires.