A field guide

Distributed systems are different.

These ten primitives are the vocabulary of distributed work. The intuition that works for a single machine breaks the moment more than one is involved — and the gap is wider than it looks.

10 primitives · ~12 min read · Updated 2026

How to read this guide

Each primitive follows the same shape: the problem that arises in a distributed setting, the mechanism that addresses it, and the failure modes that make it harder than it looks. The "When it fits" line at the end is the scope where you'll actually need it.

Lamport: A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable. The primitives below are the tools that make this survivable.

Consensus

Raft, Paxos, leader election — agreement under failure.

Problem

A distributed system needs to agree on shared state — who's leader, what the next log entry is — even when nodes fail or messages are dropped.

Shape

Consensus protocols (Paxos, Raft, ZAB) provide a way for a majority of nodes to agree on a single value despite failures. Raft is the most implementable; understand it before reaching for alternatives.

Watch for

Consensus is expensive — every decision requires a quorum round-trip. The FLP impossibility result also says it's unsolvable in a fully asynchronous system with even one faulty process, so every real protocol relies on timing assumptions or randomization. Use consensus sparingly, for the values that genuinely need it (leader, config, metadata), not for every write.

When it fits: Distributed databases, configuration stores, leader election, distributed locks. Anything where multiple nodes need a single source of truth.

Message Delivery Guarantees

At-least-once, at-most-once, exactly-once — and what each really means.

Problem

Networks lose messages. Retries cause duplicates. "Exactly-once" sounds right but is much harder than it looks.

Shape

At-most-once: don't retry on uncertain outcomes; messages can be lost, never duplicated. At-least-once: retry until acked; duplicates possible. Effectively once: at-least-once plus idempotent processing or a dedup key — what most "exactly-once" libraries actually deliver. True exactly-once requires the producer, broker, and consumer side-effects to participate in the same atomic commit (Kafka transactions, transactional outbox); without that, you have idempotent retry, which is a fine answer but a different one. Pick semantics based on what the consumer can tolerate.

Watch for

"Exactly-once" advertised by brokers usually means "at-least-once with idempotent handlers." The consumer still has to deduplicate. There is no magic.

When it fits: Any time you cross a network with state changes. The choice between guarantees is about the cost of duplicates vs the cost of loss.

Distributed Locks

When you need them, how to do them safely, and the failure modes.

Problem

Multiple nodes need exclusive access to a shared resource. Local locks don't cross processes; database row locks block other work.

Shape

Use a coordination service with TTLs and fencing tokens. Take the lock, do the work, release. If you crash, the TTL expires and someone else takes over. etcd and ZooKeeper provide linearizable consensus and are the safe defaults; Redis Redlock is a contested algorithm (Kleppmann's critique) and should not be treated as equivalent — use it only when its actual guarantees match your needs.

Watch for

Locks without fencing tokens are unsafe — a slow process can hold a lock past its TTL, then write stale data to a resource someone else now owns. Always check the fencing token at the resource.

When it fits: Single-leader workflows (cron jobs, leader-only operations), exclusive resource ownership. When in doubt, prefer designs that don't need locks (idempotency, partitioning).

Service Discovery

Finding peers in a system where instances come and go.

Problem

Hard-coded addresses break the moment instances scale up, get replaced, or move regions. The DNS approach has TTLs; the static config approach is brittle.

Shape

Services register themselves with a discovery mechanism (Consul, etcd, Kubernetes service registry). Clients query the registry (directly or via DNS) for current healthy instances. Health checks remove dead instances.

Watch for

DNS caching can keep clients pointing at dead instances after they're gone. Registry as a single point of failure — its availability becomes the system's availability ceiling.

When it fits: Microservices, dynamic environments (Kubernetes, autoscaling), any system where service instances aren't static.

Clock Synchronization

Lamport clocks, vector clocks, hybrid logical clocks, causal ordering.

Problem

Wall clocks drift across nodes. NTP synchronization isn't tight enough for ordering events. "Last write wins" with skewed clocks loses correct writes.

Shape

Lamport clocks are scalar counters that preserve happens-before but can't tell concurrency from causality. Vector clocks add a per-node counter so you can detect concurrent events. Hybrid logical clocks (HLC) combine wall time with a logical component to keep timestamps monotonic and roughly aligned with real time, but like Lamport they're scalar — they preserve causal order, they don't expose concurrency. Use Spanner-style TrueTime where you have it.

Watch for

Causality is not the same as wall-clock time. Two events can be concurrent (no causal relationship) yet have very different timestamps. Building UIs that imply ordering requires care.

When it fits: Distributed databases, conflict-free data types, anywhere you need to reason about "what happened before what" across machines.

Sharding & Consistent Hashing

Distributing data across nodes; surviving node changes without massive remapping.

Problem

A single database eventually hits ceilings (CPU, RAM, disk). Naive sharding (modulo node count) requires rehashing every key when nodes are added or removed.

Shape

Consistent hashing maps keys and nodes to a ring. Each key goes to the next node clockwise. Adding or removing a node only remaps about K/N keys (one node's worth), not the whole space. With virtual nodes, those keys are pulled evenly from the ring's existing neighbors, smoothing both load and rebalancing.

Watch for

Hot keys (one shard gets disproportionate traffic) defeat the scheme. Skewed data distribution requires resharding strategies — not always automatic.

When it fits: Any system that horizontally partitions data (databases, caches, queue partitions). The default for systems built to scale beyond a single node.

Quorums & Replication

Majority decisions, read/write quorums (R + W > N), and replication topologies.

Problem

Replication trades off latency, consistency, and availability. Without rules, replicas drift; with strict rules, every write blocks on the slowest node.

Shape

For N replicas, configure read quorum R and write quorum W. R + W > N guarantees that every read overlaps with the latest write quorum — necessary for strong reads, but not sufficient for linearizability under sloppy quorums, concurrent writes, or read repair races. R = 1, W = N gives fast reads but writes block on the slowest replica and have zero availability if any replica is down. R = W = ⌈(N+1)/2⌉ balances both.

Watch for

Quorum doesn't guarantee freshness if reads bypass it. Asynchronous replication is fast but lossy on failover. Synchronous replication is durable but increases latency.

When it fits: Any replicated datastore. The R/W choice is not a one-time decision — different operations may use different quorums.

Failure Detection

Heartbeats, gossip, phi-accrual — knowing when a node is gone.

Problem

A network partition looks identical to a node failure from the outside. Acting on the wrong assumption (split-brain, false failover) makes things worse.

Shape

Heartbeats: nodes send liveness pings on intervals. Gossip: nodes exchange information about their view of other nodes. Phi-accrual: probabilistic detection that adapts to network conditions instead of a fixed timeout.

Watch for

Tight failure detection causes flapping under transient network issues. Loose detection means slow failover. The right setting depends on your network characteristics, not a default.

When it fits: Cluster management, leader election, replication topology decisions, any system where node membership changes need to be detected reliably.

CRDTs / Conflict Resolution

Convergent merging — concurrent updates that always reach the same result.

Problem

Multi-master replication produces concurrent updates that conflict. Last-write-wins drops data. Application-level merge logic is complex and error-prone.

Shape

CRDTs (Conflict-free Replicated Data Types) are data structures with mathematically guaranteed merge functions — counters, sets, registers, sequences. Concurrent updates from any number of replicas converge to the same state.

Watch for

CRDT structures grow with history (some can be GC'd, others can't). Not every domain fits a CRDT — a counter is easy; a relational schema is not.

When it fits: Collaborative editing (Google Docs, Figma), offline-first apps, multi-region active-active systems where availability beats strict consistency.

Eventual Consistency

Convergence semantics — what "eventually" really promises.

Problem

Strong consistency limits availability and latency. But "eventually consistent" is meaningless without specifying when, in what order, and from whose viewpoint.

Shape

Define the consistency model. Strongest first: linearizability (single global order, real-time respected), sequential consistency (single order, real-time not required), causal consistency (causally-related operations agree everywhere), read-your-writes / monotonic reads (per-client guarantees), eventual (replicas converge if writes stop). Use anti-entropy mechanisms (Merkle trees, read repair) to converge replicas. Design clients to handle stale reads gracefully.

Watch for

"Eventually" can be seconds, minutes, or hours under partition. Application code that assumes immediate consistency works in tests and breaks in production. Test against the actual consistency model.

When it fits: Geo-distributed systems, AP-side CAP trade-offs, anywhere availability matters more than latest-write visibility. Default for high-scale globally-distributed services. Remember CAP only governs behavior during a network partition; PACELC extends it by adding the latency-vs-consistency choice when there is no partition, which is where most production trade-offs actually live.

Respect partial failure as the default

Distributed systems aren't single-machine systems with extra steps. The mental model that works locally — atomic operations, shared memory, synchronous order — silently fails the moment two machines are involved.

These ten primitives are how you build systems that survive the physics. Skip them and you reinvent them, badly, under outage pressure.