Time Is an Illusion: Logical Clocks in Distributed Systems
In a distributed system, there is no global clock. This single fact has more consequences than most engineers appreciate.
The Problem
Two servers, A and B. Server A writes x = 1 at time . Server B writes x = 2 at time . Which write came first?
If , A was first — right? Wrong. Wall clocks drift. NTP corrections can make time go backwards. You cannot use physical time to establish causality.
Google's TrueTime API (used in Spanner) solves this with GPS and atomic clocks, providing bounded clock uncertainty. But most systems don't have access to atomic clocks.
Lamport Timestamps
Leslie Lamport's 1978 insight: we don't need physical time. We need a logical clock that respects causality.
Rules:
- Before each event, increment the counter:
- When sending a message, include the timestamp
- On receiving a message with timestamp :
This gives us: if (a causally precedes b), then .
But not the converse: does not imply . Events may be concurrent.
Vector Clocks
To detect concurrency, we need vector clocks. Each process maintains a vector :
Now we can determine:
- iff (componentwise)
- (concurrent) iff neither nor
class VectorClock:
def __init__(self, n: int, pid: int):
self.clock = [0] * n
self.pid = pid
def tick(self):
self.clock[self.pid] += 1
def send(self) -> list[int]:
self.tick()
return self.clock.copy()
def receive(self, other: list[int]):
for i in range(len(self.clock)):
self.clock[i] = max(self.clock[i], other[i])
self.tick()
The tradeoff: vector clocks have space per event, where is the number of processes. For large systems, this is prohibitive.
Hybrid Logical Clocks
HLC combines physical and logical time, giving us the best of both worlds:
The physical component captures "roughly when" something happened. The logical component breaks ties. This is what CockroachDB uses for MVCC timestamps.
The beauty of HLC: the timestamp is always within a bounded offset of the true physical time, while still providing the causal ordering guarantees of logical clocks.