System Design Notes

Concurrent Writes

We have previously learned how strict quorum reads and writes can be employed in leaderless replication to retrieve the latest data values most of the time. However, strict quorums don’t prevent concurrent writes to the same key; these result in conflicts which, if left unresolved, can cause data loss.

Before we learn how conflicts are resolved or avoided with quorum writes, let’s define what is meant by a concurrent write. Generally, concurrent writes are understood to mean writes that happen at exactly the same time. In distributed systems, however, clocks can’t be perfectly synchronized across nodes, so it is difficult to tell whether two writes really occurred simultaneously.


Consider a system that stores key/value pairs. Node A writes the pair (X, 7) to the system. Node B retrieves Node A’s write, increments the key’s value, and writes back (X, 8). There are two write events here, one by Node A and one by Node B. We can say that Node A’s write happens before Node B’s write, since Node B’s write builds on the value written by Node A. Because one write happens before the other, the two events aren’t concurrent, and Node B’s write is said to be causally dependent on Node A’s write.
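The happens-before relationship above can be sketched in a few lines of Python. The in-memory store and the read/write helpers are illustrative assumptions, not any particular database’s API:

```python
# A minimal sketch of the causal (happens-before) relationship described
# above. The store and helper functions are hypothetical, for illustration.

store = {}

def write(key, value):
    store[key] = value

def read(key):
    return store[key]

# Node A writes (X, 7).
write("X", 7)

# Node B reads Node A's value, increments it, and writes it back.
# Because Node B's write builds on the value it just read, it is
# causally dependent on Node A's write: A happens before B.
value = read("X")
write("X", value + 1)

print(store["X"])  # 8
```

The key point is that Node B’s write could not have happened without first observing Node A’s write, which is exactly what rules out concurrency between the two events.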

In contrast, consider a situation where the two nodes attempt to update the same key a few hours later, neither knowing that the other also intends to update it. In this case, the writes are concurrent because neither is aware of the other’s occurrence. Note that the two writes don’t need to overlap in time; and even if they do, there is no guarantee that they reach the intended replicas at the same time, since write requests can be delayed by transient network or node failures. If several clients update the same key, the system can end up with inconsistent data unless sensible conflict resolution is applied. Consider the following sequence of events when Node A attempts to update key X with the value 17 and Node B attempts to update key X with the value 39:

  1. Node A’s write reaches replica#1 and Node B’s write never makes it to replica#1 because of a network outage.
  2. Replica#2 receives Node A’s write first and then Node B’s write second.
  3. Replica#3 receives Node B’s write first and then Node A’s write second.

It isn’t clear what the final value for key X, updated by both nodes A and B, should be. If each replica keeps the latest value it receives for key X, then replica#1 stores (X, 17), replica#2 stores (X, 39), and replica#3 stores (X, 17). The system thus doesn’t have a consistent value for key X across the replicas.
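The three-replica scenario above can be simulated directly. Each replica naively keeps the last value it receives, and the delivery orders mirror steps 1–3 (this simulation is a sketch, not real replication code):

```python
# Simulate replicas that keep only the last value received for key X,
# applying the two concurrent writes in the delivery orders from steps 1-3.

def apply_in_order(writes):
    """Return the value a replica ends up with after applying writes in order."""
    value = None
    for v in writes:
        value = v
    return value

write_a = 17  # Node A's write: (X, 17)
write_b = 39  # Node B's write: (X, 39)

replica1 = apply_in_order([write_a])           # B's write lost to a network outage
replica2 = apply_in_order([write_a, write_b])  # A's write first, then B's
replica3 = apply_in_order([write_b, write_a])  # B's write first, then A's

print(replica1, replica2, replica3)  # 17 39 17 -- the replicas disagree
```

The replicas diverge permanently: with no conflict-resolution rule, nothing ever brings them back to a single agreed value for X.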

Need for convergence

For replication systems to work correctly, we want all the nodes in the system to eventually converge on the same value. As discussed earlier, two concurrent writes don’t define an order between themselves; the order is undefined, i.e. neither happens before the other. Concurrent events are unrelated to each other, unaware of each other, and lack a natural ordering.

When one event happens before a second event, it makes sense for the second event to overwrite the effects of the first. In the case of concurrent events, however, a conflict occurs that must be resolved.

Forcing an order

If concurrent writes come without any inherent order, we can attempt to force an order on them. We can attach a timestamp, or a strictly increasing identifier, to every write, which lets us unambiguously compare two events and declare one to have occurred before the other. The strategy of ordering concurrent writes by timestamp and picking the write with the latest timestamp as the final value, while discarding the other concurrent writes, is known as Last Write Wins (LWW). Cassandra uses LWW for conflict resolution, and it is optionally available in Riak.
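A minimal sketch of LWW follows, assuming every write carries a timestamp supplied by the client. A replica keeps a value only if its timestamp is newer than the one already stored; writes with older timestamps are silently discarded. The class and method names are illustrative, not Cassandra’s or Riak’s API:

```python
# A sketch of Last Write Wins: each replica keeps, per key, only the
# value carrying the highest timestamp it has seen so far.

class LWWReplica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Keep the write only if it is newer than what we already have.
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

# Two concurrent writes to X, delivered in opposite orders to two replicas:
r1, r2 = LWWReplica(), LWWReplica()
r1.write("X", 17, timestamp=100)
r1.write("X", 39, timestamp=105)
r2.write("X", 39, timestamp=105)
r2.write("X", 17, timestamp=100)

# Both replicas converge on the write with the highest timestamp,
# regardless of delivery order.
print(r1.read("X"), r2.read("X"))  # 39 39
```

Note how delivery order no longer matters: both replicas converge on the same value, which is exactly the convergence property discussed above, achieved by sacrificing the discarded write.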


LWW resolves conflicts at the cost of durability. Concurrent writes that were reported as successful to clients, because they were written to W nodes, can still be lost under LWW, which makes it a poor choice when data loss is unacceptable. One reasonable use of LWW is in the design of caches, where data loss can be tolerated. Cassandra recommends using immutable keys (e.g., UUIDs) that, once written, are never changed, which avoids data loss.
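The immutable-key idea can be sketched as follows: instead of updating a shared key, each client writes its value under a fresh UUID, so no two writes ever target the same key and LWW never has to discard anything. The store and helper here are assumptions for illustration, not Cassandra’s actual API:

```python
# A sketch of the immutable-key approach: every write gets a brand-new
# UUID key that is never updated afterwards, so concurrent writes can
# never conflict and none are discarded.

import uuid

store = {}

def insert_immutable(value):
    """Write value under a fresh key that will never be updated."""
    key = str(uuid.uuid4())
    store[key] = value
    return key

# Concurrent clients each get their own key, so neither write is lost.
key_a = insert_immutable(17)
key_b = insert_immutable(39)

assert key_a != key_b
print(store[key_a], store[key_b])  # 17 39
```

Since each key is written exactly once, LWW’s tie-breaking logic is never triggered, and durability is preserved even under heavy concurrent writing.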