System Design Notes
Don’t forget to get your copy of Designing Data Intensive Applications the single most important book to read for system design interview prep!

R + W > N

Say we always read from r nodes and write to w nodes in a n node system then as long as the following inequality holds the writes to the system will be durable and the read from the system will return the latest value:

R + W > N

For our three node system, where N=3 we can choose R=1 and W=3. With these values we'll write to all the nodes in the system and if any of the nodes is down then the write request will fail. Reading from any of the nodes in the system will naturally return the latest value since the write was recorded at all the nodes. As a corollary of the above inequality we can further say:

  1. If R < N then we can still process reads if a single node is down. We can tolerate a higher number of nodes being offline for higher values of N.
  2. If W < N then we can still process writes if a single node is down. We can tolerate a higher number of nodes being offline for higher values of N.
  3. A system with N nodes can tolerate N/2 rounded down node failures.

Generally, N is set to an odd number but N can also be even. For instance, if N=4, then one configuration that can work is R=3 and W=2 to make R+W (3+2=5) greater than N=4. We can reason that if a write is recorded to two nodes and both of them go offline then any read request will fail since the read request must be responded to by at least three nodes. Reading and writing in this manner is called as quorum reads and quorum writes respectively. The reads and writes are generally sent to all the nodes but the R and W values determine how many nodes do we wait to receive response from before considering the write or the read as successful. Reads will fail when fewer than R nodes are available and writes will fail when fewer than W nodes are available.

It is interesting to consider what happens if we flip the inequality as follows:

R + W <= N

In the above scenario, the system’s availability improves since now the reads and writes can go through even if N/2 (rounded down) nodes are offline. However, consequently, the probability of stale reads also goes up.

Caveats with quorum reads and writes

It might seem that using quorum reads and writes guarantees the latest data from a system, however, that may not hold true in a few edge cases depending on the implementation details of the system. Some of these cases are discussed below:

  • When two writes are received at the same time, i.e. concurrently, it can’t be said which write occurs first and a conflict occurs. In this situation the implementation may decide to merge the two writes or use some other conflict resolution algorithm.
  • Consider the three node system we discussed earlier. Suppose a write fails for two nodes but is successful for the third one. The node that successfully recorded the write can roll-back the update or return the updated value depending on the implementation, even though the quorum write has failed in this case.
  • When a write and a read happen at the same time, i.e. concurrently, the write may not get a chance to be replicated on other nodes in the system. This may cause the read to return either the old or the new value depending on which replicas respond to the read request.
  • The quorum write condition can be violated when a previously failed replica with latest data comes back up oline and restores its state from a replica with stale data.

In practice, systems using leaderless replication such as Dynamo, advertise eventual consistency and don’t guarantee that reads return the latest values. Generally, with quorum reads and writes the read guarantees discussed previously such as reading one’s writes, monotonic reads and consistent prefix reads aren’t promised. Stronger guarantees require transactions or consensus, as we shall learn later.