Leader-based replication is an ideal choice for read-scaling scenarios where the read requests processed by a distributed system are far more than the number of write requests. This is often true of internet applications. The number of followers can be increased as the read load on the system increases.
However, leader-based replication has its limitations.
Consider the scenario of a Twitter user making a new tweet. In a leader-based replication scenario, the tweet goes to the leader to be committed to the underlying data store. If the user refreshes his timeline, it is possible that her read request may go to a follower that is behind the leader and may not return the latest tweet of the user. The user may erroneously believe that his tweet has been lost. For such scenarios, we want read-after-write consistency or read your own writes. This is a guarantee that a user is always able to immediately read the writes she has submitted or in case of our Twitter user, she sees her tweet on her timeline immediately after submitting it and forcing a browser refresh. Note that the read-after-write consistency doesn’t imply that a different user is able to see the writes made by the first user, since the reads for the other user may be routed to a read replica which is behind the leader. However, eventually all users see the writes made by each other because of eventual consistency.
We can tweak the leader-based replication in some ways to achieve read-after-write consistency. These are:
You can tweet using your desktop and the refresh your Twitter timeline on your phone. A correctly designed system should show you your tweet on your mobile device even though it was sent from your desktop. Read-after-write consistency across devices adds complexity as now the schemes to remember the timestamp of last updates may not work because the mobile device can’t possibly when a write was made from the desktop browser. This information now has to be stored on the server side so that all devices in use by a user can know when the user last made an update.
Modern architectures involve replicating data across geographies to move data closer to users and to also mitigate the loss of data from continental-wide calamities. If the replicas live in geographically distributed datacenters and data needs to be read from the leader then it must be ensured that read requests from all of the user’s devices are able to connect to the datacenter that hosts the leader.
Suppose you follow the president of the United States (POTUS) on Twitter. POTUS posts two tweets in quick succession, the first one at timestamp T1 and the second one at T2. The writes go through the leader and then are in the process of getting replicated on followers F1 and F2. Luckily F1 almost immediately catches up to the leader and records both the tweets. However, F1 experiences a network partition for the time being and only replicates the first tweet.
Now, if you log into your Twitter account and navigate to POTUS Twitter timeline, your request may be routed to follower F1 which is in-sync with the leader and returns both tweets from POTUS. Say, now you refresh the browser and this time your request is routed to follower F2, which still has to copy the second POTUS tweet. Suddenly you see the POTUS Twitter timeline missing data that you had already seen the first time you navigated to that location.
In other words, you are reading older data after you already read newer data in the first query. Note that it would have served us just fine if the first query didn’t return any results as then the second query wouldn’t appear going backwards in time. The monotonic reads guarantee that one never observes stale data after already having observed the latest data. Without monotonic reads, one can observe the state/snapshot of the data at an earlier point in time after seeing state/snapshot of the data at a later time.
One possible solution is to direct all queries from a particular user to a given replica. In our example if all requests from our user account were directed to F2, we’d never see the Twitter timeline go backwards in time. Yes, we might not get all or the latest data but because of eventual consistency at some point in time we should receive all the updates. Monotonic reads is a weaker guarantee than strong consistency but stronger than eventual consistency.
Imagine a scenario where the North Korean dictator tweets “Successful missile tests!” and in response POTUS tweets “Sanctions imposed on North Korea”. The timeline of tweets will reveal that sanctions were imposed by the US after North Korea test-fired missiles, but if we flip the order of the tweets, users can comprehend that North Korea test-fired missiles after the US imposed sanctions.
Taking this example a step further, we can see there’s a casual relationship between the two tweets which translate into writes on the backend database. It could potentially happen that the tweet from the North Korean dictator is replicated on a set of servers in Asia while the one from POTUS is replicated on a set of servers in North America. Eventually, the tweets get copied over to all the servers but for a short period of time the regional servers may not have cross-continental tweets. In such a scenario, a user retrieving tweets in Brazil may connect to the North American servers and see the tweet from POTUS first and then see the tweet from North Korea, thinking that North Korea test-fired missiles in retaliation to US sanctions. Because of the lag in replication tweets globally the order of the writes is altered.
The consistent prefix reads guarantee says that if a sequence of writes happen in a certain order then the writes are served to the reader in the same order. This is especially important for distributed databases that have different partitions and there’s no global ordering of writes. One mitigation is to direct all the related writes to the same partition. Dependency tracking software is used in situations where writing to the same partition isn’t possible.
Going back to our tweet example, we can have a tweet and its responses, all written to the same partition so that the reader always sees the related tweets in order.