System Design Notes
Don’t forget to get your copy of Designing Data Intensive Applications the single most important book to read for system design interview prep!

Video Streaming Queue: A Concurrent Writes Example

Consider two users A and B, using the same credentials to log-in to a video streaming service such as YouTube. Let’s assume these two users sit in different geographical locations unaware of each other’s actions. We’ll also assume the service has a single replica to make the example simple. Both users try to add movies for various actors in the queue and their interactions with the service are captured below. Instead of a two user scenario, we could also play the sequence of events for a single user who interacts with the service using two different devices.

  1. User A starts first and adds a movie for Tom Cruise in the queue. The browser sends this update to the backend server which notes the state of the queue [Tom] as version number#1. We’ll use first names of actors in the queue to keep the example simple. Furthermore, the server responds to the client request with the current state of the queue as well as its version. Since the queue was empty to begin with, the state returned by the server is [Tom] and version#1.
  2. Next user B adds a Brad Pitt movie to the queue. Note, that from user B’s perspective the queue is empty and she doesn’t know what movies user A has already put into the queue. When the server receives the entry from user B as [Brad], the server must merge the state of the queue with that received from user A, since the writes from the two users are concurrent, i.e. neither knows about the other. The server thus stores the queue state as [Tom, Brad] and increments the queue state to version#2 and the same is sent back to user B.
  3. Next, user A wants to add a George Cloony movie in the queue. User A received [Tom] and version#1 from the server in response to the first write request. User A sends the previously received version and the new queue state [Tom, George] to the server. When the server receives user A’s second write request, it can determine from the version number sent by user A that the user wants to overwrite the version#1 state of the queue with the new state [Tom, George]. The server, however, already received the write request from user B to include a Brad Pitt movie in the queue. You can now appreciate the shortcomings of a LWW-like conflict resolution mechanism, which would simply override the queue state with [Tom, George] and drop user B’s write request for Brad. We can assume that the server simply merges the write requests as a conflict resolution mechanism. Thus the server will override the queue state versioned#1 and merge the outcome with any concurrent writes received from B. The resulting queue state will be [Tom, Brad, George] and the version will be bumped to 3. Note, the server responds to the write request of user A with the updated queue state as well as the version.
  4. Say now user B wants to add a Johnny Depp movie to the queue. This new write request builds upon the response of the first write request by user B, which was [Tom, Brad]. The new write request is therefore causally dependent on the response of the first write request. The second write request from user B should overwrite the version#2 of the queue state. Thus the user B sends to the server [Tom, Brad, Johnny] and version#2. The server on receiving the write request overwrites version#2 of the queue state and also merges the concurrent writes from user A that occurred after version#2 of the queue state. The queue state is thus changed to [Tom, Brad, George, Johnny] and the server sends it back to user B along with the version bumped-up to 4.
  5. The pattern should be apparent now. User A wants to add a Angelina movie to the queue and sends the state [Tom, Brad, George, Angelina] along with version#3 to the server. The server overwrites the queue state at version#3 [Tom, Brad, George] with the new write [Tom, Brad, George, Angelina] received from user A and at the same time it merges the concurrent writes from user B received after version#3 of the queue state. Finally, the queue state at the server is [Tom, Brad, George, Angelina, Johnny] and the version is 5, which are also sent back to user A.

We can observe from the above events that the server attempts to distinguish between overwrittes and concurrent writes based on version numbers of the queue’s state. The client has the onus of merging the contents it receives from its immediately preceding write with the new write the client intends to make and ship the result to the server. The server can safely replace a queue state with the same or a higher versioned queue state because the client must have sent the merged results back. The contents from the higher versioned queue states are merged with the lower versioned queue state as the contents in the higher versioned queue state represent concurrent writes with respect to the contents of the lower versioned queue state. As a concrete example, consider when user B sends an update to the server in step#4. The update consists of the new write Johnny, and the previous state of the queue which already contains the merged concurrent writes from user A and user B i.e. [Tom, Brad]. The server receives [Tom, Brad, Johnny] with version 2. The server holds a higher versioned queue state numbered 3, which implies that there are concurrent writes with respect to what the server just received from user B and attempts to resolve the conflict by taking the union of values. Also, note that in case if user B were the only user then each time the server received a request from user B, the server would overwrite the existing version and would not need to increment the version of the queue’s state.

The above example treats the user as the key and the state of the user’s queue as the value of the key. We could have used any other key/value pair for the example. The concurrent writes are resolved by taking a union of the contents, in some systems such as Riak the concurrent writes are referred to as siblings. Taking a union of siblings works well for some scenarios but may not be suitable in others e.g. when movies can also be removed by a user. A user adding and another user removing the same movie from the queue as concurrent writes will lead to a queue state with the movie present when the union of siblings is applied as a conflict resolution mechanism.

Version vectors

The scheme we described above assigns a version per key. When multiple replicas are involved, a version is assigned per key per replica. Each replica maintains the version number for a given key locally that is incremented when the replica processes a write to the key. Additionally, each replica also tracks the version numbers for a given key from other replicas. All the versions for a key across all the replicas is referred to as a version vector and is sent to the client upon a read of the key.