System Design Notes
Don’t forget to get your copy of Designing Data Intensive Applications the single most important book to read for system design interview prep!

Multileader Topologies

Multileader architectures come in different topologies. A replication topology refers to how nodes in a system are set-up to exchange local writes, i.e. writes at a given leader, with other leader nodes. The case of two nodes is trivial as the information must be exchanged bilaterally, however, with three or more nodes the number of possible ways nodes can exchange information among themselves is numerous. We’ll discuss the various topologies in which nodes can be arranged to exchange writes with each other.

All to all

In all to all topology a leader sends its writes to every other leader. The pro of this approach is that all leader nodes receive writes occurring at every other leader and the failure of any individual leader doesn't affect propagation of writes amongst the remaining leaders in the system. The con of this approach is that a node has to send its local writes to every other leader, thereby increasing the amount of data that traverses the network. However, the bigger problem with this approach arises when writes by a single client can be accepted by different leaders. Consider the following sequence of events:

  1. Client initiates a write request W1 that is routed to leader A.
  2. Client initiates a second write request W2 that is accepted by leader B.
  3. Leader B sends W2 to leader C for replication before leader A is able to send W1 to leader C.
  4. Leader A sends W1 to leader C after leader C has already processed W2 received from leader B.

The above scenario causes the write requests to be recorded in the reverse order at leader C, which can be a problem if the two writes update the same record. There are potential solutions such as routing writes for a particular record to the same leader or using a technique called version vectors. Very briefly, version vector is a mechanism for tracking changes to data in a distributed system, where multiple clients might update the data at different times. The version vector allows the participants to determine if one update preceded another (happened-before), followed it, or if the two updates happened concurrently (and therefore might conflict with each other). In this way, version vectors enable causality tracking among data replicas.

Circular

Leader nodes in a distributed system can be arranged in a circular topology as depicted below:

In this setup, a node exchanges information with the node adjacent to it in clockwise (or anti-clockwise) direction. Each node passes along its local writes and all the writes it received from its predecessor node in the clockwise (or anticlockwise) direction to the next node in order. To avoid infinite looping, each write is recorded in the replication log and forwarded with the unique identifiers of the nodes that have already seen the write. When a node receives a write that has the unique identifier for the receiving node, the write is ignored since it has already been processed.

One of the cons the circular topology suffers from is that the failure of a single node in the topology can halt the flow of replication messages in the system. The leader nodes can be rearranged to circumvent the failed node but require manual intervention. In comparison the all to all topology is more resilient to failures as multiple paths exist for replication messages to be routed between two nodes.

Star

The star topology has nodes arranged in the shape of a star and a single root node forwards replication messages to all other leader nodes. Similar to the circular topology, the star topology can suffer from interruptions in the flow of replication messages when nodes in a path go down.