Some of the problems with multi-leader replication include:
One of the major problems with multi-leader replication is conflicting writes. Suppose you are editing a Google Docs document simultaneously with a colleague on the other side of the globe. One of you changes the first heading in the document from A to B, and the other changes the same heading from A to C. Each change is committed at the local leader in the datacenter nearest to that user. Unbeknownst to either of you, a conflicting write has been recorded for the document. Nor can we say whether the change from A to B or the change from A to C is the correct one; both are equally valid. The conflict is only detected later, when the writes are asynchronously replicated between datacenters, by which time it is too late to ask either user to resolve it.
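To make the scenario concrete, here is a minimal single-process sketch of two leaders that acknowledge writes locally and only notice the conflict when their queued writes are exchanged. The `Leader` class, the `replicate` helper, the datacenter names, and the per-record version counter are all assumptions made for illustration; real systems typically rely on richer metadata such as version vectors.

```python
from dataclasses import dataclass, field


@dataclass
class Leader:
    """Hypothetical leader holding one record (the document heading)."""
    name: str
    value: str = "A"
    version: int = 0                      # number of writes applied so far
    outbox: list = field(default_factory=list)

    def write_local(self, new_value: str) -> None:
        # Acknowledge at once and remember which version the write was based on.
        self.outbox.append((self.version, new_value))
        self.version += 1
        self.value = new_value

    def receive(self, base_version: int, new_value: str) -> bool:
        """Apply a replicated write; return False if it conflicts."""
        if base_version != self.version:
            # The sender had not seen a write that was already applied here.
            return False
        self.version += 1
        self.value = new_value
        return True


def replicate(source: Leader, target: Leader) -> list:
    """Ship queued writes from source to target; return the conflicting ones."""
    conflicts = []
    for base_version, new_value in source.outbox:
        if not target.receive(base_version, new_value):
            conflicts.append((source.name, new_value, target.value))
    source.outbox.clear()
    return conflicts


us = Leader("us-east")
eu = Leader("eu-west")
us.write_local("B")   # one user changes the heading from A to B
eu.write_local("C")   # the other changes it from A to C at about the same time

# Both writes were acknowledged; the conflict only surfaces during replication.
conflicts = replicate(us, eu) + replicate(eu, us)
```

In this sketch both leaders report a conflict and keep their own value, so the two copies stay diverged until some resolution step is applied.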
The single-leader architecture doesn’t suffer from conflicting writes: writes are serialized at the one leader, so a conflicting second write can be blocked or aborted and the user forced to retry. We could potentially fix conflicting writes in a multi-leader environment by detecting conflicts synchronously, that is, by replicating each write request to all the leaders and followers before acknowledging success. However, doing so essentially prevents the leaders from accepting writes independently of one another, which is the main benefit of a multi-leader architecture.
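The single-leader behavior can be sketched with a version check on every write. This is only an illustration under assumptions: `SingleLeader`, `StaleWriteError`, and the `expected_version` parameter are hypothetical names, and the sketch is single-threaded, so it stands in for the serialization that a real leader would enforce with locking or a write queue.

```python
class StaleWriteError(Exception):
    """Raised when a write was based on an outdated version of the record."""


class SingleLeader:
    """Hypothetical leader that serializes all writes to one record."""

    def __init__(self, initial: str) -> None:
        self.value = initial
        self.version = 0

    def read(self) -> tuple:
        return self.value, self.version

    def write(self, new_value: str, expected_version: int) -> int:
        # Writes are handled one at a time, so this check and the update
        # below cannot interleave with another write.
        if expected_version != self.version:
            raise StaleWriteError(
                f"expected version {expected_version}, found {self.version}"
            )
        self.value = new_value
        self.version += 1
        return self.version


leader = SingleLeader("A")
_, seen_by_first = leader.read()    # both users read the heading at version 0
_, seen_by_second = leader.read()

leader.write("B", seen_by_first)    # the first write succeeds

rejected = False
try:
    leader.write("C", seen_by_second)   # the second write is based on stale data
except StaleWriteError:
    rejected = True
    _, latest = leader.read()           # the user re-reads and retries
    leader.write("C", latest)
```

The second user is told about the conflict immediately, while they are still around to decide what to do, which is exactly what asynchronous multi-leader replication cannot offer.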
A reasonable strategy in multi-leader architectures is to avoid conflicts altogether. Consider a record for a user’s profile on TikTok. The system can ensure that all updates to a given record are always routed to the same leader or datacenter. This essentially serializes the writes to that record at a single leader, from which they are asynchronously replicated to the other datacenters. Thus, the issue of conflicts doesn’t arise. However, such a plan comes with caveats, for example:
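The routing rule behind this conflict-avoidance strategy can be sketched as a deterministic mapping from record ID to a "home" leader. Everything here is an assumption for illustration: the datacenter names, the `home_leader` function, the `Router` class, and the choice of hashing the record ID (a real system might instead pin a record to the user's home region).

```python
import hashlib

LEADERS = ["us-east", "eu-west", "ap-south"]  # hypothetical datacenter names


def home_leader(record_id: str, leaders=LEADERS) -> str:
    """Map a record ID to the single leader that accepts its writes."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(leaders)
    return leaders[index]


class Router:
    """Hypothetical front end that forwards each write to the record's home leader."""

    def __init__(self, leaders) -> None:
        self.leaders = list(leaders)
        # One ordered write log per leader stands in for the leader itself.
        self.logs = {name: [] for name in self.leaders}

    def write(self, record_id: str, value: str) -> str:
        leader = home_leader(record_id, self.leaders)
        self.logs[leader].append((record_id, value))
        return leader


router = Router(LEADERS)
first = router.write("user:42", "bio v1")
second = router.write("user:42", "bio v2")
```

Because both updates to `user:42` land in the same leader's log in order, that leader sees them serially and no cross-datacenter conflict can arise for this record.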
In systems with multi-leader architectures, we eventually want the copies of the data at each leader to converge to the same state. If a record is changed from A to B at leader #1 and from A to C at leader #2 around the same time, we have a couple of ways to resolve the conflicting writes at the two leaders:
Automatic resolution of conflicting writes is a broad and complex topic, but we’ll briefly discuss some aspects of it here. There are implementations of data structures such as lists and maps that can be edited concurrently by multiple users and that resolve conflicts automatically. These CRDTs (conflict-free replicated data types) use sensible rules to auto-resolve conflicts. Similarly, there are mergeable persistent data structures, which track the history of changes much as a version control system (e.g., Git) does and can merge concurrent changes. Collaborative editing products such as Google Docs or Apache Wave use operational transformation to resolve conflicts in multi-user scenarios.
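As a small illustration of the CRDT idea, the sketch below implements a grow-only counter (commonly called a G-Counter), one of the simplest CRDTs. It is not one of the list or map types mentioned above, and the class and replica names are assumptions chosen for the example. Each replica increments only its own slot, and merging takes the per-replica maximum, so merges can be applied in any order, any number of times, and every replica still converges to the same value.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: dict = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        # A replica only ever changes its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Taking the max per slot is commutative, associative, and idempotent.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


a = GCounter("leader-1")
b = GCounter("leader-2")
a.increment(2)        # concurrent updates at two leaders
b.increment(3)

a.merge(b)            # replicate in both directions
b.merge(a)
a.merge(b)            # a repeated merge changes nothing
```

After the merges both replicas report the same total without any user being asked to resolve anything; richer CRDTs for sets, lists, and maps follow the same principle with more elaborate merge rules.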