Keeping a copy of the same data on multiple machines connected by a network is known as “Replication.
There are several reasons why data must be replicated. Some include:
Replication becomes tricky when the replicated data can change over time. If the data were constant, we could make a copy of the data on every machine and be done. However, any meaningful distributed system sees data change over time and making sure that the replicated copies of data look the same on all the machines becomes challenging. At this point, we’ll assume that each copy of the data we are trying to replicate can fit neatly on a single machine. We’ll explore the case of data that can’t fit onto a single machine later.
There are three primary algorithms used to replicate data, each with its own pros and cons.
Additionally, replication can be synchronous or asynchronous and usually systems such as databases have knobs and levers to tweak replication behavior.