In case of master-slave replication the master is responsible for shipping changes received via write requests from clients to its followers. These changes are initially recorded on the master’s end in a log file. We’ll examine the various ways log files are implemented.
Relational databases such as MySQL or Oracle have associated database query languages such as SQL. These languages consist of statements that read, write or modify records within the database. One of the way to maintain a replication log on the master node is to simply record every statement e.g. INSERT, UPDATE or DELETE that is received from clients. Note, we don’t need to record statements that perform reads since they don’t mutate the database. The logged SQL statements can then be sent to the followers who execute these statements on the replica of data they hold to get in sync with the data on the master.
This replication may sound simple and effective but it comes with its own set of problems. Some of these are:
The write ahead log is an append-only file that records all the changes that take place on the master node. In the context of a database it is a sequence of bytes. Depending on the storage engine in a database, this log can be:
Outside the context of databases, WAL or a form of it is commonly used. For instance in the Hadoop distributed file system, the metadata about the files is maintained by an entity called the Namenode. The Namenode follows a master-slave model, where the master or active Namenode maintains an up to date metadata about files. As changes to the filesystem occur they are recorded in an edit log. The slave/passive Namenodes observes the changes showing up in the edit log and applies them locally thus keeping its data in-sync with that of the active Namenode. In case, the active Namenode crashes, the passive can take its place.
Another popular distributed system Kafka is based on WAL
Some of the issues this approach suffers from:
This method of replication is applicable in the context of databases. It allows for decoupling the storage format from the log in contrast to the WAL shipping method of replication. The log in this case is called the logical log to distinguish it from the log created in case of WAL shipping which is a representation of the physical data in the storage engine. Since the logical log is decoupled from the internals of the storage engine, it is easily backwards compatible and allows for the leader and follower to be on different versions of the database software. In fact the two parties can run different storage engines and still be able to work with each other.
The logical log captures the changes happening in the database at the row level, or in other words it is capturing the delta in the state of database caused by the execution of each statement at the master node. For instance:
This technique is also called change data capture. A benefit of using this method of replication is that the log can be exported to external systems which can easily parse logical data. These systems could be data warehouses, indexes or caches.
The previous replication methods were implemented at the database layer, however, there are use case where the application decides whether to replicate the data, replicate a portion of it only or to replicate the data to another system. For such scenarios, triggers and procedures - a commonly found feature in relational databases - can be used. Application specific code can be registered with triggers and the code is executed automatically whenever a change in the database occurs. Generally, the action involving copying the change to a separate table which an external process can read and then apply application specific logic.
Trigger based replication has more overhead compared to other replication mechanisms and may also be limited in its capabilities to only what triggers can do. Nevertheless, it is still an option that may be suitable for certain use cases.