Dealing with Distributed Transactions Using 2PC, 3PC, and a Local Transaction Table with MQs

In a monolithic system, transactions can be managed by the DB, e.g. locks, redo logs, undo logs, the DB's ACID guarantees, etc. When it comes to distributed systems, where multiple systems are involved, the situation becomes more complicated than in a monolithic system.

I am going to start with some theory and architecture design for solving distributed transaction requirements. When I have time in the future, I will write some posts about implementation details and high-concurrency, low-latency approaches.

This blog will be structured with the following sections:

  1. Key takeaway architecture design mindset
  2. Terminology
  3. 2PC—Two-Phase Commit
  4. 3PC—Three-Phase Commit
  5. Eventual Consistency with Message Queue, Schedule Tasks and Local Transaction Table

Key takeaway architecture design mindset

You can’t roll back a task once it has been committed/consumed in a third-party system, so writing to the local DB before sending the message to the MQ is the key.

Terminology

TM = Transaction Manager, also called the coordinator

RM = Resource Manager, also called a cohort; you can think of it as a service such as a payment service or an order service

2PC — Two-Phase Commit

2PC is widely used in many organisations.

Only the TM has a timeout. A TM timeout means the TM has not received a response from a specific RM within the timeout period.

Work Flow

  1. The TM sends pre-commit (prepare) requests to all involved RMs. During this phase, each RM that receives the request starts the transaction but does not commit it. Resources are locked once the transaction is started on the RMs.
    • This blocks other requests that require access to the locked resources.
  2. There are two scenarios in the second phase:
    • If all RMs succeed in the previous phase, the TM sends a commit instruction to all RMs;
    • If one or more RMs fail in the first phase, whether due to network failure, pre-commit failure, TM timeout, or other errors, the TM sends abort instructions to all RMs.
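The workflow above can be sketched in a few lines. This is a minimal in-memory illustration of a 2PC coordinator, not a real implementation: the `RM` class and its method names are hypothetical stand-ins for services like a payment or order service.

```python
class RM:
    """A toy resource manager: prepare starts the transaction and locks
    the resource; commit/abort end it and release the lock."""
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy
        self.state = "idle"

    def prepare(self):
        # Phase 1: start the transaction but do NOT commit yet.
        if not self.healthy:
            raise RuntimeError(f"{self.name} cannot prepare")
        self.state = "prepared"

    def commit(self):
        self.state = "committed"   # Phase 2, success path

    def abort(self):
        self.state = "aborted"     # Phase 2, failure path


def two_phase_commit(rms):
    # Phase 1: the TM sends pre-commit (prepare) requests to all RMs.
    try:
        for rm in rms:
            rm.prepare()
    except Exception:
        # Any failure (or, in a real TM, a timeout) aborts everyone.
        for rm in rms:
            rm.abort()
        return "aborted"
    # Phase 2: all RMs prepared successfully, so instruct all to commit.
    for rm in rms:
        rm.commit()
    return "committed"
```

For example, `two_phase_commit([RM("payment"), RM("order")])` returns `"committed"`, while any RM constructed with `healthy=False` causes every participant to abort.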

Pros

Better commit success rate compared to a service directly calling the other parties. This is because the pre-commit phase already verifies that each RM is able to commit before the TM sends out the commit request. If the pre-commit succeeds, the servers and network are most likely in a good state.

Cons

  1. Single point of failure. The entire transaction workflow is managed by the TM. If the TM goes down during the process, the transaction is stuck.
  2. Blocking the distributed system. Resources are locked between phase one and phase two. If there are network or server issues, e.g. a stop-the-world (STW) GC pause, other requests accessing the same resources can be blocked for a long time.
  3. Data inconsistency. If, in the second phase, the TM has sent the commit request to RM1 and RM1 has committed, but both the TM and RM1 go down before RM2 receives the commit request, the data becomes inconsistent because RM1 committed while RM2 hasn't. Also, the standby TM instance might not yet have synchronised the transaction state from the previously active TM.

3PC — Three-Phase Commit

3PC has a relatively higher transaction success rate and shorter resource lock time compared to 2PC.

In 3PC, both the TM and the RMs have timeouts.

Work Flow

Key changes

  1. In the 2nd phase, if the TM hasn't received responses from certain RMs, it assumes those RMs failed to pre-commit and sends abort instructions to all RMs.
  2. In the 3rd phase, if an RM times out waiting for the TM's instruction, it proceeds to commit.
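The two timeout rules above can be expressed as small decision functions. This is an illustrative sketch of just those rules, not a full 3PC implementation; the function names and the `None`-means-timeout convention are my own.

```python
def tm_phase2_decision(responses):
    """responses maps RM name -> True (prepared), False (failed),
    or None (no reply, i.e. the TM timed out waiting for that RM).
    The TM treats a missing reply as a pre-commit failure."""
    if all(ok is True for ok in responses.values()):
        return "send-do-commit"
    return "send-abort"


def rm_phase3_action(instruction):
    """instruction is 'do-commit', 'abort', or None (the RM timed out
    waiting for the TM). Having already reached the pre-committed state,
    a timed-out RM assumes the transaction was approved and commits."""
    if instruction == "abort":
        return "abort"
    return "commit"  # 'do-commit' and timeout both lead to commit
```

The phase-3 rule is exactly why 3PC shortens lock time: an RM never waits indefinitely for a crashed TM, at the cost of possible inconsistency if the TM had actually decided to abort.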

Pros

  1. Higher transaction success rate and shorter resource lock time, because the can-commit phase verifies that the network and servers are in good condition before resources are locked.
  2. Both the TM and the RMs have timeout mechanisms, which reduces the probability of data inconsistency.

Cons

  1. It doesn't solve all the issues of 2PC; it only mitigates their severity.
  2. Data inconsistency. If the TM fails while some RMs have pre-committed and others have not, the data becomes inconsistent.

Compensation Mechanisms

Here are typical ways of applying compensation to deal with the data inconsistency issues of 2PC and 3PC:

  1. Human-intervention compensation
  2. Scheduled-task compensation
  3. Cron-job compensation
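To make the list concrete, here is a hypothetical sketch of scheduled-task compensation that falls back to human intervention: a periodic job scans for transactions stuck in a pending state, retries them, and escalates after too many attempts. The row shape, `MAX_RETRIES` value, and the `resend`/`alert_operator` callables are all illustrative assumptions.

```python
MAX_RETRIES = 5  # assumed threshold before escalating to a human

def compensate(pending_rows, resend, alert_operator):
    """pending_rows: list of dicts like {'tx_id': ..., 'retries': n}
    representing transactions stuck in a pending state.
    resend / alert_operator are callables supplied by the host system."""
    for row in pending_rows:
        if row["retries"] >= MAX_RETRIES:
            # Automated retries exhausted: human-intervention compensation.
            alert_operator(row["tx_id"])
        else:
            # Scheduled-task compensation: retry the downstream call.
            row["retries"] += 1
            resend(row["tx_id"])
```

A cron job would simply invoke `compensate` on a fixed schedule; the retry counter is what keeps a permanently broken transaction from being retried forever.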

Eventual Consistency with Message Queue, Schedule Tasks and Local Transaction Table

This approach is suitable for systems without highly concurrent requests. It's enough for many small-to-medium businesses or internal systems. Both the schedule service and MySQL are the bottlenecks of this design: frequent reads/writes would place heavy stress on the MySQL DBs. Scaling is also an issue because the services are tightly coupled with the DB.

Work Flow

Both phase 1 and phase 2 are local DB transaction executions.

You can't roll back a task once it has been committed/consumed in a third-party system. Therefore, we first write to the local DB before sending the message to the MQ.

  1. Anything that fails before sending to the MQ won't impact other services.
  2. If the MQ is down:
    • If the MQ hasn't sent the message to the order service, it's fine, because when its standby instance takes over, it will send out the message.
    • If the MQ has sent the message to the order service, the mechanism described in the diagram above ensures idempotency, so there is no need to worry about repeated-consumption issues.
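The local-transaction-table pattern above can be sketched end to end. This is a minimal illustration using SQLite as the "local DB"; all table names, column names, and function names are my own assumptions, and a real MQ client would replace the `send_to_mq` callable.

```python
import sqlite3, json

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE orders  (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE outbox  (id INTEGER PRIMARY KEY AUTOINCREMENT,
                          payload TEXT, status TEXT DEFAULT 'pending');
    CREATE TABLE consumed (message_id INTEGER PRIMARY KEY);  -- idempotency guard
""")

def create_order(order_id, amount):
    # Phase 1: business row and message row in ONE local DB transaction,
    # so we never announce work that wasn't persisted.
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, amount))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"order_id": order_id, "amount": amount}),))

def publish_pending(send_to_mq):
    # Phase 2: a scheduled task drains pending rows to the MQ.
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE status = 'pending'").fetchall()
    for msg_id, payload in rows:
        send_to_mq(msg_id, payload)
        with db:
            db.execute("UPDATE outbox SET status = 'sent' WHERE id = ?",
                       (msg_id,))

def consume(msg_id, payload, handle):
    # Consumer side: recording the message id first makes redelivery a no-op.
    try:
        with db:
            db.execute("INSERT INTO consumed VALUES (?)", (msg_id,))
            handle(json.loads(payload))
    except sqlite3.IntegrityError:
        pass  # already consumed; safe to ignore the duplicate
```

Note the design choice in `consume`: the idempotency record and the business handler run in the same local transaction, so a handler failure rolls back the `consumed` insert and the message can be safely redelivered.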

CAP—Simple explanation of Consistency, Availability and Partition tolerance

If you want a highly career-oriented husband, he most likely can't pay much attention and time to his family. If you drive a car that is complex, high-tech, high-performance, and luxurious, it most likely won't come with reliability. There are many imperfections in many aspects of the real world. Like these imperfections, the CAP theorem is a theory of "you can't have it all".

Despite the many imperfections, it boils down to one question: which factors do we want the most?

I am going to explain the CAP theorem in simple language with real-world examples.

The C — Consistency

Consistency means that, within the same software system, different storage spaces hold the same state of the data at the same time.

Once you commit, you should get the same result whenever and wherever you query it later.

Take an ATM as an example: if you withdrew $1,000,000 from your account, then once the transaction is done, no matter when and where you check, there must be a $1,000,000 deduction from your account balance. There is no tolerance for any delay in account-balance synchronisation in this scenario; otherwise, you could easily earn $1,000,000 by having two people in different locations, far enough apart to exploit the system delay, withdraw $1,000,000 at the same time even though your account doesn't hold $2,000,000.

The A — Availability

Availability means the service is available at any time; in other words, a service can respond within a reasonable time with a reasonable response.

If we initiate a phone call to a friend and the call is successfully set up, but the delay of our voice reaching the other party is 500 years, then the service is effectively still unavailable. We would simply hang up.

The P — Partition Tolerance

A partition means the data lives in different storage spaces.

Partition tolerance means the distributed system still provides service even when part of the system is unreachable due to issues such as network failures.

Take a news system as an example,

As can be seen from the diagram above, the Brisbane server is still able to serve users with the relatively old cached data in its data source. This is an example of the AP model.

The Conflicts

CP(Consistency and Partition Tolerance)

To have the same state of data in different storage spaces, the system must sometimes block requests while the data synchronises, so the data cannot be available at every moment.

AP(Availability and Partition Tolerance)

To be able to get data from the storage in any space at any time, there must be moments when the data is inconsistent.

CA(Consistency and Availability)

To be able to get consistent data from any storage space at any time, there must be only one storage space, i.e. no partitions.

It is impossible for a distributed system to guarantee all three of C, A, and P at the same time.

Usage

Strong-consistency scenarios: bank transactions

To be strongly consistent, we need synchronisation whenever an update occurs, so performance and availability will be severely impacted.

High-availability scenarios: articles, news websites

For high-availability scenarios, we can apply the eventual-consistency model. Users might get stale data for a certain period, but eventually they will see the new data.

Applications

Zookeeper(ZK)

ZK applies the CP model. At any time, we get consistent data results. It tolerates network partitions but doesn't guarantee service availability. For example, if the ZK cluster is busy with a leader election, or half of the machines in the cluster are unavailable, then the service is unavailable.

Eureka

Eureka applies the AP model. It does not guarantee data consistency. For further detail, please refer to my previous post: Spring Cloud Netflix Eureka for microservice registration and discovery