Outbox pattern and Change Data Capture

In a microservices architecture, what we essentially do is split one big service into smaller modules, which makes the overall system more elastic. But splitting one system into smaller modules makes communication between those modules inevitable.

I think when it comes to microservices, it's all about data: updating data and propagating it to other services.

1. Synchronous 
REST APIs are a common way for services to communicate. Internally, a well-defined API can fully accomplish this task. But there are several problems. First, the services need to know about each other before communicating; service discovery provides a way of doing this, but the coupling remains. Second, each module must always be prepared for the other side to fail. Finally, a blocking HTTP call to another module holds up the caller, so in the end the overall system latency rises.

2. Asynchronous
We can take advantage of a message bus like Kafka. After persisting to its own database, a module can publish a message to Kafka; modules interested in that topic then consume the message. This also has a potential problem: inconsistency. What if the database transaction or the Kafka publish fails? We cannot bundle a DB transaction and a Kafka publish into one atomic operation.
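To make the dual-write hazard concrete, here is a minimal sketch using sqlite3 and a deliberately failing stand-in for a Kafka producer (the table, message, and function names are invented for illustration): the DB commit succeeds, the publish fails, and the two systems diverge.

```python
import sqlite3


def publish_to_kafka(message: str) -> None:
    # Stand-in for a real Kafka producer; here it always fails,
    # simulating a broker outage right after the DB commit.
    raise ConnectionError("Kafka broker unreachable")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")

published = []
try:
    # Dual write: first the DB transaction...
    conn.execute("INSERT INTO orders (item) VALUES (?)", ("book",))
    conn.commit()
    # ...then, separately, the Kafka publish. Not atomic with the commit.
    publish_to_kafka("OrderCreated: book")
    published.append("OrderCreated: book")
except ConnectionError:
    pass  # The order is committed, but no event was ever published.

rows = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rows, len(published))  # 1 0 -> the two systems are now inconsistent
```

Swapping the order (publish first, then commit) does not help: then a failed commit leaves a published event for a change that never happened.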

3. Approach to the problem (the issue of dual writes)
One way around dual writes is to write the data to only one of the two systems. But imagine writing a message to Kafka and then consuming it back from where it was published: this lacks "read your own writes" semantics.

4. Outbox pattern
So, here comes the outbox pattern. The main idea is to have another DB table called "outbox". When something changes, the change is applied not only to the business table but also written to the "outbox" table, in the same transaction. A standalone message relay process, the "connector", then reads this information and publishes the message to Kafka.
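A minimal sketch of the write side, again with sqlite3 (the `orders` and `outbox` schemas here are invented for illustration): the business row and the outbox row are inserted in the same transaction, so either both exist or neither does.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.execute("""CREATE TABLE outbox (
    id INTEGER PRIMARY KEY,
    aggregatetype TEXT,   -- used by the relay to pick the Kafka topic
    aggregateid TEXT,
    payload TEXT)""")


def create_order(item: str) -> None:
    # One transaction covers both the business table and the outbox table.
    with conn:  # commits on success, rolls back on any exception
        cur = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        event = {"type": "OrderCreated", "item": item}
        conn.execute(
            "INSERT INTO outbox (aggregatetype, aggregateid, payload)"
            " VALUES (?, ?, ?)",
            ("order", str(cur.lastrowid), json.dumps(event)),
        )


create_order("book")
orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
events = conn.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]
print(orders, events)  # 1 1 -> the change and its event commit atomically
```

The relay never touches the business tables; it only watches the outbox table, which keeps the published message format decoupled from the internal schema.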

5. CDC
In the case of MySQL, the connector can read the MySQL binlog, and this is where CDC (Change Data Capture) plays a role. The message relay connector detects changes to the "outbox" table by reading the binlog (MySQL's write-ahead style log of changes). Since the connector is only interested in changes, an outbox row can be deleted right after it is written, so the size of the table is not a problem. The "outbox" table stays effectively empty, and the connector processes only INSERT events, ignoring DELETEs.
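For instance, Debezium's MySQL connector can be pointed at just the outbox table and combined with its outbox event router transform. The following is a rough sketch of such a Kafka Connect configuration; the hostnames, credentials, and database/table names are placeholders, and exact property names can vary between Debezium versions.

```json
{
  "name": "outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "changeme",
    "database.server.id": "184054",
    "topic.prefix": "app",
    "table.include.list": "app.outbox",
    "tombstones.on.delete": "false",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter"
  }
}
```

The event router reads fields like `aggregatetype` from each outbox row to decide which Kafka topic the event goes to, so one outbox table can feed many topics.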

6. The reason I need a cache
However, the outbox pattern and CDC can also introduce another problem, which I'm going to discuss in the next article.

Summary
  • The outbox pattern is a great fit for data propagation between microservices.
  • It avoids inconsistency between the databases owned by each service.
  • It preserves "read your own writes" semantics.
  • Kafka becomes the backbone for the data.
  • Even if a related microservice dies, it can resume reading from where it left off once restarted.
  • Even if a related microservice dies, the surviving services are unaffected.
  • A service does not need to know the endpoints of other microservices directly.
  • Duplicate consumption must be detected and handled (publishing is at-least-once).
  • This is eventual consistency, so there can be lag between microservices (though it is near real time).
  • Only what is stored in the outbox table is published to Kafka (not every DB table!).
  • Be careful when changing the data structure published to Kafka.
  • Also be careful about the case where a consumer cannot read the data structure coming from Kafka.

Open questions
  • Does Debezium then read from the binlog with a single thread? For a write-heavy database, wouldn't the lag grow? What if it reaches a level that is unacceptable from the user's point of view?
  • What about the case where many instances are hammering the DB at once?
  • In the outbox pattern, Kafka consuming plays a role similar to a REST API endpoint. So what does the threading model of this Kafka consumer look like?
