Coupang is said to use canal, Alibaba's open source project, as its CDC solution.
https://github.com/alibaba/canal

What it does is similar to Debezium, but it does not use Kafka Connect.


It is used in Alibaba Cloud.
https://docs.google.com/presentation/d/1MkszUPYRDkfVPz9IqOT1LLT5d9tuwde_WC8GZvjaDRg/edit#slide=id.p8

Posted by '김용환'

https://debezium.io/docs/connectors/mysql/#when-things-go-wrong

<Failure summary>
- Delivery is exactly-once, but during a failure it can become at-least-once. (This satisfies our requirements; see the dedup sketch below.)
- Even if multiple components fail, everything still gets processed.
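Given that at-least-once behavior after failures, a downstream consumer that needs effective exactly-once processing can deduplicate on the binlog coordinates (source.file, source.pos) carried in each Debezium MySQL event. A minimal sketch, with the coordinate extraction left to the caller:

import java.util.HashSet;
import java.util.Set;

// Deduplicates redelivered Debezium events by their MySQL binlog coordinates.
// The caller extracts source.file and source.pos from each event's payload.
public class BinlogDeduplicator {
    private final Set<String> seen = new HashSet<>();

    // Returns true only the first time a given (file, pos) pair is seen.
    // In production this set would need to be bounded or persisted.
    public boolean firstTime(String binlogFile, long binlogPos) {
        return seen.add(binlogFile + ":" + binlogPos);
    }
}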

Posted by '김용환'

 

The Debezium connector configuration looks like the following. When this is submitted to the Debezium connector, it dumps the data from the DB and then carries on with CDC.

{
  "name": "google_debezium_connector_shopping_orders",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "tasks.max": "1",
    "database.hostname": "mysql.google.com",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "google_debezium",
    "database.server.id": "18405",
    "database.server.name": "google-shopping",
    "database.whitelist": "shopping",
    "table.whitelist": "shopping.demo_orders",
    "database.history.kafka.bootstrap.servers": "kafka:29092",
    "database.history.kafka.topic": "schema_changes_shopping_orders"
  }
}
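To submit it, POST the JSON to Kafka Connect's REST API. A minimal sketch in Java, assuming the config is saved as connector.json and Connect is listening on its default localhost:8083 (both assumptions):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // The JSON config above, saved locally (hypothetical file name)
        String json = Files.readString(Path.of("connector.json"));

        // Kafka Connect's REST API creates a connector on POST /connectors;
        // localhost:8083 is Connect's default REST listener
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}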

Now, which topic of which Kafka cluster does the CDC data get stored in?


When Debezium is used as a Kafka connector, the values are stored in a Kafka topic whose name combines serverName (database.server.name), databaseName, and tableName:
google-shopping.shopping.demo_orders
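A quick way to verify the naming rule is to consume from that topic directly. A minimal sketch, assuming string-deserializable JSON events, the bootstrap server from the config above, and a hypothetical consumer group id:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OrdersCdcConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:29092");
        props.put("group.id", "orders-cdc-reader"); // hypothetical group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // serverName.databaseName.tableName, per the naming rule above
            consumer.subscribe(List.of("google-shopping.shopping.demo_orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value()); // Debezium change event JSON
                }
            }
        }
    }
}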

I figured this out while reading about the database.history.kafka.topic value. It is actually all in the documentation.

 

https://debezium.io/docs/connectors/mysql/#topic-names

Posted by '김용환'

The binlog library Debezium uses is shyiko's mysql-binlog-connector-java, a proven library that is also used in NiFi.

https://github.com/shyiko/mysql-binlog-connector-java

 

 

This is the code where Debezium actually reads the binlog through shyiko's library:

https://github.com/debezium/debezium/blob/master/debezium-connector-mysql/src/main/java/io/debezium/connector/mysql/BinlogReader.java
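Used directly, outside Debezium, the library looks roughly like this. A sketch with placeholder connection details, not Debezium's actual wiring:

import com.github.shyiko.mysql.binlog.BinaryLogClient;

public class BinlogTail {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders
        BinaryLogClient client =
                new BinaryLogClient("localhost", 3306, "debezium", "password");

        // Print every binlog event as it arrives
        // (rotate, table map, write/update/delete rows, ...)
        client.registerEventListener(event -> System.out.println(event));

        // Blocks and streams events until disconnect() is called
        client.connect();
    }
}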

Posted by '김용환'

https://www.confluent.io/blog/kafka-connect-deep-dive-jdbc-source-connector

 


 

It contains an explanation of Kafka Connect.

 

The interesting part is the following.

It shows how to read incremental data directly from the DB. This actually works well, so it is worth referencing; a sketch of the underlying polling idea follows the examples.

    • MySQL

      CREATE TABLE foo (
        …
        UPDATE_TS TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
      );

    • Postgres

      CREATE TABLE foo (
        …
        UPDATE_TS TIMESTAMP DEFAULT CURRENT_TIMESTAMP
      );

      -- Courtesy of https://techblog.covermymeds.com/databases/on-update-timestamps-mysql-vs-postgres/
      CREATE FUNCTION update_updated_at_column()
      RETURNS trigger
      LANGUAGE plpgsql
      AS $$
      BEGIN
        NEW.update_ts = NOW();
        RETURN NEW;
      END;
      $$;

      CREATE TRIGGER t1_updated_at_modtime
      BEFORE UPDATE ON foo
      FOR EACH ROW EXECUTE PROCEDURE update_updated_at_column();

    • Oracle

      CREATE TABLE foo (
        …
        CREATE_TS TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
        UPDATE_TS TIMESTAMP
      );

      CREATE OR REPLACE TRIGGER TRG_foo_UPD
      BEFORE INSERT OR UPDATE ON foo
      REFERENCING NEW AS NEW_ROW
      FOR EACH ROW
      BEGIN
        SELECT SYSDATE INTO :NEW_ROW.UPDATE_TS FROM DUAL;
      END;
      /
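With such a column in place, the connector's timestamp mode effectively reduces to polling for rows whose UPDATE_TS is newer than the last value seen. A rough sketch of that idea (not the connector's actual implementation; connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class TimestampPoller {
    public static void main(String[] args) throws Exception {
        Timestamp lastSeen = new Timestamp(0L); // start from the epoch on first run

        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/shopping", "user", "pass");
             // Fetch only rows modified since the previous poll, oldest first
             PreparedStatement stmt = conn.prepareStatement(
                "SELECT * FROM foo WHERE UPDATE_TS > ? ORDER BY UPDATE_TS ASC")) {
            stmt.setTimestamp(1, lastSeen);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    lastSeen = rs.getTimestamp("UPDATE_TS"); // advance the offset
                    // ... hand the row to Kafka here ...
                }
            }
        }
    }
}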
Posted by '김용환'

A PostgreSQL -> Debezium -> Confluent Platform integration example:
https://www.linkedin.com/pulse/change-data-capture-postgresql-via-debezium-part-1-paolo-scarpino/


 

Reportedly this was not a production deployment.

But it looks worth applying in production!

 

 

Posted by '김용환'

Schema Registry supports clustering and multi-DC deployments.
https://docs.confluent.io/current/schema-registry/docs/multidc.html

 


And all it needs is ZooKeeper and Kafka; the Schema Registry instances do not talk to each other.
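In the multi-DC layout from that page, for example, the registries in the secondary DC simply point at the primary DC's ZooKeeper and opt out of master election. A sketch of the relevant schema-registry.properties entries (hosts are placeholders; key names follow the Confluent docs of this era):

# Secondary-DC Schema Registry: follow the primary DC, never become master
kafkastore.connection.url=zk-dc1-1:2181,zk-dc1-2:2181
master.eligibility=false
listeners=http://0.0.0.0:8081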

Posted by '김용환'

Good material on Kafka-Avro integration:


https://blog.cloudera.com/blog/2018/07/robust-message-serialization-in-apache-kafka-using-apache-avro-part-1/
https://blog.cloudera.com/blog/2018/07/robust-message-serialization-in-apache-kafka-using-apache-avro-part-2/
https://blog.cloudera.com/blog/2018/08/robust-message-serialization-in-apache-kafka-using-apache-avro-part-3/ 
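The pattern those posts describe, in a minimal sketch: produce GenericRecords through Confluent's KafkaAvroSerializer, which registers and resolves schemas via the Schema Registry. The broker and registry URLs below are placeholders, and the one-field schema is purely illustrative:

import java.util.Properties;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AvroProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:29092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Confluent's serializer stores/looks up schemas in the Schema Registry
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");

        // Illustrative one-field record schema
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Order\","
              + "\"fields\":[{\"name\":\"id\",\"type\":\"long\"}]}");
        GenericRecord order = new GenericData.Record(schema);
        order.put("id", 1L);

        try (KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders-avro", "1", order));
        }
    }
}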

Posted by '김용환'

http://blog.christianposta.com/microservices/the-hardest-part-about-microservices-data/


https://www.slideshare.net/ceposta/the-hardest-part-of-microservices-your-data

 

https://youtu.be/MrV0DqTqpFU

 

 

-> While discussing consistency, it explains Debezium.

Posted by '김용환'

https://wecode.wepay.com/posts/streaming-databases-in-realtime-with-mysql-debezium-kafka

 


 

The talk and its slides:

https://www.confluent.io/kafka-summit-sf17/database-streaming-at-wepay-with-kafka-debezium


 

Posted by '김용환'