Change Data Capture: Overview, Why, and Best Practices

Feb 19, 2024 pm 03:42 PM

Today’s businesses, especially those that prioritize digital transformation, are in dire need of real-time data. Traditional weekly and monthly batch processing can no longer meet demand. However, it is not easy to obtain real-time data from multiple sources and use it to automate processes and dynamically optimize decisions.

Recently, we encountered a challenge when re-architecting a customer's legacy system and splitting its monolithic architecture into microservices. We started making changes to the database and modernizing the system module by module. During this phase, we needed to keep the legacy and new databases in sync, because different modules may require the same data: the old system requires data generated by the new system in the new database, and vice versa.

We researched Change Data Capture (CDC) technology to determine whether it fit our needs. This article covers the definition of CDC, the tools we tested, how they work, and their advantages. We also share some use cases and suggestions to help other engineers choose an appropriate CDC tool for their situation.

What is change data capture?

Change data capture refers to the process of detecting and capturing changes in a source system and then delivering those changes to a target system in near real-time. These changes may include insert, update, and delete operations, as well as DDL changes to the database structure.

How change data capture tools work

CDC tools work by monitoring data changes in the source system. When a change is detected, the tool captures it and records it in a designated location, such as a database or log file. The captured data is then processed, transformed, and loaded into a target system, such as a data warehouse or analytics platform.

There are many ways to capture database changes. Let’s take a look at some of them:

1. Timestamp/query based

In this method, the source tables carry audit columns such as CREATED_AT, LAST_UPDATED, or DATE_MODIFIED, and changes are detected by querying the source for rows whose audit columns have changed since the last check. Note that this method does not capture delete operations, because a deleted row no longer appears in the query results.
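
The polling idea can be sketched as follows. This is a minimal illustration, not a production implementation: it uses Python's built-in sqlite3 module, and the `customers` table, its `last_updated` column (a plain integer standing in for a timestamp), and the `fetch_changes_since` helper are all hypothetical names chosen for the example.

```python
import sqlite3


def fetch_changes_since(conn, last_seen):
    """Return rows modified after last_seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, last_updated FROM customers "
        "WHERE last_updated > ? ORDER BY last_updated",
        (last_seen,),
    ).fetchall()
    # Advance the high-water mark only when something changed.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


# Hypothetical source table with an audit column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, last_updated INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 100), (2, "Bob", 110)],
)

# First poll: both rows are newer than the initial mark of 0.
changes, mark = fetch_changes_since(conn, 0)

# One update and one delete happen in the source.
conn.execute("UPDATE customers SET name = 'Bobby', last_updated = 120 WHERE id = 2")
conn.execute("DELETE FROM customers WHERE id = 1")

# Second poll: the update is visible, the delete is not.
changes2, mark2 = fetch_changes_since(conn, mark)
```

In this sketch the second poll returns only the updated row for `id = 2`; nothing signals that `id = 1` was deleted, which is the limitation described above.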

2. Trigger-based

A trigger is a database function that runs automatically in response to specific events. Although triggers can capture any change, including delete operations, they reduce database performance because each event requires multiple writes.
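
As a rough sketch of this approach, the example below uses SQLite triggers through Python's built-in sqlite3 module. The `customers` table, the `customers_changes` change table, and the trigger names are assumptions made for illustration; other databases use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

    -- Hypothetical change table that the triggers write into.
    CREATE TABLE customers_changes (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        op        TEXT NOT NULL,
        row_id    INTEGER NOT NULL,
        name      TEXT
    );

    CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
        INSERT INTO customers_changes (op, row_id, name)
        VALUES ('INSERT', NEW.id, NEW.name);
    END;

    CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
        INSERT INTO customers_changes (op, row_id, name)
        VALUES ('UPDATE', NEW.id, NEW.name);
    END;

    CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
        INSERT INTO customers_changes (op, row_id, name)
        VALUES ('DELETE', OLD.id, OLD.name);
    END;
    """
)

# Each statement below causes two writes: one to customers, one to the change table.
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

captured = conn.execute(
    "SELECT op, row_id, name FROM customers_changes ORDER BY change_id"
).fetchall()
```

Unlike the timestamp approach, the delete is recorded here, at the cost of an extra write to the change table for every modification of the source table.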

3. Log-based

The database maintains a transaction log that stores all events and is used for recovery in the event of a crash. With log-based CDC, new database transactions are read directly from this native log, which allows changes to be captured without scanning the source table and is therefore more efficient.

This approach is similar to event sourcing in event-driven architecture. Whenever the system state changes, we record it as an event. The recorded events can be replayed in the same order to reconstruct the system state at any time.
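
The replay idea can be illustrated with a small sketch. It makes a strong simplifying assumption: real transaction logs are binary and database-specific, so the dictionary-based events and the `replay` function below are hypothetical stand-ins for what a log reader would emit, not the format of any particular tool.

```python
from typing import Any, Dict, List

# Hypothetical, simplified change event: an operation, a primary key,
# and (for inserts and updates) the full row image.
ChangeEvent = Dict[str, Any]


def replay(events: List[ChangeEvent]) -> Dict[int, Dict[str, Any]]:
    """Rebuild table state by applying change events in log order."""
    state: Dict[int, Dict[str, Any]] = {}
    for event in events:
        key = event["key"]
        if event["op"] == "delete":
            state.pop(key, None)
        else:
            # Inserts and updates both carry the full row in this sketch.
            state[key] = event["row"]
    return state


log = [
    {"op": "insert", "key": 1, "row": {"name": "Ada"}},
    {"op": "insert", "key": 2, "row": {"name": "Bob"}},
    {"op": "update", "key": 2, "row": {"name": "Bobby"}},
    {"op": "delete", "key": 1},
]

replica = replay(log)        # state after all four events
snapshot = replay(log[:2])   # state as of the second event
```

Replaying a prefix of the log reconstructs the state at that earlier point, which is why the order of events must be preserved.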

Why use CDC?

CDC is valuable in many scenarios, depending on the application, architecture, and business needs. Here are some of the ways CDC helps with the engineering process:

  • Real-time data availability: CDC tools capture changes in near real-time, ensuring the latest data is available for analysis, reporting, or further processing.
  • Faster decision making: CDC helps reduce the delay between capture and data availability, enabling faster analysis and decision making.
  • Efficient data integration: CDC tools help capture data from multiple operational sources and convert it into a common format in a single target database or data lake.
  • Custom design of target database: CDC provides cross-functional benefits such as creating read-only search or query databases in CQRS systems, creating audit databases, or capturing data in data warehouses. It allows for decoupling non-functional and architectural requirements from the primary data store.
  • Simplified data migration: In our case, CDC helps maintain data consistency between legacy and new databases during the modernization phase. This applies to various other data migration scenarios as well.

How to choose the right CDC tool?

There are several CDC tools on the market, such as Oracle GoldenGate, Debezium, IBM InfoSphere, Striim, StreamSets, and Qlik Replicate. Some are open source and others are paid. They typically support on-premises and cloud environments and can handle a variety of data sources. When choosing, consider the following:

  • Compatibility with data sources: At a minimum, the tool you choose must be compatible with all data sources you want to capture changes from.
  • Real-time data capture: Tools should capture changes in near real-time so that you can work with the latest data.
  • Data conversion and integration: CDC tools should be able to handle data conversion from source to target data types.
  • Price: CDC tools must be cost-effective for your use case. There are open source, paid and licensed products available.
  • Ease of use and support: The tool should be easy to use for your team and provide adequate support, including comprehensive documentation and technical support.
  • Other features: Depending on your needs, you may also want to check out other specific features, such as two-way synchronization between source and destination and cloud support.

As businesses become technology-driven, historical and current data will become a critical differentiator. Achieving accurate, timely, efficient, and cost-effective change data capture will be an important part of any technology transformation initiative. We hope this article helps when you face this situation.
