This article discusses the design of a data warehouse's logical architecture.
Offline data warehouses are usually built on dimensional modeling theory and are logically divided into layers, mainly for the following reasons:
1. Isolation: Users should work with data that the data team has carefully processed, rather than raw data from the business systems. The first advantage is that the data users see is carefully prepared, standardized, and clean from a business perspective, so it is easy to understand and use. Second, if an upstream business system is changed or even rebuilt (table structures, fields, business meaning, and so on), the data team is responsible for handling all of these changes and minimizing the impact on downstream users.
2. Performance and maintainability: Specialized work is left to specialists. With layering, data processing happens largely within the data team, so the same business logic does not have to be executed repeatedly, which saves the corresponding storage and computing overhead. Layering also makes the data warehouse clearer and easier to maintain. Each layer is responsible only for its own tasks, and if data processing goes wrong in a given layer, only that layer needs to be modified.
3. Standardization: For a company or organization, consistent metric definitions are very important. When people discuss a metric, they must be working from a clear and agreed definition. In addition, tables, fields, and metrics must follow standard conventions.
With these goals in mind, a data warehouse is typically divided into the following layers:
1. ODS layer: Tables from the warehouse's source systems are usually stored unchanged. This is called the ODS (Operational Data Store) layer, often also known as the staging area. It is the source of the data processed by the subsequent data warehouse layers (that is, the fact table and dimension table layer built with Kimball dimensional modeling, and the summary layer derived from those fact tables and detail tables). The ODS layer also stores historical data, either incremental or full.
2. DWD and DWS layers: The data warehouse detail layer (Data Warehouse Detail, DWD) and the data warehouse summary layer (Data Warehouse Summary, DWS) form the main body of the data warehouse. Data in the DWD and DWS layers is produced from the ODS layer through ETL (cleaning, transformation, and loading). These layers are usually built on Kimball's dimensional modeling theory, and conformed dimensions together with the data bus architecture keep dimensions consistent across subject areas.
3. Application layer (ADS): The application layer consists mainly of the data marts (Data Mart, DM) that each business line or department builds on top of DWD and DWS. The term data mart (DM) is used in contrast with the data warehouse (Data Warehouse, DW), which comprises DWD and DWS. Generally, application layer data comes from the DW layer, and in principle the application layer is not allowed to access the ODS layer directly. Compared with the DW layer, the application layer contains only the detail and summary data that the owning department or business unit cares about.
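The layering described above can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for a real warehouse engine. The table names (`ods_orders`, `dwd_fact_orders`, `dws_daily_sales`, `ads_sales_report`), the columns, and the sample rows are all hypothetical and chosen only for illustration. The point is the data flow: ODS holds the raw copy, DWD holds cleaned detail rows, DWS holds aggregates, and the application layer reads from the DW layers and never from ODS.

```python
import sqlite3


def build_layers(conn: sqlite3.Connection) -> list:
    """Build hypothetical ODS -> DWD -> DWS -> ADS tables and return the ADS rows."""
    cur = conn.cursor()

    # ODS: raw copy of a source table, stored as-is (including a dirty row).
    cur.execute(
        "CREATE TABLE ods_orders "
        "(order_id INTEGER, user_id INTEGER, amount TEXT, order_date TEXT)"
    )
    cur.executemany(
        "INSERT INTO ods_orders VALUES (?, ?, ?, ?)",
        [
            (1, 10, "20.00", "2024-01-01"),
            (2, 10, "5.50", "2024-01-01"),
            (3, 11, "bad", "2024-01-02"),  # unparseable amount from upstream
            (4, 11, "12.00", "2024-01-02"),
        ],
    )

    # DWD: cleaned, typed detail-level fact table derived only from ODS.
    # Rows whose amount does not start with a digit are dropped.
    cur.execute(
        """
        CREATE TABLE dwd_fact_orders AS
        SELECT order_id, user_id, CAST(amount AS REAL) AS amount, order_date
        FROM ods_orders
        WHERE amount GLOB '[0-9]*'
        """
    )

    # DWS: summary table derived only from DWD, so the aggregation
    # logic is computed once and shared by all consumers.
    cur.execute(
        """
        CREATE TABLE dws_daily_sales AS
        SELECT order_date,
               COUNT(*) AS order_count,
               SUM(amount) AS total_amount
        FROM dwd_fact_orders
        GROUP BY order_date
        """
    )

    # ADS: a department-specific view that reads from DWS, never from ODS.
    cur.execute(
        """
        CREATE VIEW ads_sales_report AS
        SELECT order_date, order_count, total_amount
        FROM dws_daily_sales
        """
    )
    conn.commit()

    return cur.execute(
        "SELECT order_date, order_count, total_amount "
        "FROM ads_sales_report ORDER BY order_date"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for row in build_layers(connection):
        print(row)
```

In this sketch, if the upstream `amount` format changed, only the DWD step would need to be adjusted. The DWS table and the ADS view would stay the same, which is the isolation and maintainability benefit the layering is meant to provide.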