The differences between a data warehouse and a database: 1. A database stores operational data as the application writes it, without further processing, while a data warehouse is designed for data analysis, so the source data goes through an ETL process (extraction, cleaning, transformation, and loading) before it is stored. 2. A data warehouse usually holds far more data than a database.
1. Data warehouse
- What is a data warehouse?
A data warehouse (abbreviated DW or DWH) is a strategic collection of data of all types that supports decision-making and planning at every level of an enterprise. It is built for analytical reporting and decision support. Enterprises that need business intelligence use it to guide business process improvement and to monitor time, cost, quality, and control.
- What can a data warehouse do? (A few examples)
- Setting annual sales targets: targets should be decided on the basis of historical reports, not set arbitrarily.
- Optimizing business processes. For example: for a certain brand of mobile phone on an e-commerce platform, which age group made up the main buyers over the past five years, and in which season were purchases highest? Based on these patterns, you can identify the main needs of the target group and dynamically adjust production volume and warehouse inventory.
- Characteristics of a data warehouse
- The data warehouse is subject-oriented.
- Unlike traditional databases, data warehouses are subject-oriented. So what is a subject? A subject is a high-level concept: it is the object around which data in the enterprise information system is aggregated, classified, and analyzed at a higher level. Logically, it corresponds to the analysis object of some broad analysis area in the enterprise. (In plain terms: a subject is a key aspect that users care about when they use the data warehouse to make decisions. A subject usually relates to several operational information systems, whereas the data in operational databases is organized around transaction-processing tasks that are isolated from one another.)
- The data warehouse is integrated.
- The data in the data warehouse is extracted from the originally scattered operational databases (MySQL and other relational databases). Operational databases differ greatly from DSS (decision support system) analytical databases. First, the source data for each subject is spread across the scattered databases with many duplicates and inconsistencies, and the data from each online system is tied to that system's application logic. Second, the aggregated data in the data warehouse cannot be obtained directly from the original database systems. Therefore, before data enters the data warehouse it must be unified and aggregated. This is the most critical and most complex step in building a data warehouse. The work to be done is:
- Resolve all inconsistencies in the source data, such as fields with the same name but different meanings, different names for the same thing, inconsistent units, and inconsistent field lengths.
- Perform data aggregation and calculation. Aggregated data can be produced while data is being extracted from the source databases, but much of it is produced inside the data warehouse, after the data has been loaded.
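The unification step described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two source systems (`shop`, `app`), their field names, and the target `OrderFact` structure are hypothetical and do not come from any particular ETL tool.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical rows from two operational systems. They describe the same
# kind of fact but use different field names and different units.
SHOP_ROWS = [{"cust_id": 7, "weight_kg": 1.5, "amount_yuan": 120.0}]
APP_ROWS = [{"customer": "7", "weight_g": 800, "amount_fen": 4550}]


@dataclass(frozen=True)
class OrderFact:
    """Unified record as it would be loaded into the warehouse."""
    customer_id: int
    weight_kg: float
    amount_yuan: float
    source: str


def from_shop(row: dict) -> OrderFact:
    # The shop system already uses the target names and units.
    return OrderFact(int(row["cust_id"]), float(row["weight_kg"]),
                     float(row["amount_yuan"]), "shop")


def from_app(row: dict) -> OrderFact:
    # "customer" is a synonym of "cust_id"; grams and fen are converted
    # to the warehouse units (kilograms and yuan).
    return OrderFact(int(row["customer"]), row["weight_g"] / 1000,
                     row["amount_fen"] / 100, "app")


def integrate(shop_rows: List[dict], app_rows: List[dict]) -> List[OrderFact]:
    """Unify both sources into one consistent list of facts."""
    return [from_shop(r) for r in shop_rows] + [from_app(r) for r in app_rows]


def total_amount_by_customer(facts: List[OrderFact]) -> Dict[int, float]:
    """Aggregation performed after the data has been unified."""
    totals: Dict[int, float] = {}
    for fact in facts:
        totals[fact.customer_id] = totals.get(fact.customer_id, 0.0) + fact.amount_yuan
    return totals


print(total_amount_by_customer(integrate(SHOP_ROWS, APP_ROWS)))  # {7: 165.5}
```

The aggregation only gives a meaningful total because the names and units were reconciled first, which is why unification has to happen before the data is loaded.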
- The data in the data warehouse changes over time.
- From the application's point of view, the data in the data warehouse is not updatable: users do not update data when they perform analysis. This does not mean, however, that the data never changes over its whole life cycle, from the moment it is integrated into the data warehouse until it is finally deleted.
- Changing over time is one of the characteristics of a data warehouse. It shows up in three ways:
- The data warehouse keeps adding new data as time passes. The data warehouse system must continuously capture changed data in the OLTP databases and append it to the data warehouse. In other words, it continuously generates snapshots of the OLTP databases and adds them to the data warehouse after unification and integration. Existing snapshots are never changed: when new changed data is captured, a new snapshot is generated and added, and the original snapshot is left unmodified.
- The data warehouse keeps deleting old data as time passes. Data in the data warehouse also has a retention period, and once that period expires the data is deleted. The retention period in a data warehouse is simply much longer than in the operational environment. An operational environment generally keeps only 60 to 90 days of data, while a data warehouse keeps data for much longer (for example, 5 to 10 years) to meet the needs of DSS trend analysis.
- The data warehouse contains a large amount of aggregated data, much of which is time-related. For example, data is often aggregated by time period or sampled at certain time slices, and these aggregates have to be recomputed as time passes. For this reason, records in the data warehouse include a time item that marks the historical period of the data.
- The data in the data warehouse cannot be modified.
- The data in the data warehouse is mainly used for enterprise decision analysis. The operations involved are mainly queries, and modifications are generally not performed. The data reflects historical content over a long period: it is a collection of database snapshots taken at different points in time, together with derived data produced by computing statistics on, aggregating, and reorganizing those snapshots. It is not data being processed online. The data processed online in the operational databases is integrated and then loaded into the data warehouse. Once stored data exceeds the retention period of the data warehouse, it is deleted from the current data warehouse. Because a data warehouse is mostly queried, its management system can be much simpler than a database management system. Many of the hard problems of database management systems, such as integrity protection and concurrency control, are largely unnecessary in data warehouse management. However, because the amount of data queried is often large, query performance becomes more demanding and calls for various complex indexing techniques. At the same time, the data warehouse serves the senior management of commercial enterprises, who place higher demands on the friendliness of the query interface and on how the data is presented.
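The time-related behavior described above (snapshots are appended and never updated, and expired data is purged after a retention period) can be sketched with Python's built-in `sqlite3` module. The table name `inventory_snapshot`, its columns, and the 5-year retention value are assumptions chosen for illustration, and SQLite only stands in for a real warehouse engine here.

```python
import sqlite3
from datetime import date

RETENTION_YEARS = 5  # assumed value; the text mentions 5 to 10 years


def create_warehouse() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory_snapshot ("
        " snapshot_date TEXT NOT NULL,"  # time item marking the period of the data
        " item TEXT NOT NULL,"
        " quantity INTEGER NOT NULL,"
        " PRIMARY KEY (snapshot_date, item))"
    )
    return conn


def append_snapshot(conn: sqlite3.Connection, snapshot_date: date, rows) -> None:
    """Append a new snapshot; earlier snapshots are never updated."""
    conn.executemany(
        "INSERT INTO inventory_snapshot VALUES (?, ?, ?)",
        [(snapshot_date.isoformat(), item, qty) for item, qty in rows],
    )
    conn.commit()


def purge_expired(conn: sqlite3.Connection, today: date) -> int:
    """Delete snapshots older than the retention period (whole calendar years)."""
    cutoff = date(today.year - RETENTION_YEARS, 1, 1).isoformat()
    cur = conn.execute(
        "DELETE FROM inventory_snapshot WHERE snapshot_date < ?", (cutoff,)
    )
    conn.commit()
    return cur.rowcount


conn = create_warehouse()
append_snapshot(conn, date(2015, 6, 30), [("phone", 40)])
append_snapshot(conn, date(2023, 6, 30), [("phone", 25)])
append_snapshot(conn, date(2024, 6, 30), [("phone", 30)])
deleted = purge_expired(conn, date(2024, 7, 1))
print(deleted)  # 1: only the 2015 snapshot is past the retention period
```

ISO-formatted dates sort correctly as text, so the cutoff comparison works on the `TEXT` column without any date functions.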
2. The difference between data warehouse and database
- Before looking at the difference, we need to understand three concepts: database software, databases, and data warehouses.
- Database software: software (not the graphical client that connects to a database) that implements the database's logic. It belongs to the physical layer. Examples include Oracle, DB2, MySQL, Sybase, and MS SQL Server.
- Database: a logical concept, a store for data that is implemented by database software. A database consists of many tables. Each table is two-dimensional: it has a number of fields arranged as columns, and data is written into it row by row. Together, two-dimensional tables can express multidimensional relationships.
- Data warehouse: an extension of the database concept. Logically, a database and a data warehouse are the same kind of thing: both are places where data is stored by means of database software. In terms of data volume, however, a data warehouse is usually much larger than a database. A data warehouse is mainly used for data mining and data analysis, to help management make decisions.
- In an IT architecture, a database is always present, because there has to be a place to store data. Take online shopping and other e-commerce as an example: item inventory, item prices, user account balances, and so on are all stored in a backend database. An even simpler example is the account and password we use for WeChat, Weibo, or QQ. The backend database must have a user table with at least two fields, username and password, and our data is stored in that table row by row. When we log in, we enter a username and password, which are sent to the backend and matched against the data in the table. If the match succeeds, we are logged in. If it fails, an error is reported. This is a database, and databases are what production environments run on. Every business-related application uses one.
- The data warehouse is one of the technologies under BI. Because a database is tied to a business application, no single database can contain all of a company's data. Database tables are usually designed for a particular application. In the login example above, the user table has only those two fields and nothing else. For the application, that table does exactly what it needs to, but it cannot support analysis. For example: during which time period are the most users active? Which user buys the most in a year? To answer questions like these, the table structure has to be redesigned. The concept of the data warehouse was introduced for data analysis and data mining, and its table structure is designed around the analysis requirements, analysis dimensions, and analysis metrics.
- The difference between a database and a data warehouse is essentially the difference between OLTP and OLAP.
- Operational processing, called Online Transaction Processing (OLTP), is also known as transaction-oriented processing. It covers the day-to-day online operations of a specific business against the database, usually querying and modifying a small number of records. Users care most about response time, data security, data integrity, and the number of concurrent users supported. Traditional database systems, as the main means of data management, are mainly used for operational processing.
- Analytical processing, called Online Analytical Processing (OLAP), generally analyzes historical data on certain subjects to support management decisions.
Operational processing | Analytical processing |
Detailed | Aggregated or refined |
Entity-relationship (E-R) model | Star or snowflake model |
Accesses current (point-in-time) data | Stores historical data, excluding the most recent data |
Updatable | Read-only, append-only |
Operates on one unit at a time | Operates on one set at a time |
High performance requirements, short response time | Looser performance requirements |
Transaction-oriented | Analysis-oriented |
Small amount of data per operation | Large amount of data per operation |
Supports day-to-day operations | Supports decision-making needs |
Small data volume | Large data volume |
Customer orders, inventory levels, and bank account inquiries | Customer revenue analysis, market segmentation |
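The contrast in the table can be made concrete with a small Python sketch using the built-in `sqlite3` module. This is only an illustration under stated assumptions: the `users` table mirrors the login example from the text (with a hypothetical `password_hash` column instead of a plain password), while `dim_user` and `fact_purchase` are a hypothetical, minimal star schema. In practice the two sides would live in separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# OLTP side: a narrow table designed for one application feature (login).
conn.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY, password_hash TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [("alice", "hash-a"), ("bob", "hash-b")]
)


def login(username: str, password_hash: str) -> bool:
    """Transaction-style access: look up a single record by key."""
    row = conn.execute(
        "SELECT 1 FROM users WHERE username = ? AND password_hash = ?",
        (username, password_hash),
    ).fetchone()
    return row is not None


# OLAP side: a minimal star schema (one dimension table, one fact table)
# designed around an analysis question rather than an application feature.
conn.execute(
    "CREATE TABLE dim_user (user_key INTEGER PRIMARY KEY, username TEXT, age_group TEXT)"
)
conn.execute(
    "CREATE TABLE fact_purchase (user_key INTEGER, purchase_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO dim_user VALUES (?, ?, ?)",
    [(1, "alice", "20-29"), (2, "bob", "30-39")],
)
conn.executemany(
    "INSERT INTO fact_purchase VALUES (?, ?, ?)",
    [(1, 2023, 100.0), (1, 2023, 50.0), (2, 2023, 120.0), (2, 2022, 500.0)],
)


def top_buyer(year: int):
    """Analysis-style access: aggregate over a whole set of historical rows."""
    return conn.execute(
        "SELECT d.username, SUM(f.amount) AS total"
        " FROM fact_purchase f JOIN dim_user d ON d.user_key = f.user_key"
        " WHERE f.purchase_year = ?"
        " GROUP BY d.username ORDER BY total DESC LIMIT 1",
        (year,),
    ).fetchone()


print(login("alice", "hash-a"))  # True
print(top_buyer(2023))           # ('alice', 150.0)
```

The login query touches one row through a key, while the "who bought the most in a year" query scans and aggregates many rows, which is the table-design difference the section describes.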
3. Closing notes
1. If you find any errors, please point them out in the comments and I will correct them promptly. If anything is unclear, you are also welcome to leave a message to ask questions and discuss.
2. This may not seem like much to you, but I take it seriously and treat it as my own notes and experience, so that I can keep improving.