What is the difference between data mining and data analysis?
Difference: 1. The conclusion drawn by "data analysis" is the result of human intellectual activities, while the conclusion drawn by "data mining" is the knowledge discovered by the machine from the learning set [or training set, sample set] Rules; 2. "Data analysis" cannot establish a mathematical model and requires manual modeling, while "data mining" directly completes mathematical modeling.
#The operating environment of this article: Windows 7 system, Dell G3 computer.
What is the difference between data mining and data analysis?
Data mining is to find hidden rules from massive data. Data analysis generally has a clear goal.
The main difference between data mining and data analysis
1. The focus of "data analysis" is to observe data, while the focus of "data mining" is to discover from data "Knowledge Rules" KDD (Knowledge Discover in Database).
2. The conclusions drawn by "data analysis" are the results of human intellectual activities, while the conclusions drawn by "data mining" are the knowledge rules discovered by the machine from the learning set (or training set, sample set).
3. The application of "data analysis" to draw conclusions is human intellectual activity, while the knowledge rules discovered by "data mining" can be directly applied to predictions.
4. "Data analysis" cannot establish a mathematical model and requires manual modeling, while "data mining" directly completes mathematical modeling. For example, the essence of traditional cybernetic modeling is to describe the functional relationship between input variables and output variables. "Data mining" can automatically establish the functional relationship between input and output through machine learning. According to the "rules" derived from KDD, given A set of input parameters can produce a set of output quantities.
A simple example:
There are some people who always fail to pay money to telecom operators in time. How to discover them?
Data analysis: Through observation of the data, we found that 82% of the poor people who did not pay money in time accounted for 82%. So the conclusion is that people with low incomes tend to pay late. The conclusion is that tariffs need to be reduced.
Data mining: Discover the deep-seated reasons by yourself through written algorithms. The reason may be that people who live outside the Fifth Ring Road do not pay in time due to the remote environment. The conclusion is that more business halls or self-service payment points need to be set up.
If you want to read more related articles, please visit PHP Chinese website! !
The above is the detailed content of What is the difference between data mining and data analysis?. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Pandas is a powerful data analysis tool that can easily read and process various types of data files. Among them, CSV files are one of the most common and commonly used data file formats. This article will introduce how to use Pandas to read CSV files and perform data analysis, and provide specific code examples. 1. Import the necessary libraries First, we need to import the Pandas library and other related libraries that may be needed, as shown below: importpandasaspd 2. Read the CSV file using Pan

Common data analysis methods: 1. Comparative analysis method; 2. Structural analysis method; 3. Cross analysis method; 4. Trend analysis method; 5. Cause and effect analysis method; 6. Association analysis method; 7. Cluster analysis method; 8 , Principal component analysis method; 9. Scatter analysis method; 10. Matrix analysis method. Detailed introduction: 1. Comparative analysis method: Comparative analysis of two or more data to find the differences and patterns; 2. Structural analysis method: A method of comparative analysis between each part of the whole and the whole. ; 3. Cross analysis method, etc.

Following the last inventory of "11 Basic Charts Data Scientists Use 95% of the Time", today we will bring you 11 basic distributions that data scientists use 95% of the time. Mastering these distributions helps us understand the nature of the data more deeply and make more accurate inferences and predictions during data analysis and decision-making. 1. Normal Distribution Normal Distribution, also known as Gaussian Distribution, is a continuous probability distribution. It has a symmetrical bell-shaped curve with the mean (μ) as the center and the standard deviation (σ) as the width. The normal distribution has important application value in many fields such as statistics, probability theory, and engineering.

In today's intelligent society, machine learning and data analysis are indispensable tools that can help people better understand and utilize large amounts of data. In these fields, Go language has also become a programming language that has attracted much attention. Its speed and efficiency make it the choice of many programmers. This article introduces how to use Go language for machine learning and data analysis. 1. The ecosystem of machine learning Go language is not as rich as Python and R. However, as more and more people start to use it, some machine learning libraries and frameworks

Visualization is a powerful tool for communicating complex data patterns and relationships in an intuitive and understandable way. They play a vital role in data analysis, providing insights that are often difficult to discern from raw data or traditional numerical representations. Visualization is crucial for understanding complex data patterns and relationships, and we will introduce the 11 most important and must-know charts that help reveal the information in the data and make complex data more understandable and meaningful. 1. KSPlotKSPlot is used to evaluate distribution differences. The core idea is to measure the maximum distance between the cumulative distribution functions (CDF) of two distributions. The smaller the maximum distance, the more likely they belong to the same distribution. Therefore, it is mainly interpreted as a "system" for determining distribution differences.

How to use ECharts and PHP interfaces to implement data analysis and prediction of statistical charts. Data analysis and prediction play an important role in various fields. They can help us understand the trends and patterns of data and provide references for future decisions. ECharts is an open source data visualization library that provides rich and flexible chart components that can dynamically load and process data by using the PHP interface. This article will introduce the implementation method of statistical chart data analysis and prediction based on ECharts and php interface, and provide

1. In this lesson, we will explain integrated Excel data analysis. We will complete it through a case. Open the course material and click on cell E2 to enter the formula. 2. We then select cell E53 to calculate all the following data. 3. Then we click on cell F2, and then we enter the formula to calculate it. Similarly, dragging down can calculate the value we want. 4. We select cell G2, click the Data tab, click Data Validation, select and confirm. 5. Let’s use the same method to automatically fill in the cells below that need to be calculated. 6. Next, we calculate the actual wages and select cell H2 to enter the formula. 7. Then we click on the value drop-down menu to click on other numbers.

Recommended: 1. Business Data Analysis Forum; 2. National People’s Congress Economic Forum - Econometrics and Statistics Area; 3. China Statistics Forum; 4. Data Mining Learning and Exchange Forum; 5. Data Analysis Forum; 6. Website Data Analysis; 7. Data analysis; 8. Data Mining Research Institute; 9. S-PLUS, R Statistics Forum.