The basic technologies of big data include: 1. Data collection. There are four main sources of collected data: management information systems, Web information systems, physical information systems, and scientific experiment systems. 2. Data access. 3. Infrastructure, such as cloud storage and distributed file storage. 4. Data processing: collecting, organizing, cleaning, and converting data from different data sets to generate a new data set. 5. Statistical analysis. 6. Data mining. 7. Model prediction, such as predictive models, machine learning, and modeling and simulation. 8. Result presentation, such as cloud computing, tag clouds, and relationship diagrams.
The operating environment of this article: Windows 7 system, Dell G3 computer.
Basic technologies of big data include data collection, data access, infrastructure, data processing, statistical analysis, data mining, model prediction, and result presentation.
1. Data collection: Data collection is the first step in the big data life cycle. According to the classification of application systems that generate data from MapReduce, there are four main sources of big data collection: management information systems, Web information systems, physical information systems, and scientific experiment systems.
2. Data access: Big data access follows different technical routes, which can be roughly divided into three categories. Category 1 mainly targets large-scale structured data. Category 2 mainly targets semi-structured and unstructured data. Category 3 targets a mixture of structured and unstructured big data.
3. Infrastructure: cloud storage, distributed file storage, etc.
4. Data processing: Collected data sets may differ in structure and schema (for example files, XML trees, or relational tables), which is what is meant by data heterogeneity. Multiple heterogeneous data sets therefore need further integration or consolidation. Data from the different data sets is collected, organized, cleaned, and converted to generate a new data set that provides a unified data view for subsequent query and analysis.
5. Statistical analysis: hypothesis testing, significance testing, difference analysis, correlation analysis, t-test, analysis of variance, chi-square analysis, partial correlation analysis, distance analysis, regression analysis, simple regression analysis, multiple regression analysis, stepwise regression, regression prediction and residual analysis, ridge regression, logistic regression analysis, curve estimation, factor analysis, cluster analysis, principal component analysis, fast clustering method and clustering method, discriminant analysis, correspondence analysis, multivariate correspondence analysis (optimal scale analysis), bootstrap techniques, etc.
6. Data mining: Existing data mining and machine learning technologies still need to be improved. New data mining technologies need to be developed, such as data network mining, special group mining, and graph mining. Breakthroughs are needed in big data fusion technologies such as object-based data connection and similarity connection, and in field-oriented big data mining technologies such as user interest analysis, network behavior analysis, and emotional semantic analysis.
7. Model prediction: prediction model, machine learning, modeling and simulation.
8. Result presentation: cloud computing, tag cloud, relationship diagram, etc.
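As an illustration of the data processing step (item 4), the following is a minimal sketch of merging three heterogeneous sources (CSV text, an XML tree, and relational-style tuples) into one unified data set. It uses only the Python standard library. The sample data, the field names (`user_id`, `age`, `spend`), and all function names are hypothetical and chosen only for this example. Real pipelines would normally use dedicated integration tools rather than hand-written code like this.

```python
import csv
import io
import xml.etree.ElementTree as ET
from statistics import mean

# Hypothetical sample inputs: the same kind of record in three different shapes.
CSV_TEXT = "user_id,age,spend\n1, 34 ,120.5\n2,,80\n"
XML_TEXT = "<users><user id='3'><age>41</age><spend>200</spend></user></users>"
TABLE_ROWS = [(4, 29, 95.0), (1, 34, 120.5)]  # (user_id, age, spend); user 1 is a duplicate


def from_csv(text):
    """Yield raw records from CSV text (values are still strings)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"user_id": row["user_id"], "age": row["age"], "spend": row["spend"]}


def from_xml(text):
    """Yield raw records from an XML tree of <user> elements."""
    for user in ET.fromstring(text).iter("user"):
        yield {
            "user_id": user.get("id"),
            "age": user.findtext("age"),
            "spend": user.findtext("spend"),
        }


def from_table(rows):
    """Yield raw records from relational-style tuples."""
    for user_id, age, spend in rows:
        yield {"user_id": user_id, "age": age, "spend": spend}


def clean(record):
    """Convert fields to common types; return None if a field is missing or invalid."""
    try:
        return {
            "user_id": int(str(record["user_id"]).strip()),
            "age": int(str(record["age"]).strip()),
            "spend": float(str(record["spend"]).strip()),
        }
    except (TypeError, ValueError):
        return None


def integrate(*sources):
    """Clean every record, drop invalid ones, and de-duplicate by user_id."""
    unified = {}
    for source in sources:
        for raw in source:
            rec = clean(raw)
            if rec is not None:
                unified.setdefault(rec["user_id"], rec)  # keep the first occurrence
    return [unified[key] for key in sorted(unified)]


unified_view = integrate(from_csv(CSV_TEXT), from_xml(XML_TEXT), from_table(TABLE_ROWS))
average_spend = mean(rec["spend"] for rec in unified_view)

for rec in unified_view:
    print(rec)
print("average spend:", average_spend)
```

In this sketch the CSV row with a missing age is dropped during cleaning and the duplicate record for user 1 is merged, so the unified view holds users 1, 3, and 4. The final `mean` call hints at how the unified view then feeds the statistical analysis step.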