The key to data use in the era of big data is data reuse. Big data refers to a collection of data that cannot be captured, managed, and processed within a reasonable time frame using conventional software tools. It is characterized by large volume, high velocity, variety, low value density, and veracity.
Detailed introduction:
Big data, an IT industry term, refers to a collection of data that cannot be captured, managed, and processed within a given time frame using conventional software tools. It is a massive, high-growth, and diversified information asset that requires new processing models to deliver stronger decision-making power, insight discovery, and process optimization capabilities.
In "The Age of Big Data" [1], Viktor Mayer-Schönberger and Kenneth Cukier describe big data as analyzing and processing all of the data rather than relying on shortcuts such as random analysis (sample surveys). The 5V characteristics of big data (proposed by IBM) are Volume, Velocity, Variety, Value, and Veracity.
Features:
Volume: The size of the data determines its value and the potential information it holds.
Variety: The diversity of data types.
Velocity: The speed at which data is obtained.
Variability: Inconsistencies in the data that hinder processing and effective management.
Veracity: The quality of the data.
Complexity: The amount of data is huge and comes from multiple channels.
Value: Rational use of big data creates high value at low cost.
Related background:
The research firm Gartner gives this definition: "big data" is a massive, high-growth, and diversified information asset that requires new processing models to deliver stronger decision-making power, insight discovery, and process optimization capabilities.
The definition given by the McKinsey Global Institute is: a data collection so large that its acquisition, storage, management, and analysis greatly exceed the capabilities of traditional database software tools. It has four major characteristics: massive data scale, rapid data flow, diverse data types, and low value density.
The strategic significance of big data technology lies not in possessing huge amounts of data, but in the specialized processing of the meaningful data within it. In other words, if big data is compared to an industry, then the key to making that industry profitable is to improve the "processing capability" applied to data and to add value to the data through "processing".
Technically, big data and cloud computing are as inseparable as two sides of the same coin. Big data cannot be processed by a single computer and requires a distributed architecture. Its defining feature is distributed data mining over massive data, which in turn relies on the distributed processing, distributed databases, cloud storage, and virtualization technology of cloud computing.
With the advent of the cloud era, big data has attracted more and more attention. Analysts generally use the term to describe the large amounts of unstructured and semi-structured data that a company creates, which would take too much time and money to load into a relational database for analysis. Big data analytics is often associated with cloud computing because real-time analysis of large data sets requires frameworks like MapReduce to distribute work across tens, hundreds, or even thousands of computers.
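The MapReduce idea mentioned above can be illustrated with a minimal, single-process sketch. This is not the API of Hadoop or any other real framework; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for illustration. In a real cluster, each map and reduce call could run on a different machine, and the framework would handle the grouping step between them.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (hypothetical local driver)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input: each string stands in for a file stored on a different node.
documents = [
    "big data needs distributed processing",
    "distributed processing scales big data",
]
counts = word_count(documents)
```

Because each `map_phase` call depends only on its own document and each `reduce_phase` call depends only on its own key, the work can be split across many computers without the workers needing to coordinate with one another.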
Big data requires special techniques to process large amounts of data within a tolerable amount of time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
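The smallest basic unit is the bit. The units, in ascending order, are: bit, Byte, KB, MB, GB, TB, PB, EB, ZB, YB, BB, NB, DB. Note that BB, NB, and DB are informal names that are not part of any recognized standard.
Each unit from the Byte upward is 1,024 (2 to the tenth power) times the previous one: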
1 Byte = 8 bits
1 KB = 1,024 Bytes = 8,192 bits
1 MB = 1,024 KB = 1,048,576 Bytes
1 GB = 1,024 MB = 1,048,576 KB
1 TB = 1,024 GB = 1,048,576 MB
1 PB = 1,024 TB = 1,048,576 GB
1 EB = 1,024 PB = 1,048,576 TB
1 ZB = 1,024 EB = 1,048,576 PB
1 YB = 1,024 ZB = 1,048,576 EB
1 BB = 1,024 YB = 1,048,576 ZB
1 NB = 1,024 BB = 1,048,576 YB
1 DB = 1,024 NB = 1,048,576 BB
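The conversions listed above can be checked with a short sketch. The names `UNITS`, `FACTOR`, `to_bytes`, and `humanize` are hypothetical helpers introduced here for illustration, and the unit list deliberately stops at YB, on the assumption that the larger names (BB, NB, DB) are informal rather than standardized.

```python
# Units from the list above, in ascending order, each 1,024 times the previous one.
UNITS = ["Byte", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]
FACTOR = 1024  # 2 to the tenth power


def to_bytes(value, unit):
    """Convert a quantity in the given unit to a number of Bytes."""
    exponent = UNITS.index(unit)  # Byte -> 0, KB -> 1, MB -> 2, ...
    return value * FACTOR ** exponent


def humanize(num_bytes):
    """Express a Byte count in the largest unit that keeps the value below 1,024."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < FACTOR or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= FACTOR


one_mb_in_bytes = to_bytes(1, "MB")  # matches the table: 1,048,576 Bytes
one_kb_in_bits = to_bytes(1, "KB") * 8  # matches the table: 8,192 bits
```

For example, `humanize(1536)` returns `"1.50 KB"`, since 1,536 Bytes is one and a half times 1,024.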