The core of big data is prediction. The essence of big data is solving problems, and its core value lies in prediction: applying mathematical algorithms to massive amounts of data to estimate how likely things are to happen. Big data prediction uses big data and prediction models to estimate the probability that something will happen in the future.
The core of big data is prediction. It is often considered a part of artificial intelligence, or rather, a type of machine learning. But this definition is misleading. Big data is not about teaching machines to think like humans.
On the contrary, it applies mathematical algorithms to massive amounts of data to predict the likelihood of something happening. The probability that an email is spam, the probability that a typed "teh" should be "the", and the probability that a jaywalking pedestrian will cross the road in time, judged from his or her trajectory and speed, are all within the range of what big data can predict. If the pedestrian can cross in time, an approaching car only needs to slow down slightly. The key to the success of these prediction systems is that they are built on massive amounts of data. Furthermore, as the systems receive more and more data, they can become smart enough to automatically search for the best signals and patterns and improve themselves.
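As a rough illustration of the "teh" versus "the" case, the sketch below estimates how likely each known word is to be the intended one. It is only a minimal sketch under stated assumptions: the word counts in WORD_COUNTS are invented toy values, the function names are hypothetical, and candidates are limited to words one edit away from what was typed. A real system would learn its counts from massive amounts of text.

```python
from collections import Counter

# Hypothetical toy counts; a real system would learn these from massive text data.
WORD_COUNTS = Counter({"the": 5000, "ten": 120, "tea": 80, "he": 900})

def edits1(word):
    """Return every string one edit (delete, transpose, replace, insert) away from word."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correction_probabilities(word):
    """Estimate the probability of each known candidate as its share of the counts."""
    candidates = {w for w in edits1(word) if w in WORD_COUNTS}
    total = sum(WORD_COUNTS[w] for w in candidates)
    if total == 0:
        return {}  # no known word is one edit away
    return {w: WORD_COUNTS[w] / total for w in candidates}

probs = correction_probabilities("teh")
best_guess = max(probs, key=probs.get)
print(best_guess, round(probs[best_guess], 3))
```

With these toy counts, "the" receives by far the largest share. The more text the counts are drawn from, the more reliable such estimates become, which is the point the paragraph above makes about massive data.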
Big data prediction (big data core application)
Big data prediction is the core application of big data. It extends prediction in the traditional sense to "live testing". Its advantage is that it transforms a very difficult prediction problem into a relatively simple description problem, something simply beyond the reach of traditional small data sets. From the perspective of prediction, the results of big data prediction are not only simple, objective conclusions for handling real business; they can also be used to support business operations decisions.
1. Prediction is the core value of big data
The essence of big data is to solve problems. The core value of big data lies in prediction, and the core of business operations is likewise making correct judgments based on predictions. When people talk about big data applications, the most common cases are "predicting the stock market", "predicting the flu", "predicting consumer behavior", and so on.
Big data prediction uses big data and prediction models to estimate the probability that something will happen in the future. The biggest difference between big data analysis and traditional data analysis is the shift from "facing the past that has already happened" to "facing the future that is about to happen".
The logical basis of big data prediction is that every unusual change is preceded by signs, and everything leaves traces that can be followed. If you find the pattern linking signs to changes, you can make predictions. Big data prediction cannot establish that something will definitely happen. It gives the probability that an event will happen.
The continuous repetition of experiments and the increasing accumulation of big data allow humans to continuously discover various patterns and thus be able to predict the future. Using big data to predict possible disasters, using big data to analyze the possible causes of cancer and find treatments are all undertakings that can benefit mankind in the future.
For example, the Los Angeles Police Department and the University of California have used big data to predict the occurrence of crime; Google Flu Trends used search keywords to predict the spread of influenza; MIT uses mobile phone location data and traffic data for urban planning; and the Meteorological Bureau collates recent meteorological conditions and satellite cloud images to judge future weather conditions more accurately.
2. Changes in thinking about big data prediction
In the past, people's decision-making relied mainly on the 20% of data that is structured, while big data prediction can also use the other 80%, which is unstructured, to make decisions. Big data prediction has more data dimensions, higher data frequency, and broader data coverage. Compared with the era of small data, the thinking behind big data prediction has changed in three major ways: real samples instead of sampling; prediction efficiency instead of accuracy; correlation instead of causation.
1) Real samples instead of sampling
In the era of small data, because there was no way to obtain all samples, people invented the method of "random survey data". In theory, the more randomly a sample is drawn, the more representative it is of the whole population. The problem is that obtaining a random sample is extremely expensive and time-consuming. Population surveys are a typical example. It is difficult for a country to complete a population survey every year because such surveys are too time-consuming and labor-intensive. However, the emergence of cloud computing and big data technology has made it possible to obtain sufficiently large samples, and even data on the entire population.
2) Efficiency rather than accuracy
In the small data era, because sampling methods were used, the handling of the sample data had to be very precise; otherwise "a slight error at the start leads to a miss of a thousand miles". For example, if 1,000 people are randomly selected for a survey from a population of 100 million, any error in the figures for those 1,000 people becomes very large when it is scaled up to 100 million people. With the full sample, however, a deviation stays exactly as large as it is and is not amplified.
In the era of big data, quickly obtaining a rough outline and the direction of development is much more important than strict accuracy. Sometimes, when we have large amounts of new types of data, accuracy is less important because we can still get a handle on how things are going. Simple algorithms based on big data are more effective than complex algorithms based on small data. Data analysis is not an end in itself; it serves decision-making, so timeliness is also very important.
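The amplification described above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the sample counts (200 true, 210 recorded) are invented, and extrapolate is a hypothetical helper that simply scales a count in the sample up to the population.

```python
POPULATION = 100_000_000   # total population from the example above
SAMPLE_SIZE = 1_000        # number of people surveyed

def extrapolate(sample_count, sample_size=SAMPLE_SIZE, population=POPULATION):
    """Scale a count observed in the sample up to the whole population."""
    return sample_count * population // sample_size

# Hypothetical figures: 200 people in the sample truly have some attribute,
# but 10 are miscounted, so 210 are recorded.
true_estimate = extrapolate(200)
wrong_estimate = extrapolate(210)
amplified_error = wrong_estimate - true_estimate

print(POPULATION // SAMPLE_SIZE)  # each sampled person stands for 100000 people
print(amplified_error)            # a miscount of 10 becomes 1000000
```

In a full-sample count the same miscount of 10 people would remain an error of 10, because nothing is scaled up.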
3) Correlation rather than causation
Big data research is different from traditional logical reasoning research. It statistically searches, compares, clusters, classifies, analyzes, and summarizes huge amounts of data, and it focuses on the correlations in the data. Correlation means that there is some regularity between the values of two or more variables. There are no absolutes in correlation, only possibilities. However, if the correlation is strong, a prediction based on it is very likely to succeed.
Correlation can help us capture the present and predict the future. If A and B often occur together, then we only need to note that B occurs to predict that A will also occur.
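One simple way to quantify "A and B often occur together" is the conditional probability of A given B, estimated from observed co-occurrences. The following is a minimal sketch, assuming each observation is recorded as a set of events; the data and the function name conditional_probability are made up for illustration.

```python
def conditional_probability(observations, a, b):
    """Estimate P(a | b): among observations containing b, the share that also contain a."""
    with_b = [obs for obs in observations if b in obs]
    if not with_b:
        return None  # b was never observed, so nothing can be estimated
    return sum(1 for obs in with_b if a in obs) / len(with_b)

# Hypothetical observations; each set lists the events seen together.
observations = [
    {"A", "B"},
    {"A", "B"},
    {"A", "B"},
    {"B"},
    {"A"},
    {"C"},
]

p_a_given_b = conditional_probability(observations, "A", "B")
print(p_a_given_b)  # 3 of the 4 observations containing B also contain A
```

The estimate says nothing about why A and B occur together; it only says how often A accompanied B in the data, which is all that a correlation-based prediction relies on.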
With correlation, our understanding of the world no longer needs to be based on hypotheses. A hypothesis here means an assumption about the mechanism that produces a phenomenon and about its internal workings. So we don't need to make assumptions about which search terms indicate when and where the flu is spreading, how airlines price flights, or what the cooking preferences of Walmart customers are. Instead, we can perform correlation analysis on big data to learn which search terms are most indicative of the spread of influenza, whether the price of air tickets will skyrocket, and which foods people staying home during hurricanes want most.
Data-driven correlation analysis of big data replaces error-prone methods based on assumptions. Big data correlation analysis methods are more accurate, faster, and less susceptible to bias. Prediction based on correlation analysis is the core of big data.
Correlation analysis itself is of great significance, and it also lays the foundation for studying causal relationships. By identifying things that may be related, we can build on this to conduct further causal analysis. If there is a causal relationship, then go one step further to find out why. This convenient mechanism reduces the cost of causal analysis through rigorous experiments. We can also find some important variables from the correlations, which can be used in experiments to verify causal relationships.
3. Typical application areas of big data prediction
The Internet has made it easier for big data prediction applications to spread. Judging from domestic and foreign cases, the following 11 fields are the most promising areas for big data prediction applications.
1) Weather forecast
Weather forecasting is a typical field for big data prediction. The granularity of weather forecasts has been shortened from days to hours, and there are strict timeliness requirements. If the calculations were performed on massive data using traditional methods, tomorrow would already have arrived by the time a conclusion was reached, and the prediction would be of no value. The development of big data technology provides high-speed computing capabilities, which greatly improve the timeliness and accuracy of weather forecasts.
2) Sports event prediction
During the 2014 World Cup, companies such as Google, Baidu, Microsoft, and Goldman Sachs all launched platforms for predicting match results. Baidu's results were the most eye-catching, with a prediction accuracy of 67% across all 64 matches and 94% in the knockout rounds. This suggests that big data prediction will have a major influence on future sports events.
Google's World Cup prediction built its final model on Opta Sports' massive event data. Baidu searched data on 37,000 matches played by 987 teams (including national teams and club teams) around the world over the past five years. It also set up data cooperation with the Chinese lottery website Lecai.com and with SPdex, a European provider of Betfair index data, to import prediction data from the gambling market. On this basis it established a prediction model covering 199,972 players and 112 million pieces of data and used it to predict results.
Judging from the successful experience of Internet companies, as long as there is historical data of sports events and cooperation with index companies, predictions of other events, such as the Champions League, NBA and other events, can be made.
3) Stock market prediction
Last year, research conducted by Warwick Business School in the UK and the Department of Physics at Boston University in the US found that the financial keywords users search for on Google may be able to predict the direction of the financial market. The corresponding investment strategy returned as much as 326%. Before that, some experts had tried to predict stock market fluctuations from the sentiment of Twitter posts.
4) Market Price Forecast
CPI is used to characterize the price fluctuations that have occurred, but the data from the Bureau of Statistics is not authoritative. Big data may help people understand the future price trend and predict inflation or economic crisis in advance. The most typical case is that Jack Ma learned about the Asian financial crisis in advance through Alibaba B2B big data.
It is easier to predict the price of a single product, especially for standardized products such as air tickets. The "air ticket calendar" provided by "Qunar" is a price prediction, which can tell you the approximate price of air tickets in a few months.
In a fully competitive market, the production cost, channel cost, and approximate gross profit of goods are relatively stable, so the variables related to price are relatively fixed. The supply and demand for goods can also be monitored in real time on e-commerce platforms, so prices can be predicted. Based on the prediction results, purchase timing recommendations can be provided, or merchants can be guided to adjust prices dynamically and run marketing activities to maximize profits.
5) User behavior prediction
Based on data such as user search behavior, browsing behavior, comment history, and personal information, Internet businesses can gain insight into the overall needs of consumers and then carry out targeted product production, improvement, and marketing. "House of Cards" chose its actors and plots, Baidu delivers precisely targeted advertising based on user preferences, Alibaba arranges customized products for production lines based on Tmall user characteristics, and Amazon predicts user click behavior and ships products in advance. All of these benefit from predictions of Internet user behavior, as shown in Figure 1.
Figure 1 User Behavior Prediction
Benefiting from the development of sensor technology and the Internet of Things, offline user behavior insights are brewing. Free commercial Wi-Fi, iBeacon technology, camera image monitoring, indoor positioning technology, NFC sensor network, and queuing system can detect users' offline movement, stay, travel patterns and other data, so as to carry out precise marketing or product customization.
6) Human Health Prediction
Traditional Chinese medicine can discover some hidden chronic diseases in the human body through its four diagnostic methods of looking, listening and smelling, asking, and pulse-taking, and can even tell what symptoms a person may develop in the future by looking at their physical constitution. The body's physical signs change according to certain rules, and the human body already shows some persistent abnormalities before a chronic disease occurs. Theoretically, if big data captures such anomalies, it can predict chronic diseases.
Nature News & Views reported on a study by Zeevi et al. that addressed the complex question of how a person's blood glucose concentration is affected by specific foods. The study proposes a predictive model that can provide personalized food recommendations based on the microbes in the gut and other aspects of physiology, and that can predict blood sugar responses more accurately than current standards, as shown in Figure 2.
Figure 2 Blood glucose concentration prediction model
Intelligent hardware makes big data prediction of chronic diseases possible. Wearable devices and smart health devices can help collect human health data over the network, such as heart rate, weight, blood lipids, blood sugar, amount of exercise, and amount of sleep. If these data are accurate and comprehensive enough, and if chronic disease prediction models can be turned into algorithms, these wearable devices may in the future remind users of their risk of developing certain chronic diseases.
7) Disease epidemic prediction
Disease epidemic prediction refers to predicting the possibility of a large-scale epidemic outbreak based on people's searches and shopping behavior. The classic "flu prediction" falls into this category. If more and more search requests for "influenza" and "isatis root" come from a certain area, it is natural to infer that influenza is spreading there.
Baidu has launched a disease prediction product. It can currently analyze activity levels and trend charts for four diseases (influenza, hepatitis, tuberculosis, and sexually transmitted diseases) in every province in the country and in most prefecture-level cities, districts, and counties, and monitor them comprehensively. In the future, the types of diseases monitored by Baidu Disease Prediction will expand from the current 4 to more than 30, covering more common diseases and epidemics. Users can take targeted preventive measures based on local prediction results.
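The idea of inferring a trend from rising search requests can be sketched as a comparison between recent search volume and an earlier baseline. This is only an illustrative sketch and not how any production system is known to work: the daily counts, the 7-day window, and the 1.5 threshold are all assumptions chosen for the example, and flag_rising_searches is a hypothetical function name.

```python
from statistics import mean

def flag_rising_searches(daily_counts, recent_days=7, threshold=1.5):
    """Compare the recent average search volume with the earlier baseline average.

    Returns (ratio, flagged), where flagged is True when the recent average
    is at least `threshold` times the baseline average.
    """
    if len(daily_counts) <= recent_days:
        raise ValueError("need more days of data than the recent window")
    baseline = mean(daily_counts[:-recent_days])
    if baseline <= 0:
        raise ValueError("baseline search volume must be positive")
    recent = mean(daily_counts[-recent_days:])
    ratio = recent / baseline
    return ratio, ratio >= threshold

# Hypothetical daily counts of "influenza" searches from one area:
# two quiet weeks followed by one week of elevated volume.
counts = [100] * 14 + [180] * 7
ratio, flagged = flag_rising_searches(counts)
print(ratio, flagged)
```

A flag like this only signals that searches are rising. Whether the rise reflects actual illness is a separate question that the search data alone cannot settle.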
8) Disaster prediction
Meteorological prediction is the most typical kind of disaster prediction. If natural disasters such as earthquakes, floods, high temperatures, and heavy rains can be predicted and announced in advance using the power of big data, it will help with disaster reduction, prevention, and relief. In the past, data collection methods suffered from problems such as blind spots and high costs. In the era of the Internet of Things, people can use cheap sensors, cameras, and wireless communication networks to monitor and collect data in real time, and then use big data predictive analysis to predict natural disasters more accurately.
9) Environmental change prediction
In addition to short-term micro weather and disaster predictions, longer-term and macro environmental and ecological change predictions can also be made. The shrinking areas of forests and farmland, endangered wildlife and plants, rising coastlines, and the greenhouse effect are "chronic problems" facing the earth. The more data humans know about changes in Earth's ecosystems and weather patterns, the easier it will be to model future environmental changes and prevent bad changes from happening. Big data can help humans collect, store and mine more earth data, while also providing tools for prediction.
10) Traffic behavior prediction
Traffic behavior prediction refers to the prediction of traffic behavior based on the LBS positioning data of users and vehicles, analyzing the individual and group characteristics of people and vehicles traveling. The transportation department can perform intelligent vehicle scheduling or apply tidal lanes by predicting the traffic flow on different roads at different times. Users can choose roads with a lower probability of congestion based on the prediction results.
Baidu's LBS prediction, based on its map applications, covers a wider range. It can predict people's migration trends during the Spring Festival to guide the planning of train lines and flight routes. It can predict the flow of people in scenic spots during holidays to guide people's choice of scenic spots. Baidu heat maps also tell users about the flow of people in city business districts, zoos, and other places, which can guide users' travel choices and businesses' location selection.
11) Energy Consumption Forecast
The California power grid system operations center manages more than 80% of California's power grid, delivering 289 million megawatt-hours of electricity to 35 million users every year over more than 40,000 km of power lines. The center uses Space-Time Insight's software for intelligent management. It comprehensively analyzes massive data from sources such as weather, sensors, and metering equipment, predicts changes in energy demand in different places, dispatches power intelligently, balances supply and demand across the entire grid, and responds quickly to potential crises. China's smart grid industry is already trying similar big data prediction applications.
In addition to the 11 fields listed above, big data prediction can also be applied to real estate prediction, employment situation prediction, college entrance examination score prediction, election result prediction, Oscar award prediction, risk assessment of insurance policyholders, assessment of the repayment ability of financial borrowers, and other areas. It gives humans the ability to gain quantifiable, persuasive, and verifiable insight into the future, and the charm of big data prediction is being unleashed.