
Introduction to offline data analysis process

巴扎黑
Release: 2017-06-26 11:33:45

3. Introduction to the offline data analysis process

Note: This section is mainly about experiencing the overall concepts and processing flow of a data analysis system, and getting an initial understanding of where Hadoop and other frameworks fit in. Don't pay too much attention to code details.

A widely used data analysis system: "Web log data mining"

3.1 Requirements Analysis

3.1.1 Case Name

"Website or app clickstream log data mining system".

3.1.2 Case requirement description

Web clickstream logs contain information that is very important for website operation. Through log analysis we can learn the number of visits to the website, which page has the most visitors, which page is the most valuable, the advertising conversion rate, visitor source information, visitor terminal information, and so on.
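As a rough illustration of the kind of indicators listed above, the following minimal sketch computes total visits, unique visitors, and the most visited page. The records and field names (`ip`, `url`, `referrer`) are hypothetical and only stand in for already-parsed log entries; a real system would compute these over far larger data on a cluster.

```python
from collections import Counter

# Hypothetical, already-parsed click records; the field names are illustrative only.
records = [
    {"ip": "10.0.0.1", "url": "/index.html", "referrer": "http://search.example/"},
    {"ip": "10.0.0.2", "url": "/index.html", "referrer": "-"},
    {"ip": "10.0.0.1", "url": "/about.html", "referrer": "/index.html"},
]


def summarize(records):
    """Compute a few basic traffic indicators from parsed click records."""
    page_views = Counter(r["url"] for r in records)
    return {
        "total_visits": len(records),
        # Distinct IPs are used here as a crude stand-in for unique visitors.
        "unique_visitors": len({r["ip"] for r in records}),
        "top_page": page_views.most_common(1)[0][0] if page_views else None,
        "page_views": dict(page_views),
    }


summary = summarize(records)
print(summary)
```

Counting distinct IP addresses is only an approximation of unique visitors; production systems usually rely on cookies or user IDs instead.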

3.1.3 Data source

The data in this case consists mainly of records of users' click behavior.

How it is obtained: a JS program is embedded in the page in advance and binds events to the page elements (tags) you want to monitor. Whenever the user clicks on or moves over such an element, an ajax request is sent to a back-end servlet program, which uses log4j to record the event information. In this way a continuously growing log file is formed on the web server (nginx, tomcat, etc.).
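The server-side half of this collection step can be sketched as follows. This is not the servlet and log4j setup the article describes: it uses Python's standard `logging` module as a stand-in, and the function names, the line layout, and the sample events are all assumptions made for illustration.

```python
import logging
import os
import tempfile


def make_click_logger(path):
    """Return a logger that appends one line per click event to `path`."""
    logger = logging.getLogger("clickstream_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers if called more than once
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger.addHandler(handler)
    return logger


def record_click(logger, user_ip, page, element_id, event):
    """What a back-end handler might do on each ajax request: append one log line."""
    logger.info("%s %s %s %s", user_ip, page, element_id, event)


# Simulate two events arriving from the page (hypothetical values).
log_path = os.path.join(tempfile.mkdtemp(), "click.log")
click_logger = make_click_logger(log_path)
record_click(click_logger, "10.0.0.1", "/index.html", "buy-button", "click")
record_click(click_logger, "10.0.0.1", "/index.html", "banner", "mouseover")

with open(log_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(len(lines), "events logged")
```

Each request adds one line, so the file keeps growing as users interact with the page, which is the raw input for the later processing stages.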

Form:

3.2 Data processing process

3.2.1 Flow chart analysis

This case is very similar to a typical BI system. The overall process is as follows:

However, this case is premised on handling massive amounts of data, so the technology used at each stage of the process is entirely different from that of traditional BI. Subsequent courses will explain each one in turn:

1) Data collection: a custom-developed collection program, or the open source framework Flume

2) Data preprocessing: custom-developed MapReduce programs running on a Hadoop cluster

3) Data warehouse technology: Hive, which is based on Hadoop

4) Data export: Sqoop, a data import and export tool based on Hadoop

5) Data visualization: custom-developed web programs, or products such as Kettle

6) Scheduling of the entire process: Oozie, a tool in the Hadoop ecosystem, or other similar open source products

3.2.2 Project technical architecture diagram

3.2.3 Project related screenshots (for a general impression only)

a) MapReduce program running

b) Querying data in Hive

c) Importing statistical results into MySQL

./sqoop export --connect jdbc:mysql://localhost:3306/weblogdb --username root --password root --table t_display_xx --export-dir /user/hive/warehouse/uv/dt=2014-08-03

Sample clickstream log record:

58.215.204.118 - - [18/Sep/2013:06:51:35 +0000] "GET /wp-includes/js/jquery/jquery.js?ver=1.10.2 HTTP/1.1" 304 0 "http://blog.fens.me/nodejs-socketio-chat/" "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0"
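A log record of this shape can be split into fields before any analysis. The sketch below assumes the record follows the common "combined" access log layout, and that the space inside the timestamp in the original record is an extraction artifact. The pattern and the field names are illustrative, not part of the original project.

```python
import re
from datetime import datetime

# Regex for one access log line in the assumed "combined" layout.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    # Split the request line "METHOD URL PROTOCOL" into its three parts.
    method, _, rest = rec["request"].partition(" ")
    url, _, protocol = rest.rpartition(" ")
    rec["method"], rec["url"], rec["protocol"] = method, url, protocol
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


sample = (
    '58.215.204.118 - - [18/Sep/2013:06:51:35 +0000] '
    '"GET /wp-includes/js/jquery/jquery.js?ver=1.10.2 HTTP/1.1" 304 0 '
    '"http://blog.fens.me/nodejs-socketio-chat/" '
    '"Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0"'
)
record = parse_line(sample)
print(record["ip"], record["status"], record["url"])
```

Lines that do not match return `None`, so a preprocessing step can filter out malformed records instead of failing on them.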

3.3 Final effect of the project

After the complete data processing flow has run, reports on the various statistical indicators are output periodically. In production practice, this data ultimately needs to be presented visually. This case uses a web program to implement the data visualization.

The effect is as follows:
