Importing Apache and Nginx Logs into Hive and Analyzing Them


WBOY

Jun 07, 2016 05:04 PM

apache hive nginx

Two ways to import nginx logs into Hive

1 Create the table in Hive

CREATE TABLE apachelog (ipaddress STRING, identd STRING, user STRING …

After import, a log row looks like this:

203.208.60.91 - - 05/May/2011:01:18:47 +0800 GET /robots.txt HTTP/1.1 404 1238 Mozilla/5.0

This method supports Hive's built-in function parse_url(referer, "HOST").

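2 The second import method

Note: with this method, after creating the table and before running any queries, first execute:

hive> add jar /home/hjl/hive/lib/hive_contrib.jar;

Alternatively, add the following property (name, then value) to hive/conf/hive-default.conf:

hive.aux.jars.path
file:///usr/local/hadoop/hive/lib/hive-contrib-0.7.0-cdh3u0.jar

Then save the configuration.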

203.208.60.91 - - [05/May/2011:01:18:47 +0800] "GET /robots.txt HTTP/1.1" 404 1238 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +)"
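The raw line above is in the combined log format, and the second method relies on a deserializer from the contrib jar to split it into columns. The table definition and its pattern are not shown in this article, so the following Python sketch only illustrates the idea with an assumed regular expression. The column names ipaddress, identd, and user come from the article; the remaining names and the pattern itself are hypothetical.

```python
import re

# Hypothetical pattern for a combined-format log line. The article's actual
# deserializer pattern is not shown, so this regex is an assumption.
LOG_PATTERN = re.compile(
    r'(?P<ipaddress>\S+) (?P<identd>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = ('203.208.60.91 - - [05/May/2011:01:18:47 +0800] '
          '"GET /robots.txt HTTP/1.1" 404 1238 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; +)"')
fields = parse_line(sample)
print(fields["ipaddress"], fields["status"], fields["request"])
```

Every captured group is a plain string, which matches the article's observation that all columns come back as strings with this method.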

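With this method the columns have the type "string from deserializer". In testing, parse_url(referer, "HOST") could not extract the domain name from them.

Instead, use select split(referer, "/")[2] from apilog to get the domain name.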
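To see why index 2 of the split holds the domain, here is a small Python sketch that mimics both approaches. The helper names host_by_split and host_by_parse and the sample URL are hypothetical, and returning None for short inputs is a choice made for this sketch, not a statement about Hive's behavior.

```python
from urllib.parse import urlparse


def host_by_split(referer):
    """Mimic split(referer, "/")[2]: take the third '/'-separated piece."""
    parts = referer.split("/")
    return parts[2] if len(parts) > 2 else None


def host_by_parse(referer):
    """Use a real URL parser, similar in spirit to parse_url(referer, "HOST")."""
    return urlparse(referer).hostname


url = "http://www.example.com/path/page.html?x=1"
# Splitting on "/" gives ["http:", "", "www.example.com", "path", ...],
# so the host sits at index 2.
print(host_by_split(url))
print(host_by_parse(url))
```

One difference worth knowing: for a referer such as http://example.com:8080/a, the split approach keeps the port in the result, while the URL parser returns only the host name.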

If the file data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.

Command to load the logs

hive> load data local inpath '/home/log/map.gz' overwrite into table log;

Loading supports .gz and similar compressed formats.

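Example analysis queries after loading the logs

Count rows
select count(*) from nginxlog;

Count distinct IPs
select count(DISTINCT ip) from nginxlog;

Rank IPs by request count (SORT BY orders rows only within each reducer; use ORDER BY when a single global ordering is needed)
select t2.ip,t2.xx from (SELECT ip, COUNT(*) AS xx FROM nginxlog GROUP by ip) t2 sort by t2.xx desc;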
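To sanity-check these three queries against a small sample, the same numbers can be computed locally. The following Python sketch assumes the IP column has already been extracted into a list; the function name ip_stats and the sample values are hypothetical.

```python
from collections import Counter


def ip_stats(ips):
    """Return (row count, distinct IP count, ranking by request count)."""
    counts = Counter(ips)
    # Sort by count descending, then by IP so that ties are deterministic.
    ranking = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return len(ips), len(counts), ranking


sample_ips = ["203.208.60.91", "216.211.123.184", "203.208.60.91"]
total, distinct, ranking = ip_stats(sample_ips)
print(total, distinct, ranking[0])
```

The three return values correspond to count(*), count(DISTINCT ip), and the grouped ranking, in that order.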

hive> SELECT * from apachelog WHERE ipaddress = '216.211.123.184';

hive> SELECT ipaddress, COUNT(1) AS numrequest FROM apachelog GROUP BY ipaddress SORT BY numrequest DESC LIMIT 1;

hive> set mapred.reduce.tasks=2;
hive> SELECT ipaddress, COUNT(1) AS numrequest FROM apachelog GROUP BY ipaddress SORT BY numrequest DESC LIMIT 1;

hive> CREATE TABLE ipsummary (ipaddress STRING, numrequest INT);
hive> INSERT OVERWRITE TABLE ipsummary SELECT ipaddress, COUNT(1) FROM apachelog GROUP BY ipaddress;

hive> SELECT ipsummary.ipaddress, ipsummary.numrequest FROM (SELECT MAX(numrequest) AS themax FROM ipsummary) ipsummarymax JOIN ipsummary ON ipsummarymax.themax = ipsummary.numrequest;

Exporting Hive query results as CSV (untested)

hive> set hive.io.output.fileformat=CSVTextFile;
hive> insert overwrite local directory '/tmp/CSVrepos/' select * from S where ... ;
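Since the setting above is untested, a possible fallback is to export with Hive's default text output, where columns are separated by the Ctrl-A character (\x01), and convert the files afterwards. The following Python sketch shows that conversion under this assumption; the function name ctrl_a_to_csv and the sample rows are hypothetical.

```python
import csv
import io


def ctrl_a_to_csv(lines, out):
    """Write Ctrl-A separated text lines to `out` as CSV rows."""
    writer = csv.writer(out)
    for line in lines:
        # Strip the trailing newline, then split on the \x01 delimiter.
        writer.writerow(line.rstrip("\n").split("\x01"))


buf = io.StringIO()
ctrl_a_to_csv(["203.208.60.91\x01404\n", "216.211.123.184\x01200\n"], buf)
print(buf.getvalue())
```

Using csv.writer rather than a plain string replace means fields that themselves contain commas or quotes are quoted correctly.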
