Two methods for Nginx service log analysis (shell+python)

Mar 24, 2017 pm 03:19 PM

Python script

log_format main '$remote_addr - $remote_user [$time_iso8601] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
' "$upstream_addr" "$upstream_status" "$request_time"';
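As a rough illustration of how a line written with the log_format above can be split into named fields, here is a minimal Python sketch. The regular expression, the group names, and the sample line are assumptions made for this example; they are not part of the original article, and a real log may need a more tolerant pattern.

```python
import re

# Regex mirroring the log_format above; group names follow the nginx variables.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)" +"(?P<upstream_addr>[^"]*)" '
    r'"(?P<upstream_status>[^"]*)" "(?P<request_time>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

# Hypothetical sample line, invented for this sketch.
sample = ('10.0.0.1 - - [2017-03-24T15:19:00+08:00] "GET /index.html HTTP/1.1" '
          '200 612 "-" "curl/7.29.0" "-"  "127.0.0.1:9000" "200" "0.119"')
record = parse_line(sample)
print(record["request"], record["status"], record["request_time"])
```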

cat website.access.log | awk '{print $(NF)}' | awk -F "\"" '{print $2}' > a.txt

paste -d " " website.access.log a.txt > b.txt

cat b.txt | awk '($NF>1){print $6$7 " " $NF}' > c.txt
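The three commands above strip the quotes from the last field ($request_time), paste it back onto each line, and keep the lines where it exceeds 1 second. The same idea can be sketched in Python without temporary files. The function name, the threshold parameter, and the sample lines are assumptions for illustration; unlike `$6$7` in the awk command, this sketch reports only the request path (whitespace field 6 under this log format).

```python
def slow_requests(lines, threshold=1.0):
    """Yield (path, seconds) for lines whose last quoted field exceeds threshold."""
    for line in lines:
        parts = line.rstrip().rsplit('"', 2)   # [..., request_time, '']
        fields = line.split()
        if len(parts) < 3 or len(fields) < 7:
            continue
        try:
            seconds = float(parts[1])
        except ValueError:
            continue  # non-numeric value such as "-"
        if seconds > threshold:
            yield fields[5], seconds

# Hypothetical sample lines in the log_format shown earlier.
sample_lines = [
    '10.0.0.1 - - [2017-03-24T15:19:00+08:00] "GET /report HTTP/1.1" '
    '200 612 "-" "curl/7.29.0" "-"  "127.0.0.1:9000" "200" "1.532"',
    '10.0.0.2 - - [2017-03-24T15:19:01+08:00] "GET /index.html HTTP/1.1" '
    '200 612 "-" "curl/7.29.0" "-"  "127.0.0.1:9000" "200" "0.119"',
]
slow = list(slow_requests(sample_lines))
print(slow)
```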

Using awk, wc, sort, uniq, and grep on Linux to analyze and summarize Nginx logs

b). Field meanings (described below)

column1:ip_address

column2:log_time

column3:request

column4:status_code

column5:send_bytes

column6:referer

Requirement 1: count the total number of records, the number of successes, and the number of each failure type: 404, 403, 500

cat data.log|awk -F '\t' '{if($4 > 0) print $4}'|wc -l|

awk '{print "Total Items:"$1}'

2. Extract the total number of successes and of each failure type

cat data.log|awk -F '\t' '{if($4>0 && $4==200) print $4}'|wc -l
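For comparison, the counts from Requirement 1 can be gathered in a single pass with Python. This is a minimal sketch assuming the tab-separated layout listed above (status code in column 4); the function name and the sample rows are invented for illustration.

```python
from collections import Counter

def status_counts(lines):
    """Count records per status code (column 4 of a tab-separated line)."""
    counts = Counter()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 4 and cols[3].isdigit() and int(cols[3]) > 0:
            counts[cols[3]] += 1
    return counts

# Hypothetical rows: ip, log_time, request, status_code, send_bytes, referer
sample = [
    "1.1.1.1\t2017-03-24 15:19:00\tGET /a.html HTTP/1.1\t200\t512\t-",
    "1.1.1.2\t2017-03-24 15:19:01\tGET /b.html HTTP/1.1\t404\t0\t-",
    "1.1.1.3\t2017-03-24 15:19:02\tGET /c.html HTTP/1.1\t500\t0\t-",
    "1.1.1.4\t2017-03-24 15:19:03\tGET /a.html HTTP/1.1\t200\t512\t-",
]
counts = status_counts(sample)
print("Total Items:", sum(counts.values()))
for code in ("200", "404", "403", "500"):
    print(code, counts[code])
```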

Requirement 2: among the errors, find which URLs occur most often; collapse duplicates and list the results in descending order

cat data.log|awk -F '\t' '{if($4>0 && $4==500) print $3}'|awk '{print $2}'|sort|uniq -c|sort -k1 -nr
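The same ranking can be sketched with `collections.Counter`, whose `most_common()` already returns entries in descending order of count. The function name, its parameters, and the sample rows are assumptions for this example; the column positions follow the tab-separated layout described earlier.

```python
from collections import Counter

def top_urls_for_status(lines, status="500", n=10):
    """Return the n most frequent URLs among records with the given status."""
    urls = Counter()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4 or cols[3] != status:
            continue
        request = cols[2].split()      # e.g. "GET /path HTTP/1.1"
        if len(request) >= 2:
            urls[request[1]] += 1
    return urls.most_common(n)

# Hypothetical rows for illustration.
sample = [
    "1.1.1.1\t2017-03-24 15:19:00\tGET /api/a HTTP/1.1\t500\t0\t-",
    "1.1.1.2\t2017-03-24 15:19:01\tGET /api/b HTTP/1.1\t500\t0\t-",
    "1.1.1.3\t2017-03-24 15:19:02\tGET /api/a HTTP/1.1\t500\t0\t-",
    "1.1.1.4\t2017-03-24 15:19:03\tGET /ok HTTP/1.1\t200\t512\t-",
]
ranking = top_urls_for_status(sample)
print(ranking)
```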

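Requirement 3: count how often each file name appears in the URLs, including the status code and Referer in the result. Both the URL and the Referer contain the / character, which interferes with filtering, so the command has to work around that.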

cat data.log|awk '{print $5,$7,$9}'|grep 200|

sed 's#.*/\(.*\)#\1#'|sort -k1|uniq -c
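One way to avoid the slash problem is to take the file name from the URL path only and leave the Referer untouched. The sketch below assumes the tab-separated layout listed earlier (request in column 3, status in column 4, referer in column 6); the function name and sample rows are hypothetical.

```python
from collections import Counter
from posixpath import basename
from urllib.parse import urlsplit

def filename_counts(lines, status="200"):
    """Count (file name, status, referer) triples for records with the given status."""
    counts = Counter()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6 or cols[3] != status:
            continue
        request = cols[2].split()
        if len(request) < 2:
            continue
        # Only the URL path is reduced to its last component; the query
        # string is dropped and the referer keeps its slashes.
        name = basename(urlsplit(request[1]).path)
        counts[(name, cols[3], cols[5])] += 1
    return counts

# Hypothetical rows for illustration.
sample = [
    "1.1.1.1\tt1\tGET /static/js/app.js?v=3 HTTP/1.1\t200\t512\thttp://example.com/a/b",
    "2.2.2.2\tt2\tGET /cdn/app.js HTTP/1.1\t200\t512\thttp://example.com/a/b",
    "3.3.3.3\tt3\tGET /img/logo.png HTTP/1.1\t404\t0\t-",
]
name_counts = filename_counts(sample)
print(name_counts)
```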

wc -l access.log | awk '{print $1}'   # total requests

awk '{print $1}' access.log | sort | uniq | wc -l   # number of unique IPs

awk -F'[ []' '{print $5}' access.log | sort | uniq -c | sort -rn | head -5   # top 5 seconds by client request count

awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -5   # top 5 most frequent IPs

awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -5   # top 5 most requested URLs

awk '{if ($12 > 10){print $7}}' access.log | sort | uniq -c | sort -rn | head -5   # top 5 URLs with response time over 10 seconds

awk '{if ($13 != 200){print $13}}' access.log|sort|uniq -c|sort -rn|head -5 
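Several of the one-liners above can be combined into one pass over the file. This is a minimal sketch, assuming the default whitespace splitting used by awk (so awk's `$1` and `$7` are `fields[0]` and `fields[6]`); the function name and sample lines are invented, and lines with fewer than seven fields are skipped rather than counted, which differs slightly from `wc -l`.

```python
from collections import Counter

def summarize(lines, top=5):
    """Total requests, unique IPs, and the most frequent IPs and URLs."""
    ips, urls = Counter(), Counter()
    total = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue
        total += 1
        ips[fields[0]] += 1    # awk $1
        urls[fields[6]] += 1   # awk $7
    return {
        "total": total,
        "unique_ips": len(ips),
        "top_ips": ips.most_common(top),
        "top_urls": urls.most_common(top),
    }

# Hypothetical sample lines for illustration.
sample = [
    '1.1.1.1 - - [23/Mar/2017:00:17:49 +0800] "GET /a HTTP/1.1" 200 10',
    '1.1.1.1 - - [23/Mar/2017:00:17:49 +0800] "GET /a HTTP/1.1" 200 10',
    '2.2.2.2 - - [23/Mar/2017:00:17:50 +0800] "GET /b HTTP/1.1" 404 10',
]
summary = summarize(sample)
print(summary)
```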

Analyze the behavior of source IPs that made more than 50,000 requests

awk '{print $1}' access.log|sort |uniq -c |sort -rn|awk '{if ($1 > 50000){print $2}}' > tmp.txt

for i in $(cat tmp.txt)
do
   echo "$i" >> analysis.txt
   echo "Access behavior statistics" >> analysis.txt
   grep -F "$i" access.log | awk '{print $6}' | sort | uniq -c | sort -rn | head -5 >> analysis.txt
   echo "Accessed endpoint statistics" >> analysis.txt
   grep -F "$i" access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -5 >> analysis.txt
   # blank separator; written to the same report file as the lines above
   echo -e "\n" >> analysis.txt
done

If the source IPs arrive through a proxy server, change the first command so that it filters on the $http_x_forwarded_for address instead

awk '{print $NF}' access.log|sort |uniq -c |sort -rn|awk '{if ($1 > 50000){print $2}}' > tmp.txt
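The loop above greps the log once per heavy IP. A single-pass alternative can be sketched in Python, at the cost of keeping per-IP counters in memory. The function name, parameters, and sample lines are assumptions; field positions follow awk's default splitting (`$1`, `$6`, `$7`).

```python
from collections import Counter, defaultdict

def heavy_ip_report(lines, threshold=50000, top=5):
    """Per-IP request totals plus top methods and URLs for IPs above threshold."""
    per_ip = Counter()
    methods = defaultdict(Counter)
    urls = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue
        ip = fields[0]
        per_ip[ip] += 1
        methods[ip][fields[5].lstrip('"')] += 1   # awk $6, e.g. "GET
        urls[ip][fields[6]] += 1                  # awk $7
    return {
        ip: {
            "requests": count,
            "methods": methods[ip].most_common(top),
            "urls": urls[ip].most_common(top),
        }
        for ip, count in per_ip.most_common()
        if count > threshold
    }

# Hypothetical sample; a low threshold is used so the example produces output.
sample = ['1.1.1.1 - - [23/Mar/2017:00:17:49 +0800] "GET /a HTTP/1.1" 200 1'] * 3
sample.append('2.2.2.2 - - [23/Mar/2017:00:17:50 +0800] "POST /b HTTP/1.1" 200 1')
heavy = heavy_ip_report(sample, threshold=2)
print(heavy)
```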

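5. Performance metrics

Concurrent connections

A client sends a request to the server and establishes a TCP connection. The total number of TCP connections to the server each second is the number of concurrent connections.

PV (page view), UV (unique visitor), unique IPs

6. Troubleshooting

1. Tuning parameters when Nginx runs out of connections

2. 502 errors with nginx + php-fpm

3. Analysis of a production nginx "no live upstreams while connecting to upstream" incident

4. The curious trailing slash in nginx proxy_pass

5. Zero-byte files caused by a multithreading issue when uploading images with Apache's FtpClient under nginx + tomcat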
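To make the three metrics concrete, here is a rough Python sketch that derives them from access-log lines. It rests on loud assumptions that are not in the original article: every logged request counts as one page view, and a unique visitor is approximated by a distinct (IP, User-Agent) pair, because a plain access log carries no visitor cookie. The function name and sample lines are hypothetical.

```python
def pv_uv_ip(lines):
    """Return (page views, approximate unique visitors, unique IPs)."""
    pv = 0
    visitors, ips = set(), set()
    for line in lines:
        fields = line.split()
        quoted = line.split('"')     # quoted[5] is the User-Agent
        if not fields or len(quoted) < 6:
            continue
        pv += 1
        ips.add(fields[0])
        visitors.add((fields[0], quoted[5]))   # assumption: IP + UA ~ visitor
    return pv, len(visitors), len(ips)

# Hypothetical sample lines for illustration.
sample = [
    '1.1.1.1 - - [23/Mar/2017:00:17:49 +0800] "GET / HTTP/1.1" 200 1 "-" "UA-1"',
    '1.1.1.1 - - [23/Mar/2017:00:17:50 +0800] "GET /a HTTP/1.1" 200 1 "-" "UA-1"',
    '1.1.1.1 - - [23/Mar/2017:00:17:51 +0800] "GET / HTTP/1.1" 200 1 "-" "UA-2"',
    '2.2.2.2 - - [23/Mar/2017:00:17:52 +0800] "GET / HTTP/1.1" 200 1 "-" "UA-1"',
]
pv, uv, ip_count = pv_uv_ip(sample)
print(pv, uv, ip_count)
```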

Case 1
ip - - [23/Mar/2017:00:17:49 +0800] "GET / HTTP/1.1" 302 0 "-" "PycURL/7.19.7"
 
log_format access '$HTTP_X_REAL_IP - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" $HTTP_X_Forwarded_For';
 
192.168.21.1 - - [27/Jan/2014:11:28:53 +0800] "GET /2.php HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1707.0 Safari/537.36" "-"192.168.21.128 200 127.0.0.1:9000 0.119 0.119
 
#log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '    
#                  '$status $body_bytes_sent "$http_referer" '
#                  '"$http_user_agent" "$http_x_forwarded_for"';
 
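$http_host: the address (IP or domain name) the user entered in the browser  192.168.21.128
$upstream_status: upstream status    200
$upstream_addr: backend upstream address and port  127.0.0.1:9000
$request_time: total time taken to serve the request  0.119
$upstream_response_time: upstream response time within the request   0.119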
 
$10 $body_bytes_sent
$1  $remote_addr
$7  $request
$11 $http_referer
$9  $status
$6  $http_user_agent (when splitting on the " character with awk -F'"')
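The field numbers above depend on how the line is split. The short Python sketch below applies both splits to the sample log line from this case, with the User-Agent shortened for readability (that shortening is the only change and is an assumption of this example). awk's `$N` corresponds to index `N-1` in Python.

```python
line = ('192.168.21.1 - - [27/Jan/2014:11:28:53 +0800] "GET /2.php HTTP/1.1" '
        '200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64)"')

fields = line.split()      # awk's default whitespace splitting
quoted = line.split('"')   # awk -F'"'

remote_addr = fields[0]        # $1
request_url = fields[6]        # $7 (the URL part of $request)
status = fields[8]             # $9
body_bytes_sent = fields[9]    # $10
http_referer = fields[10]      # $11, still wrapped in quotes
http_user_agent = quoted[5]    # $6 when the separator is "

print(remote_addr, request_url, status, body_bytes_sent, http_referer)
print(http_user_agent)
```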
 
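1. Total visits
2. Total bandwidth
3. Unique visitors
4. Visits by IP
5. Visits by URL
6. Referrer statistics
7. 404 statistics
8. Search engine crawler visits (Google, Baidu)
9. Search engine referrals (Google, Baidu)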
 
#!/bin/bash
# Daily report built from yesterday's rotated access log.
log_path=/home/www.centos.bz/log/access.log.1
domain="centos.bz"
email="log@centos.bz"
maketime=$(date +"%Y-%m-%d %H:%M")
logdate=$(date -d "yesterday" +%Y-%m-%d)
# Total visits and total bandwidth in MB ($10 is $body_bytes_sent)
total_visit=$(wc -l ${log_path} | awk '{print $1}')
total_bandwidth=$(awk -v total=0 '{total+=$10}END{print total/1024/1024}' ${log_path})
# Unique visitors: asort() is a gawk extension that returns the element count
total_unique=$(awk '{ip[$1]++}END{print asort(ip)}' ${log_path})
# Top 20 IPs and top 20 URLs
ip_pv=$(awk '{ip[$1]++}END{for (k in ip){print ip[k],k}}' ${log_path} | sort -rn | head -20)
url_num=$(awk '{url[$7]++}END{for (k in url){print url[k],k}}' ${log_path} | sort -rn | head -20)
# Top 20 referers that do not come from our own domain
referer=$(awk -v domain=$domain '$11 !~ /http:\/\/[^/]*'"$domain"'/{url[$11]++}END{for (k in url){print url[k],k}}' ${log_path} | sort -rn | head -20)
# Top 20 URLs that returned 404
notfound=$(awk '$9 == 404 {url[$7]++}END{for (k in url){print url[k],k}}' ${log_path} | sort -rn | head -20)
# Crawler visits, matched on the User-Agent ($6 when splitting on ")
spider=$(awk -F'"' '$6 ~ /Baiduspider/ {spider["baiduspider"]++} $6 ~ /Googlebot/ {spider["googlebot"]++}END{for (k in spider){print k,spider[k]}}' ${log_path})
# Search engine referrals, matched on the Referer ($4 when splitting on ")
search=$(awk -F'"' '$4 ~ /http:\/\/www\.baidu\.com/ {search["baidu_search"]++} $4 ~ /http:\/\/www\.google\.com/ {search["google_search"]++}END{for (k in search){print k,search[k]}}' ${log_path})
# The report mail is commented out in the source; uncomment the next line to send it.
#echo -e "Overview\nReport generated: ${maketime}\nTotal visits: ${total_visit}\nTotal bandwidth: ${total_bandwidth}M\nUnique visitors: ${total_unique}\n\nVisits by IP\n${ip_pv}\n\nVisits by URL\n${url_num}\n\nReferring pages\n${referer}\n\n404 statistics\n${notfound}\n\nSpider statistics\n${spider}\n\nSearch engine referrals\n${search}" | mail -s "$domain $logdate log statistics" ${email}
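For readers who prefer Python, part of the report above (total visits, bandwidth, unique IPs, 404 URLs, and crawler counts) can be sketched in one pass. This is not a port of the whole script: the function name, the returned keys, and the sample lines are assumptions, and the field positions simply reuse the awk numbering from the script.

```python
from collections import Counter

def daily_report(lines, top=20):
    """Compute a subset of the shell report from access-log lines."""
    visits, total_bytes = 0, 0
    ips = set()
    not_found, spiders = Counter(), Counter()
    for line in lines:
        fields = line.split()       # awk default splitting: $N is fields[N-1]
        quoted = line.split('"')    # awk -F'"': $6 is quoted[5]
        if len(fields) < 10 or len(quoted) < 6:
            continue
        visits += 1
        ips.add(fields[0])
        if fields[9].isdigit():     # $10, $body_bytes_sent
            total_bytes += int(fields[9])
        if fields[8] == "404":      # $9, $status
            not_found[fields[6]] += 1
        for bot in ("Baiduspider", "Googlebot"):
            if bot in quoted[5]:
                spiders[bot.lower()] += 1
    return {
        "total_visit": visits,
        "total_bandwidth_mb": total_bytes / 1024 / 1024,
        "total_unique": len(ips),
        "notfound": not_found.most_common(top),
        "spider": dict(spiders),
    }

# Hypothetical sample lines for illustration.
sample = [
    '1.1.1.1 - - [23/Mar/2017:00:17:49 +0800] "GET / HTTP/1.1" 200 1048576 "-" "Googlebot/2.1"',
    '1.1.1.1 - - [23/Mar/2017:00:17:50 +0800] "GET /missing HTTP/1.1" 404 1048576 "-" "Mozilla/5.0"',
    '2.2.2.2 - - [23/Mar/2017:00:17:51 +0800] "GET / HTTP/1.1" 200 0 "-" "Baiduspider"',
]
report = daily_report(sample)
print(report)
```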
Case 2
# tar zxvf pymongo-1.11.tar.gz
# cd pymongo-1.11
# python setup.py install
Example: connecting to MongoDB from Python
$ cat conn_mongodb.py 
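#!/usr/bin/python
# Written against pymongo 1.11 as installed above; Connection, authenticate,
# save and insert were removed in later pymongo releases.

import pymongo
import random

conn = pymongo.Connection("127.0.0.1", 27017)
db = conn.tage                     # select the database
db.authenticate("tage", "123")     # authenticate the user
db.user.drop()                     # drop the collection "user"
# insert a single document
db.user.save({'id': 1, 'name': 'kaka', 'sex': 'male'})
# insert a batch of documents in a loop
for id in range(2, 10):
    name = random.choice(['steve', 'koby', 'owen', 'tody', 'rony'])
    sex = random.choice(['male', 'female'])
    db.user.insert({'id': id, 'name': name, 'sex': sex})
# print all documents
content = db.user.find()
for i in content:
    print(i)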
 
Writing the Python script
#encoding=utf8
from __future__ import print_function  # lets the script run on Python 2 and 3

import re

zuidaima_nginx_log_path = "/usr/local/nginx/logs/www.zuidaima.com.access.log"
# Matches an IPv4 address at the start of a log line
pattern = re.compile(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def stat_ip_views(log_path):
    """Return a dict mapping each client IP to its number of requests."""
    ret = {}
    with open(log_path, "r") as f:
        for line in f:
            match = pattern.match(line)
            if match:
                ip = match.group(0)
                ret[ip] = ret.get(ip, 0) + 1
    return ret

def run():
    ip_views = stat_ip_views(zuidaima_nginx_log_path)
    max_ip_view = {}
    for ip in ip_views:
        views = ip_views[ip]
        if len(max_ip_view) == 0:
            max_ip_view[ip] = views
        else:
            # max_ip_view always holds exactly one entry: the current maximum
            _ip = list(max_ip_view.keys())[0]
            _views = max_ip_view[_ip]
            if views > _views:
                max_ip_view[ip] = views
                max_ip_view.pop(_ip)

        print("ip:", ip, ",views:", views)
    # total number of distinct IPs
    print("total:", len(ip_views))
    # IP with the most views
    print("max_ip_view:", max_ip_view)

run()
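The counting and the search for the busiest IP in the script above can also be expressed with `collections.Counter` from the standard library. This is an alternative sketch, not the author's method; the function name and the sample lines are assumptions, and it takes an iterable of lines so it can be tried without a log file.

```python
import re
from collections import Counter

IP_RE = re.compile(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def count_ip_views(lines):
    """Count requests per client IP found at the start of each line."""
    views = Counter()
    for line in lines:
        match = IP_RE.match(line)
        if match:
            views[match.group(0)] += 1
    return views

# Hypothetical sample lines; with a real log, pass an open file object instead.
sample = [
    '1.1.1.1 - - [23/Mar/2017:00:17:49 +0800] "GET / HTTP/1.1" 200 1',
    '2.2.2.2 - - [23/Mar/2017:00:17:50 +0800] "GET / HTTP/1.1" 200 1',
    '1.1.1.1 - - [23/Mar/2017:00:17:51 +0800] "GET /a HTTP/1.1" 200 1',
    'malformed line without an address',
]
ip_views = count_ip_views(sample)
print("total:", len(ip_views))
print("max_ip_view:", ip_views.most_common(1))
```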