Example of analyzing nginx logs with python+pandas
The following is an example of analyzing nginx logs using python pandas. It has good reference value and I hope it will be helpful to everyone. Let’s take a look together
Requirements
By analyzing the nginx access log, obtain the maximum, minimum, and average response time of each interface and visits.
Implementation principle
Store the nginx log uriuriupstream_response_time field in the pandas dataframe, and then implement it through grouping and data statistics functions.
Implementation
1. Preparation
#创建日志目录,用于存放日志 mkdir /home/test/python/log/log #创建文件,用于存放从nginx日志中提取的$uri $upstream_response_time字段 touch /home/test/python/log/log.txt #安装相关模块 conda create -n science numpy scipy matplotlib pandas #安装生成execl表格的相关模块 pip install xlwt
2. Code implementation
#!/usr/local/miniconda2/envs/science/bin/python #-*- coding: utf-8 -*- #统计每个接口的响应时间 #请提前创建log.txt并设置logdir import sys import os import pandas as pd mulu=os.path.dirname(__file__) #日志文件存放路径 logdir="/home/test/python/log/log" #存放统计所需的日志相关字段 logfile_format=os.path.join(mulu,"log.txt") print "read from logfile \n" for eachfile in os.listdir(logdir): logfile=os.path.join(logdir,eachfile) with open(logfile, 'r') as fo: for line in fo: spline=line.split() #过滤字段中异常部分 if spline[6]=="-": pass elif spline[6]=="GET": pass elif spline[-1]=="-": pass else: with open(logfile_format, 'a') as fw: fw.write(spline[6]) fw.write('\t') fw.write(spline[-1]) fw.write('\n') print "output panda" #将统计的字段读入到dataframe中 reader=pd.read_table(logfile_format,sep='\t',engine='python',names=["interface","reponse_time"] ,header=None,iterator=True) loop=True chunksize=10000000 chunks=[] while loop: try: chunk=reader.get_chunk(chunksize) chunks.append(chunk) except StopIteration: loop=False print "Iteration is stopped." df=pd.concat(chunks) #df=df.set_index("interface") #df=df.drop(["GET","-"]) df_groupd=df.groupby('interface') df_groupd_max=df_groupd.max() df_groupd_min= df_groupd.min() df_groupd_mean= df_groupd.mean() df_groupd_size= df_groupd.size() #print df_groupd_max #print df_groupd_min #print df_groupd_mean df_ana=pd.concat([df_groupd_max,df_groupd_min,df_groupd_mean,df_groupd_size],axis=1,keys=["max","min","average","count"]) print "output excel" df_ana.to_excel("test.xls")
##3. The printed form is as follows:
Key Points
1. When the log file is relatively large Do not use readlines() or readline() when reading, as this will read all the logs into the memory, causing the memory to become full. Therefore, the for line in fo iteration method is used here, which basically does not occupy memory. 2. To read nginx logs, you can use pd.read_table(log_file, sep=' ', iterator=True), but the sep we set here cannot match the split normally, so first split nginx with split , and then save it to pandas. 3. Pandas provides IO tools to read large files in chunks, use different chunk sizes to read and then call pandas.concat to connect to the DataFrameRelated recommendations:python3 pandas reads MySQL data and inserts
##
The above is the detailed content of Example of analyzing nginx logs with python+pandas. For more information, please follow other related articles on the PHP Chinese website!

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



How to configure an Nginx domain name on a cloud server: Create an A record pointing to the public IP address of the cloud server. Add virtual host blocks in the Nginx configuration file, specifying the listening port, domain name, and website root directory. Restart Nginx to apply the changes. Access the domain name test configuration. Other notes: Install the SSL certificate to enable HTTPS, ensure that the firewall allows port 80 traffic, and wait for DNS resolution to take effect.

How to confirm whether Nginx is started: 1. Use the command line: systemctl status nginx (Linux/Unix), netstat -ano | findstr 80 (Windows); 2. Check whether port 80 is open; 3. Check the Nginx startup message in the system log; 4. Use third-party tools, such as Nagios, Zabbix, and Icinga.

The methods that can query the Nginx version are: use the nginx -v command; view the version directive in the nginx.conf file; open the Nginx error page and view the page title.

Steps to create a Docker image: Write a Dockerfile that contains the build instructions. Build the image in the terminal, using the docker build command. Tag the image and assign names and tags using the docker tag command.

You can query the Docker container name by following the steps: List all containers (docker ps). Filter the container list (using the grep command). Gets the container name (located in the "NAMES" column).

Starting an Nginx server requires different steps according to different operating systems: Linux/Unix system: Install the Nginx package (for example, using apt-get or yum). Use systemctl to start an Nginx service (for example, sudo systemctl start nginx). Windows system: Download and install Windows binary files. Start Nginx using the nginx.exe executable (for example, nginx.exe -c conf\nginx.conf). No matter which operating system you use, you can access the server IP

To get Nginx to run Apache, you need to: 1. Install Nginx and Apache; 2. Configure the Nginx agent; 3. Start Nginx and Apache; 4. Test the configuration to ensure that you can see Apache content after accessing the domain name. In addition, you need to pay attention to other matters such as port number matching, virtual host configuration, and SSL/TLS settings.

In Linux, use the following command to check whether Nginx is started: systemctl status nginx judges based on the command output: If "Active: active (running)" is displayed, Nginx is started. If "Active: inactive (dead)" is displayed, Nginx is stopped.
