How to use Python3 to process log files in real time

PHPz

Released: 2023-04-20 15:01:06

1. Simple real-time file processing (single file)

Suppose the log we want to read in real time is located at /data/mongodb/shard1/log/pg.csv.

We can then run the shell command tail -F from a Python script and process each new line as soon as it is appended to the file.

The code is as follows:

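import re
import codecs
import subprocess

def pg_data_to_elk():
    # Start a child process that runs the shell command
    p = subprocess.Popen('tail -F /data/mongodb/shard1/log/pg.csv', shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE,)
    while True:
        line = p.stdout.readline()   # Read the next line as it arrives (bytes)
        if line:                     # A new line was read
            pass                     # your operation goes here
        elif p.poll() is not None:   # Empty read and tail has exited: stop instead of spinning
            break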
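As an illustration of what "your operation" might look like, here is a minimal sketch that decodes one raw line from the pipe and splits it into CSV fields. The helper name parse_csv_line and the sample line are hypothetical and not part of the original article; the sketch also assumes each log entry fits on a single line, so a quoted field that spans several lines would need extra buffering.

```python
import csv
import io

def parse_csv_line(raw_line, encoding="utf-8"):
    """Decode one raw line (bytes) read from the pipe and split it into CSV fields."""
    text = raw_line.decode(encoding, errors="replace").rstrip("\r\n")
    if not text:
        return []
    # csv.reader copes with quoted fields that contain commas
    return next(csv.reader(io.StringIO(text)))

# Hypothetical sample line, only for demonstration
fields = parse_csv_line(b'2023-04-20 15:01:06,LOG,"connection received, host=[local]"\n')
print(fields)
```

Inside the reading loop, the call would replace the placeholder, for example fields = parse_csv_line(line).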

A brief explanation of the subprocess module:

The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return (status) codes.

Introduction to subprocess.Popen

This class is used to execute a subprogram in a new process.
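The three capabilities mentioned above (spawning a process, connecting to its pipes, and reading its return code) can be seen together in a small sketch. It uses the current Python interpreter as the child program purely so that the example does not depend on any particular shell command; the child code itself is an illustrative assumption.

```python
import subprocess
import sys

# The child reads everything from stdin and prints it in upper case
child_code = "import sys; print(sys.stdin.read().upper())"

p = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,    # pipe connected to the child's standard input
    stdout=subprocess.PIPE,   # pipe connected to the child's standard output
    stderr=subprocess.PIPE,   # pipe connected to the child's standard error
)

# communicate() sends the input, reads both output pipes, and waits for exit
out, err = p.communicate(input=b"hello from the parent\n")
print(out.decode().strip())
print(p.returncode)
```

Unlike the tail -F case, this child terminates on its own, so communicate() is a convenient way to collect all of its output at once.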

Constructor of subprocess.Popen

class subprocess.Popen(args, bufsize=-1, executable=None, stdin=None, stdout=None, stderr=None, 
    preexec_fn=None, close_fds=True, shell=False, cwd=None, env=None, universal_newlines=False,
    startupinfo=None, creationflags=0, restore_signals=True, start_new_session=False, pass_fds=())

Parameter description:

args: The command to execute, given either as a string or as a sequence of program arguments. When args is a string, how it is interpreted depends on the platform, so passing a sequence is generally recommended.
stdin, stdout, stderr: The child program's standard input, standard output, and standard error file handles, respectively.
shell: Whether to run the command through the shell. If shell is True, it is recommended to pass args as a string rather than as a sequence.
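The two ways of passing args can be compared side by side. The sketch below is only an illustration: the commands are placeholders chosen so that they run on most systems, not commands from the original article.

```python
import subprocess
import sys

# Sequence form: each list element is one argument and no shell is involved
seq = subprocess.Popen(
    [sys.executable, "-c", "print('from sequence')"],
    stdout=subprocess.PIPE,
)
seq_out, _ = seq.communicate()

# String form with shell=True: the shell parses the whole command line
sh = subprocess.Popen("echo from shell", shell=True, stdout=subprocess.PIPE)
sh_out, _ = sh.communicate()

print(seq_out.decode().strip())
print(sh_out.decode().strip())
```

When a file name is interpolated into a shell string, as in the tail -F examples, shlex.quote from the standard library can protect against spaces or special characters in the path.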

2. Complex real-time file processing (continuously generating new files)

Suppose a new log file is created whenever a certain condition is met. For example, once log1.csv reaches 20 MB, writing switches to log2.csv. In that case there can be more than 1,000 files in a day, with new ones appearing continuously. How can we read them in real time?

The idea is as follows:

Add a file-size check to the real-time monitoring (tail -F). If the current file is larger than 20 MB, break out of the monitoring loop and pick up the newest log file. (The same approach works for other rotation conditions: replace the file-size check with whatever check you need.)

The code is as follows:

import re
import os
import time
import codecs
import subprocess
from datetime import datetime

path = '/home/liao/python/csv'

def get_file_size(new_file):
    # Return the file size in MB
    fsize = os.path.getsize(new_file)
    fsize = fsize/float(1024*1024)
    return fsize

def get_the_new_file():
    # Compute the date on every call so the script keeps working past midnight
    time_now_day = datetime.now().strftime('%Y-%m-%d')
    files = os.listdir(path)
    # Keep only .csv files whose name carries today's date at positions 11-20
    files_list = list(filter(lambda x:x[-4:]=='.csv' and x[11:21]==time_now_day, files))
    # Sort by modification time so the newest file comes last
    files_list.sort(key=lambda fn:os.path.getmtime(path + '/' + fn) if not os.path.isdir(path + '/' + fn) else 0)
    new_file = os.path.join(path, files_list[-1])
    return new_file

def pg_data_to_elk():
    while True:
        new_file = get_the_new_file()
        # Start a child process that runs the shell command
        p = subprocess.Popen('tail -F {0}'.format(new_file), shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE,)
        while True:
            line = p.stdout.readline()   # Read the next line as it arrives (bytes)
            if line:                     # A new line was read
                if get_file_size(new_file) > 20:    # Larger than 20 MB: leave the loop
                    break
                pass                     # your operation goes here
            elif p.poll() is not None:   # tail has exited: leave the loop
                break
        p.terminate()                    # Stop the old tail process before following the next file
        time.sleep(3)
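The same idea can also be sketched without tail -F, by polling the file directly from Python. This is a different technique from the one used above, shown only as an alternative under stated assumptions: the function name follow_until_full and its parameters are hypothetical, the file is read from its beginning rather than from its last lines, and a trailing line without a newline is ignored.

```python
import os
import tempfile
import time

def follow_until_full(file_path, max_bytes, poll_seconds=0.5, max_idle_polls=None):
    """Yield complete lines from file_path; stop once the file has reached
    max_bytes and every complete line has been read."""
    with open(file_path, "rb") as f:
        idle = 0
        while True:
            pos = f.tell()
            line = f.readline()
            if line.endswith(b"\n"):
                idle = 0
                yield line.decode("utf-8", errors="replace").rstrip("\r\n")
                continue
            # No complete line yet: rewind over any partially written line
            f.seek(pos)
            if os.path.getsize(file_path) >= max_bytes:
                return  # size limit reached and all complete lines consumed
            idle += 1
            if max_idle_polls is not None and idle >= max_idle_polls:
                return  # optional safety stop when nothing new arrives
            time.sleep(poll_seconds)

# Demonstration with a small temporary file and a 10-byte limit
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("a,1\nb,2\nc,3\n")
lines = list(follow_until_full(tmp.name, max_bytes=10))
print(lines)
os.remove(tmp.name)
```

In the rotation scenario, the caller would loop over get_the_new_file() and pass each result to this generator with max_bytes set to 20 * 1024 * 1024. Because the size check runs only after all complete lines have been read, lines written just before the limit is reached are not skipped.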
