Assume the path of the log we want to read in real time is: /data/mongodb/shard1/log/pg.csv
We can then invoke the shell command tail -F from a Python script to read and process the log in real time.
The code is as follows:
import subprocess

def pg_data_to_elk():
    # Start a subprocess that runs the shell command
    p = subprocess.Popen('tail -F /data/mongodb/shard1/log/pg.csv', shell=True,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while True:
        line = p.stdout.readline()  # fetch one line as soon as it is written
        if line:  # skip empty reads
            ...  # your operation
Briefly explain the subprocess module:
subprocess lets you spawn new processes, connect to their input/output/error pipes, and obtain their return (status) codes.
Introduction to subprocess.Popen
This class is used to execute a subprogram in a new process.
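As a minimal illustration (the command here is only an example, not from the original article), Popen can launch a program, capture its output through a pipe, and report its exit status:

```python
import subprocess

# Run a command with args given as a sequence (no shell involved)
p = subprocess.Popen(['echo', 'hello'],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()  # wait for the process and read both pipes
print(out.decode().strip())  # hello
print(p.returncode)          # 0
```

communicate() is convenient for short-lived commands; for a long-running command such as tail -F, reading p.stdout line by line (as in the script above) is the usual pattern.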
Constructor of subprocess.Popen
class subprocess.Popen(args, bufsize=-1, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=True, shell=False, cwd=None, env=None, universal_newlines=False, startupinfo=None, creationflags=0, restore_signals=True, start_new_session=False, pass_fds=())
Parameter description:
args: the command to execute, either a string or a sequence of program arguments. When args is a string, how the command is interpreted is platform-dependent, so passing args as a sequence is generally recommended.
stdin, stdout, stderr: represent the program's standard input, output, and error handles respectively.
shell: whether to run the command through the shell. When shell is True, it is recommended to pass args as a string rather than as a sequence.
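To make the string-versus-sequence distinction concrete (a small sketch, not from the original article): with shell=True the single string is handed to the shell, which expands variables and wildcards; with the default shell=False a sequence is executed directly and its arguments are passed literally:

```python
import subprocess

# shell=True: args is one string, interpreted by the shell,
# so $HOME is expanded
p1 = subprocess.Popen('echo $HOME', shell=True, stdout=subprocess.PIPE)

# shell=False (default): args is a sequence, no shell involved,
# so $HOME is passed through literally
p2 = subprocess.Popen(['echo', '$HOME'], stdout=subprocess.PIPE)

print(p1.communicate()[0].decode().strip())  # the expanded home directory
print(p2.communicate()[0].decode().strip())  # the literal string "$HOME"
```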
What if the logger rolls over to a new file when a condition is met, for example writing to log2.csv once log1.csv reaches 20M? That way more than 1,000 files accumulate in a day, with new ones continually being created. How do we keep reading in real time?
The idea is as follows:
Add a check on the current file's size inside the real-time tail (tail -F) loop. If the file has grown beyond 20M, break out of the loop and pick up the newest log file. (The same idea works for any other rollover condition: just replace the file-size check with the test you need.)
The code is as follows:
import os
import time
import subprocess
from datetime import datetime

path = '/home/liao/python/csv'
time_now_day = datetime.now().strftime('%Y-%m-%d')

def get_file_size(new_file):
    fsize = os.path.getsize(new_file)
    return fsize / float(1024 * 1024)  # size in MB

def get_the_new_file():
    files = os.listdir(path)
    # keep only today's .csv files (the date is embedded in the file name)
    files_list = list(filter(lambda x: x[-4:] == '.csv' and x[11:21] == time_now_day, files))
    # sort by modification time so the newest file comes last
    files_list.sort(key=lambda fn: os.path.getmtime(os.path.join(path, fn))
                    if not os.path.isdir(os.path.join(path, fn)) else 0)
    return os.path.join(path, files_list[-1])

def pg_data_to_elk():
    while True:
        new_file = get_the_new_file()
        # Start a subprocess that runs the shell command
        p = subprocess.Popen('tail -F {0}'.format(new_file), shell=True,
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        while True:
            line = p.stdout.readline()  # fetch one line as soon as it is written
            if line:  # skip empty reads
                if get_file_size(new_file) > 20:  # over 20M: break out and switch files
                    break
                ...  # your operation
        p.terminate()  # stop tailing the old file before moving on
        time.sleep(3)
The above is the detailed content of how to use Python 3 to process log files in real time.