Introduction:
When analyzing large log files, it's often necessary to retrieve the last N lines for pagination or inspection. This raises the question of how to efficiently tail a log file with an offset.
```python
def tail(f, n, offset=0):
    """Read the last n lines of f, skipping the final `offset` lines.
    f must be opened in binary mode so seeking from the end works."""
    avg_line_length = 74              # initial guess at the average line length
    to_read = n + offset
    while True:
        try:
            # Seek back far enough to (hopefully) cover to_read lines.
            f.seek(-(avg_line_length * to_read), 2)
        except IOError:
            # The file is shorter than the estimate: read all of it.
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            # `offset and -offset or None` drops the last `offset` lines,
            # or nothing when offset == 0.
            return lines[-to_read:offset and -offset or None]
        # Not enough lines yet: grow the estimate and try again.
        avg_line_length = int(avg_line_length * 1.3)
```
Evaluation:
This approach guesses an average line length and seeks backwards by that estimate. If the guess is too small, it must re-seek and re-read repeatedly, growing the estimate by 30% on each pass, which can be costly on files with unusually long lines.
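The return slice is the subtle part: `offset and -offset or None` is an old-style conditional expression. A minimal standalone sketch of how it behaves (the sample list is ours):

```python
# Demonstrate the slice idiom used in the return statement above.
lines = ["a", "b", "c", "d", "e"]

# With offset > 0, `offset and -offset or None` yields -offset,
# so the slice drops the last `offset` entries.
to_read, offset = 3, 1            # n=2 lines wanted, skipping the final 1
assert lines[-to_read:offset and -offset or None] == ["c", "d"]

# With offset == 0 it yields None, so nothing is dropped.
to_read, offset = 3, 0
assert lines[-to_read:offset and -offset or None] == ["c", "d", "e"]
```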
```python
def tail(f, lines=20):
    """Return the last `lines` lines of f by walking backwards block by
    block. f must be opened in binary mode so seeking from the end works."""
    BLOCK_SIZE = 1024
    f.seek(0, 2)                      # jump to the end of the file
    block_end_byte = f.tell()
    lines_to_go = lines
    block_number = -1
    blocks = []
    while lines_to_go > 0 and block_end_byte > 0:
        if block_end_byte - BLOCK_SIZE > 0:
            # Read one full block, counting backwards from the end.
            f.seek(block_number * BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            # Fewer than BLOCK_SIZE bytes remain: read from the start.
            f.seek(0, 0)
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count(b'\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = b''.join(reversed(blocks))
    return b'\n'.join(all_read_text.splitlines()[-lines:])
```
Explanation:
This method walks backwards through the file one block at a time until it has seen the desired number of newlines. It makes no assumptions about line length and falls back to reading from the start when the file is smaller than the remaining blocks.
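The backwards walk relies on `seek` with `whence=2` (position relative to the end) combined with negative block numbers. A small sketch of the mechanics on an in-memory file, which supports the same interface (the sample data and block size are ours):

```python
import io

data = b"0123456789" * 10         # 100 bytes of sample data
f = io.BytesIO(data)
BLOCK_SIZE = 16

f.seek(0, 2)                      # whence=2: position relative to the end
assert f.tell() == 100            # so this lands at EOF

f.seek(-1 * BLOCK_SIZE, 2)        # block_number = -1: the last block
assert f.read(BLOCK_SIZE) == data[-16:]

f.seek(-2 * BLOCK_SIZE, 2)        # block_number = -2: the block before it
assert f.read(BLOCK_SIZE) == data[-32:-16]
```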
Candidate Solution 2 is generally more robust than Candidate Solution 1: instead of relying on a line-length estimate, it reads backwards in fixed-size blocks until enough newlines are found, so the amount of data read is bounded by roughly one block beyond what is returned. It is the more reliable approach for tailing large log files.