Introduction:
When analyzing large log files, it's often necessary to retrieve the last N lines for pagination or inspection. This raises the question of how to efficiently tail a log file with an offset.
```python
def tail(f, n, offset=0):
    """Read the last n lines of f, skipping the final `offset` lines.
    f must be opened in binary mode so seeking from the end works."""
    avg_line_length = 74              # initial guess at the average line length
    to_read = n + offset
    while True:
        try:
            # Seek back far enough to (hopefully) cover to_read lines.
            f.seek(-(avg_line_length * to_read), 2)
        except IOError:
            # The file is shorter than the estimate: read all of it.
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        if len(lines) >= to_read or pos == 0:
            # `offset and -offset or None` drops the last `offset` lines,
            # or nothing when offset == 0.
            return lines[-to_read:offset and -offset or None]
        # Not enough lines yet: grow the estimate and try again.
        avg_line_length = int(avg_line_length * 1.3)
```
Evaluation:
This approach guesses an average line length and seeks backwards by that estimate. If the guess is too small, it must re-seek and re-read repeatedly, growing the estimate by 30% on each pass, which can be costly on files with unusually long lines.
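The return slice is the subtle part: `offset and -offset or None` is an old-style conditional expression. A minimal standalone sketch of how it behaves (the sample list is ours):

```python
# Demonstrate the slice idiom used in the return statement above.
lines = ["a", "b", "c", "d", "e"]

# With offset > 0, `offset and -offset or None` yields -offset,
# so the slice drops the last `offset` entries.
to_read, offset = 3, 1            # n=2 lines wanted, skipping the final 1
assert lines[-to_read:offset and -offset or None] == ["c", "d"]

# With offset == 0 it yields None, so nothing is dropped.
to_read, offset = 3, 0
assert lines[-to_read:offset and -offset or None] == ["c", "d", "e"]
```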
```python
def tail(f, lines=20):
    """Return the last `lines` lines of f by walking backwards block by
    block. f must be opened in binary mode so seeking from the end works."""
    BLOCK_SIZE = 1024
    f.seek(0, 2)                      # jump to the end of the file
    block_end_byte = f.tell()
    lines_to_go = lines
    block_number = -1
    blocks = []
    while lines_to_go > 0 and block_end_byte > 0:
        if block_end_byte - BLOCK_SIZE > 0:
            # Read one full block, counting backwards from the end.
            f.seek(block_number * BLOCK_SIZE, 2)
            blocks.append(f.read(BLOCK_SIZE))
        else:
            # Fewer than BLOCK_SIZE bytes remain: read from the start.
            f.seek(0, 0)
            blocks.append(f.read(block_end_byte))
        lines_found = blocks[-1].count(b'\n')
        lines_to_go -= lines_found
        block_end_byte -= BLOCK_SIZE
        block_number -= 1
    all_read_text = b''.join(reversed(blocks))
    return b'\n'.join(all_read_text.splitlines()[-lines:])
```
Explanation:
This method walks backwards through the file one block at a time until it has seen the desired number of newlines. It makes no assumptions about line length and falls back to reading from the start when the file is smaller than the remaining blocks.
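The backwards walk relies on `seek` with `whence=2` (position relative to the end) combined with negative block numbers. A small sketch of the mechanics on an in-memory file, which supports the same interface (the sample data and block size are ours):

```python
import io

data = b"0123456789" * 10         # 100 bytes of sample data
f = io.BytesIO(data)
BLOCK_SIZE = 16

f.seek(0, 2)                      # whence=2: position relative to the end
assert f.tell() == 100            # so this lands at EOF

f.seek(-1 * BLOCK_SIZE, 2)        # block_number = -1: the last block
assert f.read(BLOCK_SIZE) == data[-16:]

f.seek(-2 * BLOCK_SIZE, 2)        # block_number = -2: the block before it
assert f.read(BLOCK_SIZE) == data[-32:-16]
```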
Candidate Solution 2 is generally more robust than Candidate Solution 1: instead of relying on a line-length estimate, it reads backwards in fixed-size blocks until enough newlines are found, so the amount of data read is bounded by roughly one block beyond what is returned. It is the more reliable approach for tailing large log files.