How Can I Efficiently Retrieve the Last 10 Lines of a Huge Text File? - C++ - PHP中文網


Mary-Kate Olsen

Published: 2025-01-05 10:23:41

Original



Efficiently Retrieving the Last 10 Lines of a Huge Text File

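In large-scale text processing, retrieving the last few lines of a very large file poses a particular challenge. Here the file exceeds 10 GB, so reading it sequentially from the start just to reach its final lines would waste a great deal of time and I/O.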

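An effective approach is to traverse the file backwards, starting from its end, and count newline characters until enough have been found to delimit the last ten lines. From that point, read forward to capture those lines, taking the file's encoding into account (for example, a newline occupies one byte in UTF-8 but two bytes in UTF-16).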
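The backward scan does not have to move one character at a time. As a hedged sketch in C++ (the language this page is filed under), the hypothetical `tail_lines` function below reads fixed-size blocks from the end of the file and counts `\n` bytes within each block. It assumes `\n` line endings and ASCII or UTF-8 text, where the byte 0x0A never occurs inside a multi-byte character; the `block_size` default of 4096 bytes is an arbitrary choice, not a tuned value.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the last `n` lines of the file at `path`, including
// their '\n' terminators, or the whole file if it has fewer than `n` lines.
// Assumes '\n' line endings and ASCII/UTF-8 text, where byte 0x0A only ever means '\n'.
std::string tail_lines(const std::string& path, std::size_t n, std::size_t block_size = 4096) {
    std::ifstream in(path, std::ios::binary);
    if (!in || n == 0 || block_size == 0) return {};

    in.seekg(0, std::ios::end);
    const std::streamoff file_size = in.tellg();

    // A '\n' as the final byte terminates the last line, so leave it out of the scan.
    std::streamoff scan_end = file_size;
    if (file_size > 0) {
        char last = 0;
        in.seekg(file_size - 1);
        in.read(&last, 1);
        if (last == '\n') scan_end = file_size - 1;
    }

    // Walk backwards block by block, counting newlines until n have been seen.
    std::streamoff start = 0;  // offset of the first returned byte; 0 means "whole file"
    std::size_t newlines = 0;
    bool found = false;
    std::vector<char> block(block_size);
    for (std::streamoff end = scan_end; end > 0 && !found;) {
        const std::streamoff begin =
            std::max<std::streamoff>(0, end - static_cast<std::streamoff>(block_size));
        const std::size_t len = static_cast<std::size_t>(end - begin);
        in.seekg(begin);
        in.read(block.data(), static_cast<std::streamsize>(len));
        for (std::size_t i = len; i-- > 0;) {  // last byte of the block first
            if (block[i] == '\n' && ++newlines == n) {
                start = begin + static_cast<std::streamoff>(i) + 1;  // line after this '\n'
                found = true;
                break;
            }
        }
        end = begin;
    }

    // Read forward from `start` to the end of the file.
    std::string result(static_cast<std::size_t>(file_size - start), '\0');
    in.clear();
    in.seekg(start);
    in.read(&result[0], static_cast<std::streamsize>(result.size()));
    return result;
}
```

Because a trailing `\n` is treated as the terminator of the last line, `tail_lines(path, 10)` returns ten lines whether or not the file ends with a newline, and a file with fewer lines is returned whole. Reading in blocks keeps the number of seeks proportional to the size of the tail rather than to its character count.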

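For example, a complete C# implementation should also handle files that contain fewer than ten lines. The following snippet illustrates the approach: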

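using System;
using System.IO;
using System.Text;

public static class TailReader {
    // Returns the last numberOfLines lines of the file, or the whole file if it has fewer lines.
    public static string ReadEndLines(string path, long numberOfLines, Encoding encoding, string lineSeparator) {
        // Step width when walking backwards: bytes per code unit (1 for ASCII/UTF-8, 2 for UTF-16).
        int sizeOfChar = encoding.GetByteCount("\n");
        byte[] separator = encoding.GetBytes(lineSeparator);
        // Separate read buffer, so stale separator bytes can never produce a false match.
        byte[] buffer = new byte[separator.Length];

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite)) {
            long lineCount = 0;
            // Distance (in bytes) from the end of the file to the start of the window being compared.
            long position = separator.Length;

            // A separator at the very end terminates the last line; skip it so that
            // numberOfLines separators really delimit numberOfLines lines.
            if (fs.Length >= separator.Length) {
                fs.Seek(-separator.Length, SeekOrigin.End);
                ReadFully(fs, buffer);
                if (encoding.GetString(buffer) == lineSeparator) position += sizeOfChar;
            }

            // Compare byte positions with fs.Length (not a character count) so the whole file is scanned.
            for (; position <= fs.Length; position += sizeOfChar) {
                fs.Seek(-position, SeekOrigin.End);
                ReadFully(fs, buffer);

                if (encoding.GetString(buffer) == lineSeparator) {
                    lineCount++;
                    if (lineCount == numberOfLines) {
                        // fs.Position is now just past the separator: read everything after it.
                        byte[] returnBuffer = new byte[fs.Length - fs.Position];
                        ReadFully(fs, returnBuffer);
                        return encoding.GetString(returnBuffer);
                    }
                }
            }

            // Handle the case where the file has fewer than numberOfLines lines: return all of it.
            fs.Seek(0, SeekOrigin.Begin);
            byte[] all = new byte[fs.Length];
            ReadFully(fs, all);
            return encoding.GetString(all);
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until the array is full.
    private static void ReadFully(Stream stream, byte[] data) {
        int offset = 0;
        while (offset < data.Length) {
            int read = stream.Read(data, offset, data.Length - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
    }
}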
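The `sizeOfChar` step in the code above is what keeps the backward scan aligned in wider encodings. In UTF-16LE, a newline is the byte pair `0A 00`, but the same pair can also straddle two unrelated characters: ਅ (U+0A05) followed by Ā (U+0100) encodes as `05 0A 00 01`. The minimal C++ sketch below contrasts an aligned backward scan in 2-byte steps with a naive byte-by-byte search. The function names are hypothetical, and the sketch assumes an in-memory byte string of even length with no BOM handling.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts U+000A code units in a UTF-16LE byte string by stepping
// backwards two bytes at a time from the end, so every comparison covers one whole code unit.
// Assumes the buffer holds complete code units (even length).
std::size_t count_newlines_utf16le(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t pos = bytes.size(); pos >= 2; pos -= 2) {
        // Code unit occupying [pos - 2, pos): low byte first in little-endian order.
        if (bytes[pos - 2] == '\x0A' && bytes[pos - 1] == '\x00') ++count;
    }
    return count;
}

// Hypothetical contrast: searches for the pair 0A 00 at every byte offset, ignoring alignment.
std::size_t count_newlines_unaligned(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < bytes.size(); ++i) {
        if (bytes[i] == '\x0A' && bytes[i + 1] == '\x00') ++count;
    }
    return count;
}
```

For the four bytes `05 0A 00 01`, the aligned scan finds no newline, while the unaligned search reports a false match at offset 1. This is why the step width has to follow the encoding's code-unit size rather than always being one byte.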

