How Can You Efficiently Retrieve the Last 10 Lines of a Massive Text File?

Mary-Kate Olsen

Published: 2025-01-05 10:23:41

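In large-scale text processing, retrieving the last few lines of a very large file poses its own challenge. When a file exceeds 10 GB, reading it from the beginning just to reach the final lines is a significant obstacle to getting at this data efficiently.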

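An effective way to solve this is to traverse the file backward from the end. The goal is to locate the tenth line separator counting from the end, which marks where the desired lines begin. From that position we then read forward to capture the lines, decoding them with the file's encoding.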

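In C#, for example, a complete implementation also handles files that contain fewer than ten lines. The following snippet illustrates the approach: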

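using System;
using System.IO;
using System.Text;

// Wrapper class added so the method compiles on its own; the name is arbitrary.
public static class TailReader {

    public static string ReadEndLines(string path, Int64 numberOfLines, Encoding encoding, string lineSeparator) {

        // Bytes per character; the backward scan assumes UTF-8 or a fixed-width encoding.
        int sizeOfChar = encoding.GetByteCount("\n");
        byte[] buffer = encoding.GetBytes(lineSeparator);

        // Open read-only so that write-protected files can be read as well.
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read)) {
            Int64 lineCount = 0;

            // Step backward one character at a time, starting one separator width from the end.
            for (Int64 position = buffer.Length; position <= fs.Length; position += sizeOfChar) {
                fs.Seek(-position, SeekOrigin.End);
                ReadFully(fs, buffer);

                // A separator that terminates the final line does not start a new line, so skip it.
                if (encoding.GetString(buffer) == lineSeparator && position > buffer.Length) {
                    lineCount++;
                    if (lineCount == numberOfLines) {
                        // The stream now sits just past the separator: the rest of the file is the tail.
                        byte[] returnBuffer = new byte[fs.Length - fs.Position];
                        ReadFully(fs, returnBuffer);
                        return encoding.GetString(returnBuffer);
                    }
                }
            }

            // handle case where number of lines in file is less than numberOfLines
            fs.Seek(0, SeekOrigin.Begin);
            buffer = new byte[fs.Length];
            ReadFully(fs, buffer);
            return encoding.GetString(buffer);
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until the buffer is full.
    private static void ReadFully(Stream stream, byte[] target) {
        int offset = 0;
        while (offset < target.Length) {
            int read = stream.Read(target, offset, target.Length - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
    }
}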
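
The method above issues one seek and one read per character while it walks backward. A possible variation reads the file backward in fixed-size blocks and scans each block in memory, which cuts the number of I/O calls considerably. The following is only a minimal sketch under stated assumptions: the file is UTF-8 (or another ASCII-compatible encoding) with "\n" line endings, and the names TailSketch, TailLines and blockSize are illustrative, not part of the original code.

```csharp
using System;
using System.IO;
using System.Text;

public static class TailSketch {

    // Returns the last `lineCount` lines of a UTF-8 file that uses "\n" line endings.
    // Assumption: the byte 0x0A never occurs inside a multi-byte UTF-8 sequence,
    // so the raw bytes can be scanned without decoding them first.
    public static string TailLines(string path, int lineCount, int blockSize = 64 * 1024) {
        if (lineCount <= 0) return string.Empty;

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read)) {
            long fileLength = fs.Length;
            long blockEnd = fileLength;   // exclusive end of the next block to read
            long tailStart = 0;           // stays 0 when the file has fewer lines than requested
            int newlinesSeen = 0;
            bool found = false;
            byte[] block = new byte[blockSize];

            while (blockEnd > 0 && !found) {
                int toRead = (int)Math.Min(blockSize, blockEnd);
                long blockStart = blockEnd - toRead;
                fs.Seek(blockStart, SeekOrigin.Begin);

                // Stream.Read may return fewer bytes than requested, so fill the block in a loop.
                int filled = 0;
                while (filled < toRead) {
                    int n = fs.Read(block, filled, toRead - filled);
                    if (n == 0) throw new EndOfStreamException();
                    filled += n;
                }

                // Scan the block from its last byte to its first.
                for (int i = toRead - 1; i >= 0; i--) {
                    if (block[i] != (byte)'\n') continue;
                    long offset = blockStart + i;
                    if (offset == fileLength - 1) continue;   // newline that ends the final line
                    if (++newlinesSeen == lineCount) {
                        tailStart = offset + 1;
                        found = true;
                        break;
                    }
                }
                blockEnd = blockStart;
            }

            // Read everything from tailStart to the end of the file and decode it once.
            fs.Seek(tailStart, SeekOrigin.Begin);
            byte[] tail = new byte[fileLength - tailStart];
            int read = 0;
            while (read < tail.Length) {
                int n = fs.Read(tail, read, tail.Length - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
            return Encoding.UTF8.GetString(tail);
        }
    }
}
```

A call such as TailSketch.TailLines("server.log", 10) would return the last ten lines, where "server.log" is a placeholder path. The 64 KB default block size is an arbitrary starting point; whether it beats the per-character scan on a given system is something to measure rather than assume.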
