How Can I Efficiently Extract the Last 10 Lines from a 10GB Text File in C#?

Susan Sarandon
Release: 2024-12-30 06:28:11

Getting the Last 10 Lines of a Massive Text File (Over 10GB): An Efficient C# Approach

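Reading a text file larger than 10GB from the beginning just to get its last few lines is slow, because every byte has to pass through the reader first. A faster approach in C# is to seek to the end of the file and scan backwards until enough line separators have been found.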

Code Implementation:

This generalized method returns the last numberOfTokens tokens of the file at path, decoded with encoding, where tokens are delimited by tokenSeparator. To get the last 10 lines, pass 10 as numberOfTokens and "\n" as tokenSeparator (or "\r\n" for files with Windows line endings):

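using System;
using System.IO;
using System.Text;

// Place this method inside a class of your choice.
public static string ReadEndTokens(string path, Int64 numberOfTokens, Encoding encoding, string tokenSeparator) {

    // Step size of the backward scan: the byte width of one newline character.
    int sizeOfChar = encoding.GetByteCount("\n");
    byte[] separator = encoding.GetBytes(tokenSeparator);
    byte[] buffer = new byte[separator.Length];

    // FileAccess.Read lets the method open read-only files as well.
    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read)) {
        Int64 tokenCount = 0;

        // Start one separator width before the end and walk back to the start of the file.
        for (Int64 position = separator.Length; position <= fs.Length; position += sizeOfChar) {
            fs.Seek(-position, SeekOrigin.End);
            fs.Read(buffer, 0, buffer.Length);

            if (encoding.GetString(buffer) == tokenSeparator) {
                // A separator at the very end of the file closes the last line; it is not counted.
                if (position == separator.Length) continue;

                tokenCount++;
                if (tokenCount == numberOfTokens) {
                    // The stream now sits just after the matched separator.
                    byte[] returnBuffer = new byte[fs.Length - fs.Position];
                    fs.Read(returnBuffer, 0, returnBuffer.Length);
                    return encoding.GetString(returnBuffer);
                }
            }
        }

        // Fewer tokens than requested: return the whole file.
        // Note: this allocates the full file size, so it only suits files that fit in one array.
        fs.Seek(0, SeekOrigin.Begin);
        byte[] all = new byte[fs.Length];
        fs.Read(all, 0, all.Length);
        return encoding.GetString(all);
    }
}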
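
To make the role of sizeOfChar more concrete, the following small sketch prints how many bytes a newline occupies in a few common encodings. The class name NewlineWidthDemo and the helper NewlineWidth are made up for illustration; Encoding.GetByteCount is the same call the method above uses.

```csharp
using System;
using System.Text;

public static class NewlineWidthDemo {
    // Hypothetical helper: number of bytes "\n" occupies in the given encoding.
    public static int NewlineWidth(Encoding encoding) {
        return encoding.GetByteCount("\n");
    }

    public static void Main() {
        Console.WriteLine(NewlineWidth(Encoding.ASCII));   // 1
        Console.WriteLine(NewlineWidth(Encoding.UTF8));    // 1
        Console.WriteLine(NewlineWidth(Encoding.Unicode)); // 2 (UTF-16)
        Console.WriteLine(NewlineWidth(Encoding.UTF32));   // 4
    }
}
```

With UTF-16 the scan therefore moves back two bytes per step, and with UTF-32 four bytes per step.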

How It Works:

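  1. Determine how many bytes a newline character occupies in the specified encoding; this is the step size for the backward scan.
  2. Starting at the end of the file, seek backwards one step at a time and read as many bytes as the encoded token separator occupies.
  3. Each time those bytes decode to the token separator, increment the token count, and stop when it reaches numberOfTokens.
  4. At that point the stream is positioned just after the matched separator, so read from there to the end of the file and decode the result.
  5. If the start of the file is reached first, the file holds fewer tokens than requested, so return its entire content.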
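
The steps above issue one seek and one read per character. As a variation on the same idea, the scan can read the file backwards in larger blocks and search each block in memory. The following is only a sketch under stated assumptions, not the article's method: it assumes an ASCII-compatible encoding such as UTF-8 (so a line feed is the single byte 0x0A), and the names TailSketch, ReadLastLines, ReadFully, and the 64 KB default block size are chosen here for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class TailSketch {
    // Returns the text after the lineCount-th line feed counted from the end of the file.
    // Assumption: '\n' is encoded as the single byte 0x0A (ASCII, UTF-8, and similar).
    public static string ReadLastLines(string path, int lineCount, Encoding encoding, int blockSize = 64 * 1024) {
        if (lineCount <= 0) return string.Empty;

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read)) {
            byte[] block = new byte[blockSize];
            long blockEnd = fs.Length; // exclusive end of the block being scanned
            long start = 0;            // offset where the tail begins; 0 means the whole file
            int found = 0;
            bool done = false;

            while (blockEnd > 0 && !done) {
                int size = (int)Math.Min(blockSize, blockEnd);
                long blockStart = blockEnd - size;
                fs.Seek(blockStart, SeekOrigin.Begin);
                ReadFully(fs, block, size);

                // Search the block from its last byte to its first.
                for (int i = size - 1; i >= 0; i--) {
                    if (block[i] != (byte)'\n') continue;
                    long offset = blockStart + i;
                    if (offset == fs.Length - 1) continue; // line feed that ends the final line
                    if (++found == lineCount) {
                        start = offset + 1;
                        done = true;
                        break;
                    }
                }
                blockEnd = blockStart;
            }

            // If fewer lines exist, start is still 0 and the whole file is returned,
            // which only works for files small enough to fit in one array.
            fs.Seek(start, SeekOrigin.Begin);
            byte[] tail = new byte[fs.Length - start];
            ReadFully(fs, tail, tail.Length);
            return encoding.GetString(tail);
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until count bytes are read.
    private static void ReadFully(Stream stream, byte[] buffer, int count) {
        int total = 0;
        while (total < count) {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0) throw new EndOfStreamException();
            total += read;
        }
    }

    public static void Main() {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "a\nb\nc\nd\n", new UTF8Encoding(false));
        Console.Write(ReadLastLines(path, 2, Encoding.UTF8)); // writes the lines "c" and "d"
        File.Delete(path);
    }
}
```

Reading in blocks trades a small fixed buffer for far fewer seek and read calls; the block size here is a guess, not a tuned value.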

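Because the scan only touches the bytes that follow the requested separator near the end of the file, its cost depends on the length of those last lines rather than on the 10GB file size. The fixed step size assumes a fixed-width encoding, or UTF-8 with a separator made of ASCII characters such as "\n" or "\r\n".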
