Retrieving the Last 10 Lines of Massive Text Files (Over 10GB)
Extracting the last few lines of a very large text file is a common text-processing task. Once a file exceeds 10GB, reading it from the beginning or loading it into memory just to reach the end is impractical. This article presents a C# solution and a code snippet that demonstrates it.
To retrieve the last 10 lines efficiently, the strategy is to traverse the file backwards from the end. Because line lengths vary, the byte offset of the tenth-from-last line cannot be computed in advance, so we seek backwards step by step until 10 line breaks have been encountered. From that point, we read forward to the end of the file to capture the last 10 lines.
Consider the following implementation:
    using System;
    using System.IO;
    using System.Text;

    public static string ReadEndTokens(string path, Int64 numberOfTokens, Encoding encoding, string tokenSeparator)
    {
        int sizeOfChar = encoding.GetByteCount("\n");
        byte[] buffer = encoding.GetBytes(tokenSeparator);

        // Open read-only so that files we are not allowed to write to can still be read.
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            Int64 tokenCount = 0;

            // Step backwards from the end of the file, one character width at a time.
            // Starting at buffer.Length guarantees every read fills the whole buffer.
            for (Int64 position = buffer.Length; position <= fs.Length; position += sizeOfChar)
            {
                fs.Seek(-position, SeekOrigin.End);
                fs.Read(buffer, 0, buffer.Length);

                if (encoding.GetString(buffer) == tokenSeparator)
                {
                    tokenCount++;
                    if (tokenCount == numberOfTokens)
                    {
                        // The stream is now positioned just after the separator:
                        // return everything from here to the end of the file.
                        byte[] returnBuffer = new byte[fs.Length - fs.Position];
                        fs.Read(returnBuffer, 0, returnBuffer.Length);
                        return encoding.GetString(returnBuffer);
                    }
                }
            }

            // Handle the case where the file contains fewer than numberOfTokens separators.
            fs.Seek(0, SeekOrigin.Begin);
            buffer = new byte[fs.Length];
            fs.Read(buffer, 0, buffer.Length);
            return encoding.GetString(buffer);
        }
    }
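When the file contains fewer separators than requested, the loop reaches the start of the file without a match and the method falls back to returning the entire content. The encoding parameter must match the file's encoding, and tokenSeparator lets the same method return the last N tokens delimited by any other separator, not just line breaks. Two caveats apply. A separator at the very end of the file is counted, so for a file that ends with a newline, pass 11 to get 10 complete lines. The fallback also allocates a buffer the size of the whole file, which is not feasible for a multi-gigabyte file that contains fewer separators than requested.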
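The backward scan can also be done in blocks rather than one character per seek. The following is a minimal sketch of that variant, not the article's method: TailReader, TailLines, ReadFully, and the 64 KB default block size are hypothetical names and values chosen for illustration. It assumes lines end in "\n" (which also covers "\r\n") and an encoding in which the byte 0x0A only ever means a line feed, such as ASCII or UTF-8. Unlike the method above, it does not count a newline that is the last byte of the file.

```csharp
using System;
using System.IO;
using System.Text;

public static class TailReader
{
    // Hypothetical helper: returns the last lineCount lines of the file.
    // Assumes '\n' line endings and an encoding where byte 0x0A is always a line feed.
    public static string TailLines(string path, int lineCount, Encoding encoding, int blockSize = 64 * 1024)
    {
        if (lineCount <= 0) return string.Empty;

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            long start = 0;            // byte offset where the tail begins (0 = whole file)
            long position = fs.Length; // everything at or after this offset has been scanned
            int found = 0;
            bool done = false;
            var block = new byte[blockSize];

            while (position > 0 && !done)
            {
                // Load the block that ends where the previous one began.
                int toRead = (int)Math.Min(blockSize, position);
                position -= toRead;
                fs.Seek(position, SeekOrigin.Begin);
                ReadFully(fs, block, toRead);

                // Scan the block from its last byte to its first.
                for (int i = toRead - 1; i >= 0; i--)
                {
                    if (block[i] != (byte)'\n') continue;
                    // A newline that is the final byte only terminates the last line; skip it.
                    if (position + i == fs.Length - 1) continue;
                    if (++found == lineCount)
                    {
                        start = position + i + 1;
                        done = true;
                        break;
                    }
                }
            }

            // Read forward from the start of the tail to the end of the file.
            // If the file has fewer lines than requested, this is the whole file.
            fs.Seek(start, SeekOrigin.Begin);
            var tail = new byte[fs.Length - start];
            ReadFully(fs, tail, tail.Length);
            return encoding.GetString(tail);
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until count bytes arrive.
    private static void ReadFully(Stream s, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int n = s.Read(buffer, offset, count - offset);
            if (n == 0) throw new EndOfStreamException();
            offset += n;
        }
    }
}
```

A call such as TailReader.TailLines(path, 10, Encoding.UTF8) would then return the last 10 lines of the file at path. Reading in blocks trades a little memory for far fewer seek and read calls when the last lines are long.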
With this approach, only the tail of the file is ever read, so the cost of retrieving the last 10 lines depends on how long those lines are rather than on the total size of the file.