Efficient Retrieval of Last 10 Lines from Massive Text Files
Extracting the last 10 lines from a very large text file (over 10 GB) calls for a strategy that avoids reading the whole file.
Utilizing File Positioning and Reverse Seek
The recommended approach is to jump to the end of the file with the Seek() method and step backward until 10 newlines have been found. Keeping a running count of newlines identifies the exact offset from which to read forward to the end of the file. The same approach handles files with fewer than 10 lines: if the start of the file is reached first, the whole file is returned.
Example Implementation in C#
The following C# code implements this approach in generalized form: it returns the last numberOfTokens tokens of a file encoded with encoding, where tokens are separated by tokenSeparator:
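    using System;
    using System.IO;
    using System.Text;

    public static string ReadEndTokens(string path, Int64 numberOfTokens, Encoding encoding, string tokenSeparator)
    {
        // Assumes a fixed-width encoding such as ASCII or UTF-16, so that stepping
        // back sizeOfChar bytes at a time stays aligned to character boundaries.
        int sizeOfChar = encoding.GetByteCount("\n");
        byte[] buffer = encoding.GetBytes(tokenSeparator);

        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            Int64 tokenCount = 0;

            // Walk backward from the end of the file, one character at a time.
            // position is a byte offset, so the bound is the file length in bytes.
            for (Int64 position = sizeOfChar; position <= fs.Length; position += sizeOfChar)
            {
                fs.Seek(-position, SeekOrigin.End);
                int read = fs.Read(buffer, 0, buffer.Length);

                // Compare only when the buffer was completely refilled.
                if (read == buffer.Length && encoding.GetString(buffer) == tokenSeparator)
                {
                    // Note: a separator at the very end of the file is counted too, so for
                    // a file that ends with a newline, pass 11 to get the last 10 full lines.
                    tokenCount++;
                    if (tokenCount == numberOfTokens)
                    {
                        // The stream is now just past the separator: read to the end.
                        byte[] returnBuffer = new byte[fs.Length - fs.Position];
                        fs.Read(returnBuffer, 0, returnBuffer.Length);
                        return encoding.GetString(returnBuffer);
                    }
                }
            }

            // Handle the case where the file contains fewer than numberOfTokens tokens.
            fs.Seek(0, SeekOrigin.Begin);
            buffer = new byte[fs.Length];
            fs.Read(buffer, 0, buffer.Length);
            return encoding.GetString(buffer);
        }
    }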
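The method above issues one seek and one read for every character it steps over. A possible variant, sketched below, reads the file backward in larger blocks and scans each block in memory. It is a minimal sketch and not part of the original code: the class name TailSketch, the method name ReadLastLines, and the 64 KB default block size are illustrative choices. It assumes the line break is the single byte '\n', which holds for ASCII and UTF-8 but not for UTF-16, and it does not count a newline that is the very last byte of the file.

```csharp
using System;
using System.IO;
using System.Text;

public static class TailSketch
{
    // Hypothetical helper: returns the last lineCount lines of a file whose
    // line breaks are the single byte '\n' (ASCII, UTF-8 and similar encodings).
    public static string ReadLastLines(string path, int lineCount, Encoding encoding, int blockSize = 64 * 1024)
    {
        if (lineCount <= 0) return string.Empty;

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            long start = 0;             // byte offset where the wanted lines begin
            long position = fs.Length;  // bytes at or after this offset are already scanned
            int newlines = 0;
            bool found = false;
            var block = new byte[blockSize];

            while (position > 0 && !found)
            {
                int toRead = (int)Math.Min(blockSize, position);
                position -= toRead;
                fs.Seek(position, SeekOrigin.Begin);

                // Read may return fewer bytes than requested, so loop until the block is full.
                int filled = 0;
                while (filled < toRead)
                {
                    int n = fs.Read(block, filled, toRead - filled);
                    if (n == 0) throw new EndOfStreamException();
                    filled += n;
                }

                // Scan the block from its last byte to its first.
                for (int i = toRead - 1; i >= 0; i--)
                {
                    if (block[i] != (byte)'\n') continue;

                    // A newline that is the very last byte of the file ends the final
                    // line instead of separating two lines, so it is not counted.
                    if (position + i == fs.Length - 1) continue;

                    if (++newlines == lineCount)
                    {
                        start = position + i + 1;
                        found = true;
                        break;
                    }
                }
            }

            // If the file has fewer than lineCount lines, start is still 0
            // and the whole file is returned.
            fs.Seek(start, SeekOrigin.Begin);
            using (var reader = new StreamReader(fs, encoding))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

A call such as TailSketch.ReadLastLines(path, 10, Encoding.UTF8) would then return the last 10 lines as a single string, where path stands for the location of the large file. Larger blocks mean fewer seeks at the cost of a bigger buffer, so the block size is a tuning choice rather than a fixed rule.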
Because this technique reads only the tail of the file, retrieving the last 10 lines of a large text file takes memory and I/O proportional to the size of those lines rather than to the size of the whole file.