How to Efficiently Retrieve the Last 10 Lines from Massive Text Files (> 10 GB)
Displaying the final 10 lines of a very large text file is slow if the file is read from the beginning, especially when it exceeds 10 gigabytes, because almost everything that gets read is discarded. For an efficient solution, consider the following approach:
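First, seek to the end of the file. Next, step backward, counting newline characters until 10 have been found (they do not need to be consecutive). From that position, read forward to the end of the file, taking the character encoding into account, since a newline can occupy more than one byte in encodings such as UTF-16.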
If the file has fewer than 10 lines, the backward scan reaches the start of the file before 10 newlines are found; in that case, return the whole file. Below is an implementation in C#:
// Requires: using System.IO; using System.Text;
public static string ReadEndTokens(string path, long numberOfTokens, Encoding encoding, string tokenSeparator)
{
    int sizeOfChar = encoding.GetByteCount("\n");
    byte[] buffer = encoding.GetBytes(tokenSeparator);

    using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
        long tokenCount = 0;

        // Step backward one character at a time; position is a byte offset from the end.
        // Starting at buffer.Length guarantees a full separator can always be read.
        for (long position = buffer.Length; position <= fs.Length; position += sizeOfChar)
        {
            fs.Seek(-position, SeekOrigin.End);
            fs.Read(buffer, 0, buffer.Length);

            if (encoding.GetString(buffer) == tokenSeparator)
            {
                tokenCount++;
                if (tokenCount == numberOfTokens)
                {
                    // The stream now sits just past the separator; return the rest of the file.
                    byte[] returnBuffer = new byte[fs.Length - fs.Position];
                    fs.Read(returnBuffer, 0, returnBuffer.Length);
                    return encoding.GetString(returnBuffer);
                }
            }
        }

        // Fewer than numberOfTokens separators were found: return the whole file.
        fs.Seek(0, SeekOrigin.Begin);
        buffer = new byte[fs.Length];
        fs.Read(buffer, 0, buffer.Length);
        return encoding.GetString(buffer);
    }
}
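If the file contains fewer than numberOfTokens separators, the method returns the entire file, so it works for files with any line count. That fallback loads the whole file into memory, which is only practical when such a file is small. Note also that a separator at the very end of the file is counted, so for a file that ends with a newline, pass 11 to get the last 10 lines.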
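The method above performs one seek and one read per character, which can be slow when the last lines are long. As a rough sketch of an alternative (not the original author's code), the variant below reads fixed-size blocks backward and counts newline bytes in memory. `TailReader`, `ReadLastLines`, `ReadFully`, and the 64 KB `blockSize` default are hypothetical names and values chosen for illustration, and the sketch assumes an encoding in which a newline is the single byte 0x0A (for example ASCII or UTF-8).

```csharp
using System;
using System.IO;
using System.Text;

public static class TailReader
{
    // Hypothetical helper: returns the last lineCount lines of the file.
    // Assumes '\n' is encoded as the single byte 0x0A (ASCII, UTF-8).
    public static string ReadLastLines(string path, int lineCount, Encoding encoding, int blockSize = 64 * 1024)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            long length = fs.Length;
            if (length == 0 || lineCount <= 0) return string.Empty;

            var block = new byte[blockSize];
            long start = 0;      // byte offset where the result begins
            long pos = length;   // every byte at or after pos has been scanned
            int newlines = 0;
            bool found = false;

            while (pos > 0 && !found)
            {
                int toRead = (int)Math.Min(blockSize, pos);
                pos -= toRead;
                fs.Seek(pos, SeekOrigin.Begin);
                ReadFully(fs, block, toRead);

                // Scan the block from its last byte to its first.
                for (int i = toRead - 1; i >= 0; i--)
                {
                    if (block[i] != (byte)'\n') continue;
                    // A newline that is the final byte only terminates the last line.
                    if (pos + i == length - 1) continue;
                    if (++newlines == lineCount)
                    {
                        start = pos + i + 1;
                        found = true;
                        break;
                    }
                }
            }

            // If fewer than lineCount newlines were found, start is still 0 (whole file).
            var tail = new byte[length - start];
            fs.Seek(start, SeekOrigin.Begin);
            ReadFully(fs, tail, tail.Length);
            return encoding.GetString(tail);
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until count bytes arrive.
    private static void ReadFully(Stream s, byte[] buffer, int count)
    {
        int offset = 0;
        while (offset < count)
        {
            int n = s.Read(buffer, offset, count - offset);
            if (n == 0) throw new EndOfStreamException();
            offset += n;
        }
    }
}
```

A call such as `TailReader.ReadLastLines(path, 10, Encoding.UTF8)` would then return the last 10 lines whether or not the file ends with a newline. For UTF-16 or other encodings where a newline spans several bytes, the byte comparison in the inner loop would need to be adapted.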