Reading Very Large CSV Files in Python
In Python 2.7, reading a CSV file with millions of rows and hundreds of columns can easily exhaust available memory. This article explains why that happens and shows how to process such files efficiently.
Original Code and Issues
The provided code reads the rows of a CSV file that match a given criterion, but it first appends every row to a list before processing it. For files with more than roughly 300,000 rows, building that list exhausts memory.
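The original code is not reproduced in full here. A minimal sketch of the problematic pattern, assuming a function named getstuff that filters on the fourth column as in the solutions below (the rest is a hypothetical reconstruction):

import csv

def getstuff(filename, criterion):
    # Hypothetical reconstruction of the memory-hungry version: every matching
    # row is appended to a list, so the whole result must fit in memory at once.
    data = []
    with open(filename, "rb") as csvfile:  # "rb" is the csv module's file mode on Python 2.7
        datareader = csv.reader(csvfile)
        for row in datareader:
            if row[3] == criterion:
                data.append(row)
    return data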
Solution 1: Process Rows Incrementally
The key to eliminating the memory issue is to process rows as they are read instead of collecting them in a list. A generator function achieves this:
import csv

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:  # "rb" is the correct mode for the csv module on Python 2.7
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        for row in datareader:
            if row[3] == criterion:
                yield row
This function yields the header row and then each subsequent row that matches the criterion. Because it is a generator, rows are only read as the caller consumes them, so no more than one row is held in memory at a time.
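A minimal usage sketch (the file name and criterion value below are placeholders):

rows = getstuff("huge.csv", "some-value")  # hypothetical file name and criterion
header = next(rows)                        # the first yielded item is the header row
matches = sum(1 for _ in rows)             # replace with real per-row processing
print("matching rows: %d" % matches)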
Solution 2: Optimized Filtering
Alternatively, a more concise filtering method can be employed:
import csv
from itertools import dropwhile, takewhile

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        # skip rows until the first match, then yield the consecutive block of matches
        yield from takewhile(
            lambda r: r[3] == criterion,
            dropwhile(lambda r: r[3] != criterion, datareader))
This version uses dropwhile and takewhile from the itertools module: dropwhile skips rows until the first match, and takewhile then yields rows until the first non-match, so it assumes the matching rows form one consecutive block (for example, a file sorted on that column). Note that the yield from syntax requires Python 3.3 or later (where the file should also be opened in text mode rather than "rb"); a Python 2.7-compatible variant is sketched below.
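Because yield from is unavailable in Python 2.7, the same idea can be written with an explicit loop. A sketch under that assumption:

import csv
from itertools import dropwhile, takewhile

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        # dropwhile skips rows until the first match; takewhile then yields the
        # consecutive block of matching rows and stops at the first non-match.
        matching = takewhile(lambda r: r[3] == criterion,
                             dropwhile(lambda r: r[3] != criterion, datareader))
        for row in matching:
            yield row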
Updated Code
In the getdata function, the list-building loop is replaced with a generator function that yields rows one at a time:
def getdata(filename, criteria):
    for criterion in criteria:
        for row in getstuff(filename, criterion):
            yield row
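A hypothetical driver loop (the file name and criteria values are placeholders); each row is handled as soon as it is read, so nothing accumulates in memory:

total = 0
for row in getdata("huge.csv", ["value-a", "value-b"]):
    total += 1  # replace with real per-row processing
print("rows processed: %d" % total)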
Conclusion
By using generator functions and lazy filtering, large CSV files can be processed without memory errors, and the first matching rows become available immediately instead of only after the entire file has been read.