Problem: Reading massive .csv files (up to 1 million rows and 200 columns) with Python 2.7 runs into memory errors.
The initial approach iterates through the entire file and accumulates every row in in-memory lists. This becomes impractical for large files because the lists grow to hold the whole dataset at once.
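For context, a minimal sketch of that list-accumulating pattern; the function name and the column index used for filtering are illustrative assumptions, not code from the original question:

import csv

def load_matching_rows(filename, criterion):
    # Memory-hungry pattern: every matching row is appended to a list,
    # so a million-row file can exhaust available memory.
    data = []
    with open(filename, "rb") as csvfile:   # "rb" mode for the Python 2.7 csv module
        datareader = csv.reader(csvfile)
        header = next(datareader)
        for row in datareader:
            if row[3] == criterion:         # column index 3 is an illustrative assumption
                data.append(row)
    return header, data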
Solution:
1. Process Rows as They Are Produced:
Avoid loading the entire file into memory. Instead, wrap the csv.reader in a generator function and process each row as it is read:
import csv

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        for row in datareader:
            if row[3] == criterion:
                yield row
2. Use Generator Functions for Filtering:
Filtering can also happen inside the generator. When the matching rows form a single consecutive block, itertools.dropwhile and itertools.takewhile skip ahead to that block, yield it, and then stop reading the rest of the file:
import csv
from itertools import dropwhile, takewhile

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        # Skip rows until the criterion matches, then yield the consecutive
        # matching rows; Python 2.7 has no "yield from", so loop explicitly.
        for row in takewhile(lambda r: r[3] == criterion,
                             dropwhile(lambda r: r[3] != criterion, datareader)):
            yield row
        return
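If the itertools pipeline feels opaque, the same behaviour (yield the consecutive block of matches, then stop reading) can be written with a plain loop and a flag. This is a sketch under the same assumptions as above, not code from the original article:

import csv

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        seen_match = False
        for row in datareader:
            if row[3] == criterion:
                seen_match = True
                yield row
            elif seen_match:
                # The consecutive block of matches has ended; stop reading.
                return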
3. Optimize Memory Consumption:
Refactor getdata() into a generator function as well, so that only one row is held in memory at any time:
def getdata(filename, criteria):
    for criterion in criteria:
        for row in getstuff(filename, criterion):
            yield row
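A minimal way to consume the generator; the file name and criteria values below are hypothetical placeholders:

# Only the row currently being processed is held in memory.
for row in getdata("large_file.csv", ["criterion1", "criterion2"]):
    print(row)  # replace with whatever per-row processing is needed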