How to Handle Memory Issues When Reading Large CSV Files in Python?

Mary-Kate Olsen

Published: 2024-11-09 05:07:02

Original

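Reading Very Large CSV Files in Python

In Python 2.7, users often run into memory problems when reading CSV files with millions of rows and hundreds of columns. This article addresses these problems and offers solutions for processing large CSV files efficiently.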

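Original Code and Problem

The original code is meant to read specific rows from a CSV file according to a given criterion. However, it loads all rows into a list before processing them, which causes memory errors on files with more than 300,000 rows.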
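
The original code is not reproduced in this article, so the following is only a hypothetical sketch of the pattern described, written for Python 3. The name getstuff_eager and the sample data are assumptions made for illustration. It shows why memory grows with file size: every matching row stays in the list until the function returns.

```python
import csv
import os
import tempfile


def getstuff_eager(filename, criterion):
    """Hypothetical eager version: collects every matching row in a list."""
    data = []
    with open(filename, newline="") as csvfile:
        datareader = csv.reader(csvfile)
        for row in datareader:
            if row[3] == criterion:
                data.append(row)  # the list grows with the number of matches
    return data


# Tiny made-up sample file, just to exercise the function.
with tempfile.NamedTemporaryFile("w", newline="", suffix=".csv", delete=False) as tmp:
    csv.writer(tmp).writerows([
        ["id", "name", "value", "group"],
        ["1", "x", "10", "a"],
        ["2", "y", "20", "b"],
        ["3", "z", "30", "a"],
    ])

eager_rows = getstuff_eager(tmp.name, "a")
os.remove(tmp.name)
print(eager_rows)
```

On a four-line file this is harmless, but with millions of rows the list can outgrow the available memory.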

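Solution 1: Process Rows Incrementally

To eliminate the memory problem, it is essential to process rows incrementally rather than storing them in a list. A generator function can do this: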

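import csv

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        count = 0
        for row in datareader:
            if row[3] == criterion:
                yield row
                count += 1
            elif count:
                return  # the run of consecutive matching rows has ended

This function yields the header row, then the rows that match the criterion, and stops reading once the run of consecutive matching rows ends. It therefore assumes that the matching rows sit next to each other in the file. The "rb" mode is what the Python 2.7 csv module expects; on Python 3, open the file in text mode with newline="" instead.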
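
A minimal Python 3 sketch of this generator approach follows, assuming the matching rows are consecutive in the file. The sample data and the found flag are assumptions made for illustration; only the overall shape comes from the article. It shows that nothing is read until the generator is consumed and that reading stops after the first run of matches.

```python
import csv
import os
import tempfile


def getstuff(filename, criterion):
    # Python 3: text mode with newline="" instead of Python 2.7's "rb".
    with open(filename, newline="") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # header row
        found = False
        for row in datareader:
            if row[3] == criterion:
                found = True
                yield row
            elif found:
                return  # the consecutive run of matches is over


# Made-up sample file; the last "b" row sits outside the first run.
with tempfile.NamedTemporaryFile("w", newline="", suffix=".csv", delete=False) as tmp:
    csv.writer(tmp).writerows([
        ["id", "name", "value", "group"],
        ["1", "x", "10", "a"],
        ["2", "y", "20", "b"],
        ["3", "z", "30", "b"],
        ["4", "w", "40", "c"],
        ["5", "v", "50", "b"],
    ])

rows = getstuff(tmp.name, "b")  # nothing is read yet
header = next(rows)             # opens the file and reads the header line
matches = list(rows)            # pulls the matching rows one by one
os.remove(tmp.name)
print(header, matches)
```

Because the generator returns at the first non-matching row after a match, the later row with id 5 is not returned. If matches can be scattered through the file, drop the early return and accept a full scan.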

Solution 2: Optimized Filtering

Alternatively, the filtering can be written more concisely:

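import csv
from itertools import dropwhile, takewhile

def getstuff(filename, criterion):
    with open(filename, "rb") as csvfile:
        datareader = csv.reader(csvfile)
        yield next(datareader)  # yield the header row
        # skip rows until the first match, then take rows while they still match
        matching = takewhile(
            lambda r: r[3] == criterion,
            dropwhile(lambda r: r[3] != criterion, datareader))
        for row in matching:  # Python 2.7 has no "yield from" (added in 3.3)
            yield row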

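This approach uses the takewhile and dropwhile functions from the itertools module to filter the rows: dropwhile skips rows until the first match, and takewhile stops at the first row that no longer matches.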
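
To make the behavior of the two itertools functions concrete, here is a small sketch on an in-memory list. The sample rows are made up for illustration. It shows that the combination returns only the first consecutive run of matching rows.

```python
from itertools import dropwhile, takewhile

# Made-up rows; column 3 is the field being compared.
sample = [
    ["1", "x", "10", "a"],
    ["2", "y", "20", "b"],
    ["3", "z", "30", "b"],
    ["4", "w", "40", "a"],
    ["5", "v", "50", "b"],
]

criterion = "b"
# Skip leading rows that do not match the criterion.
after_prefix = dropwhile(lambda r: r[3] != criterion, iter(sample))
# Keep rows only while they match; stop at the first one that does not.
first_run = list(takewhile(lambda r: r[3] == criterion, after_prefix))
print(first_run)
```

Rows 2 and 3 are returned, while row 5 is not, because takewhile stops for good at row 4. This is the same consecutive-rows assumption as in Solution 1.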

Updated Code

In the getdata function, the list comprehension is replaced with a generator that yields each row as it is read:

def getdata(filename, criteria):
    for criterion in criteria:
        for row in getstuff(filename, criterion):
            yield row
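
As a usage sketch, the rows from getdata can be streamed straight into a csv.writer without building a list. The version below is an assumption-laden stand-in: getstuff reads from an in-memory string (CSV_TEXT, a made-up sample) instead of a file so that the example is self-contained.

```python
import csv
import io

# Made-up sample data standing in for a large file on disk.
CSV_TEXT = "id,name,value,group\n1,x,10,a\n2,y,20,b\n3,z,30,b\n4,w,40,c\n"


def getstuff(source, criterion):
    # Stand-in for the file-based generator: reads from an in-memory string.
    datareader = csv.reader(io.StringIO(source))
    yield next(datareader)  # header row
    for row in datareader:
        if row[3] == criterion:
            yield row


def getdata(source, criteria):
    for criterion in criteria:
        for row in getstuff(source, criterion):
            yield row


out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in getdata(CSV_TEXT, ["a", "b"]):
    writer.writerow(row)  # only one row is held in memory at a time

result = out.getvalue()
print(result)
```

Note that the header row appears once per criterion, because getstuff yields it on every call. Callers who want a single header need to skip the repeats themselves.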

Conclusion

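By using generator functions and the filtering techniques above, large CSV files can be processed one row at a time, so memory use no longer grows with file size and the memory errors are avoided. Stopping once the matching rows have been read also avoids scanning the rest of the file.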
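
One way to check the memory claim is to compare peak allocations with the standard tracemalloc module. This is a rough sketch under stated assumptions: the data is generated in memory rather than read from disk, and the helper names (make_csv_text, peak_bytes, count_eager, count_lazy) are hypothetical. Exact numbers vary by Python version and platform.

```python
import csv
import io
import tracemalloc


def make_csv_text(n_rows):
    """Build made-up CSV text in memory (a stand-in for a large file)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "name", "value", "group"])
    for i in range(n_rows):
        writer.writerow([i, "item%d" % i, i * 2, "a" if i % 2 else "b"])
    return buf.getvalue()


def peak_bytes(consume, text):
    """Return the peak traced allocation while `consume` reads the CSV text."""
    tracemalloc.start()
    consume(csv.reader(io.StringIO(text)))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def count_eager(reader):
    rows = list(reader)  # every row is held at once
    return sum(1 for row in rows if row[3] == "a")


def count_lazy(reader):
    return sum(1 for row in reader if row[3] == "a")  # one row at a time


text = make_csv_text(20000)
eager_peak = peak_bytes(count_eager, text)
lazy_peak = peak_bytes(count_lazy, text)
print(eager_peak, lazy_peak)
```

The eager peak should come out clearly higher, and the gap widens as the row count increases.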
