The following tutorial column introduces how to use Pandas to process large files in chunks. I hope it will be helpful to friends in need!

Use Pandas to process large files in chunks

Problem: Today, while processing Kuaishou user data, I ran into a txt file of almost 600 MB. Sublime crashed when I tried to open it. With pandas, read_table() took nearly 2 minutes to load it, and the result turned out to be almost 30 million rows of data. If merely opening the file is this slow, actually processing it would be even harder.
Solution: read the file with an iterator. The principle is that the file data is not read into memory all at once, but in several passes.

1. Specify chunksize to read the file in chunks
Both read_csv and read_table accept a chunksize parameter that specifies the chunk size (how many rows to read at a time); instead of a DataFrame, they then return an iterable TextFileReader object.
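The original code listing was lost in extraction, so here is a minimal sketch of chunked reading. The file name kuaishou_users.txt, the tab separator, and the chunk size of 1,000,000 rows are assumptions for illustration; adjust them to your own data.

```python
import pandas as pd

# Assumed file name and separator; chunksize controls how many rows
# are read per chunk, so the whole file never sits in memory at once.
reader = pd.read_table('kuaishou_users.txt', sep='\t', chunksize=1_000_000)

chunks = []
for chunk in reader:
    # Process each chunk independently here (filter, aggregate, etc.),
    # then keep only what you need.
    chunks.append(chunk)

# Optionally combine the processed chunks into a single DataFrame.
df = pd.concat(chunks, ignore_index=True)
print(df.shape)
```

If you only need an aggregate (a count, a sum, a filtered subset), compute it inside the loop and skip the final concat, which would otherwise pull everything back into memory.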
2. Specify iterator=True
Passing iterator=True also returns a TextFileReader object; you then pull rows from it explicitly, a fixed number at a time.
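Again, the original code block was lost, so here is a minimal sketch assuming the same tab-separated file as above. With iterator=True you call get_chunk(n) on the TextFileReader to read the next n rows; it raises StopIteration once the file is exhausted.

```python
import pandas as pd

# Assumed file name and separator; iterator=True returns a TextFileReader.
reader = pd.read_table('kuaishou_users.txt', sep='\t', iterator=True)

loop = True
chunks = []
while loop:
    try:
        # Read the next 1,000,000 rows (chunk size is an assumption).
        chunk = reader.get_chunk(1_000_000)
        chunks.append(chunk)
    except StopIteration:
        loop = False  # no more data to read

df = pd.concat(chunks, ignore_index=True)
print(df.shape)
```

The difference from chunksize is that you control when and how many rows are read on each call, which is convenient when chunk sizes need to vary or when you want to stop early.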
The above is the detailed content of how to use Pandas to efficiently process large files in chunks. For more information, please follow other related articles on the PHP Chinese website!