The following tutorial column will introduce you to using Pandas to process large files in chunks. I hope it will be helpful to friends in need!

Use Pandas to process large files in chunks

Problem: While processing Kuaishou user data today, I ran into a txt file of almost 600 MB. Sublime crashed when I tried to open it, so I turned to pandas. Reading it with read_table() took nearly 2 minutes, and it turned out to contain almost 30 million rows of data. If just opening the file is this slow, actually processing it would be even harder.
The principle is that the file's data is not read into memory all at once, but in multiple passes.

1. Specify chunksize to read files in chunks
Both read_csv and read_table accept a chunksize parameter that specifies the chunk size (how many rows to read at a time); they then return an iterable TextFileReader object instead of a DataFrame.
table = pd.read_table(path + 'kuaishou.txt', sep='\t', chunksize=1000000)
for df in table:
    # process df here, e.g.
    # df.drop(columns=['page', 'video_id'], axis=1, inplace=True)
    # print(type(df), df.shape)  # print to inspect the chunk
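For a fuller picture, here is a minimal sketch (assuming the same tab-separated kuaishou.txt and a path prefix you define yourself) that drops the unneeded columns from each chunk and then uses pd.concat to stitch the reduced chunks back into a single DataFrame:

import pandas as pd

path = ''  # hypothetical prefix; point this at the directory that actually holds the file
chunks = []
# each iteration yields a DataFrame of at most 1,000,000 rows
for df in pd.read_table(path + 'kuaishou.txt', sep='\t', chunksize=1000000):
    # shrink the chunk before keeping it in memory
    df.drop(columns=['page', 'video_id'], inplace=True)
    chunks.append(df)
# combine the reduced chunks into one DataFrame
result = pd.concat(chunks, ignore_index=True)
print(result.shape)

Dropping columns chunk by chunk keeps peak memory usage low, since only the reduced pieces are held before concatenation.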
2. Specify iterator=True
Passing iterator=True also makes the call return a TextFileReader object, from which chunks can be pulled on demand.
reader = pd.read_table('tmp.sv', sep='\t', iterator=True)
df = reader.get_chunk(10000)  # get_chunk(size) returns a block of `size` rows
# df can then be processed in the same way as above
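As a complement, the sketch below (again assuming a tab-separated tmp.sv) pulls fixed-size blocks with get_chunk() in a loop and stops when the reader is exhausted; the row-counting step is just a placeholder for whatever per-block processing is needed:

import pandas as pd

# iterator=True returns a TextFileReader instead of reading the whole file
reader = pd.read_table('tmp.sv', sep='\t', iterator=True)
total_rows = 0
while True:
    try:
        df = reader.get_chunk(10000)  # next block of up to 10,000 rows
    except StopIteration:
        break  # get_chunk raises StopIteration once the file is exhausted
    total_rows += len(df)  # placeholder for real per-block processing
print(total_rows)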
The above is the detailed content of how to use Pandas to efficiently process large files in chunks. For more information, please follow other related articles on the PHP Chinese website!