How to speed up frequent writing of files in Python

Jun 26, 2019 pm 02:52 PM

Problem background: There is a batch of files to process. The same function has to be called on every file, and doing this one file at a time is quite time-consuming.

Is there any way to speed it up? Of course there is. For example, you can divide the files into several batches and have each batch processed by a separate run of your Python script. Running several Python programs at the same time speeds things up.

Is there an easier way? For example, can a single program split the work across several processes and handle them at the same time?

General idea: Divide the list of file paths into several parts. How many parts depends on how many CPU cores you have. For example, if your CPU has 32 cores, the work can in theory be sped up by up to 32 times, although process start-up and disk I/O usually keep the real gain below that.
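As a minimal sketch of the splitting step, assuming nothing beyond the standard library: the helper name split_into_chunks and the file names below are made up for illustration and are not part of the original script. It uses the same round-up arithmetic as the full listing and also returns early for an empty list, where the chunk size would otherwise be 0.

```python
import math
import os


def split_into_chunks(items, num_chunks):
    """Split items into at most num_chunks consecutive slices of near-equal size."""
    if not items:
        return []  # nothing to split; also avoids a zero step in range()
    chunk_size = int(math.ceil(len(items) / float(num_chunks)))  # round up
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single part.
    cores = os.cpu_count() or 1
    paths = ["img_%d.JPEG" % i for i in range(10)]  # hypothetical file names
    chunks = split_into_chunks(paths, cores)
    print(cores, [len(chunk) for chunk in chunks])
```

Because the chunk size is rounded up, the last part may be shorter than the others, and there can be fewer parts than requested.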

The code is as follows:

# -*- coding: utf-8 -*-
import math
import multiprocessing
import os
from glob import glob

import numpy as np

label_path = '/home/ying/data/shiyongjie/distortion_datasets/new_distortion_dataset/train/label.txt'
file_path = '/home/ying/data/shiyongjie/distortion_datasets/new_distortion_dataset/train/distortion_image'
save_path = '/home/ying/data/shiyongjie/distortion_datasets/new_distortion_dataset/train/flow_field'

# Read the label file; each line holds an image file name followed by its label
with open(label_path) as txt_file:
    file_list = txt_file.readlines()
file_label = {}
for i in file_list:
    i = i.split()
    file_label[i[0]] = i[1]

r_d_max = 128
eps = 1e-32
H = 256
W = 256
def generate_flow_field(image_list):
    for image_file_path in image_list:
        pixel_flow = np.zeros(shape=(H, W, 2))  # laid out like the sampling grid used in PyTorch
        image_file_name = os.path.basename(image_file_path)
        k = float(file_label[image_file_name]) * (-1) * 1e-7
        r_u_max = r_d_max / (1 + k * r_d_max ** 2)  # theoretical length of the diagonal after distortion correction
        scale = r_u_max / 128  # squeezing that length into the 256 size gives a scale; writing 128*sqrt(2) here might be more intuitive
        for i_u in range(W):
            for j_u in range(H):
                x_u = float(i_u - 128)
                y_u = float(128 - j_u)
                theta = math.atan2(y_u, x_u)
                r = math.sqrt(x_u ** 2 + y_u ** 2)
                r = r * scale  # the actual r, before resizing to the 256x256 image size; this is what goes into the formula
                r_d = (1.0 - math.sqrt(1 - 4.0 * k * r ** 2)) / (2 * k * r + eps)  # the corresponding r in the original (distorted) image
                x_d = int(round(r_d * math.cos(theta)))
                y_d = int(round(r_d * math.sin(theta)))
                i_d = int(x_d + W / 2.0)
                j_d = int(H / 2.0 - y_d)
                if 0 <= i_d < W and 0 <= j_d < H:  # assign only when the distorted point lies inside the original image
                    value1 = (i_d - 128.0) / 128.0
                    value2 = (j_d - 128.0) / 128.0
                    pixel_flow[j_u, i_u, 0] = value1  # the mesh stores the normalized source position; to correct the distortion, look up pixels with this map
                    pixel_flow[j_u, i_u, 1] = value2
        # Save as an array
        saved_image_file_path = os.path.join(save_path, image_file_name.split('.')[0] + '.npy')
        pixel_flow = pixel_flow.astype('f2')  # convert to float16 to save space
        np.save(saved_image_file_path, pixel_flow)
if __name__ == '__main__':
    file_list = glob(file_path + '/*.JPEG')
    m = 32
    n = int(math.ceil(len(file_list) / float(m)))  # round up
    result = []
    pool = multiprocessing.Pool(processes=m)  # 32 processes
    for i in range(0, len(file_list), n):
        result.append(pool.apply_async(generate_flow_field, (file_list[i: i+n],)))
    pool.close()
    pool.join()

In the above code, the function

generate_flow_field(image_list)

takes a list, loops over it with a for statement, and saves the result for each item.

So you only need to divide the files to be processed into lists that are as equal in size as possible, and then start a process for each list to handle it.

The above main function:

if __name__ == '__main__':
    file_list = glob(file_path + '/*.JPEG')  # list every JPEG file in the folder
    m = 32  # assume the CPU has 32 cores
    n = int(math.ceil(len(file_list) / float(m)))  # number of files each core has to process
    result = []
    pool = multiprocessing.Pool(processes=m)  # create a pool of 32 worker processes
    for i in range(0, len(file_list), n):
        result.append(pool.apply_async(generate_flow_field, (file_list[i: i+n],)))  # process each sub-list with the function defined above
    pool.close()  # no more tasks will be submitted to the pool
    pool.join()  # wait for all worker processes to finish

It mainly comes down to two lines of code. The first is

pool = multiprocessing.Pool(processes=m)  # create a pool of 32 worker processes

which creates the process pool.
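The listing hard-codes 32 workers. As a hedged sketch of one alternative, the pool size could be derived from the machine and the amount of work. The helper choose_pool_size and the sample chunks are hypothetical names introduced here for illustration, and pool.map with the built-in len stands in for the real per-list function.

```python
import multiprocessing
import os


def choose_pool_size(num_tasks, max_workers=None):
    """Pick a worker count: no more than the CPU count and no more than the tasks."""
    if max_workers is None:
        max_workers = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(max_workers, num_tasks))


if __name__ == "__main__":
    chunks = [[1, 2, 3], [4, 5], [6]]  # stand-ins for lists of file paths
    workers = choose_pool_size(len(chunks))
    # Leaving the with-block terminates the pool, so collect results inside it.
    with multiprocessing.Pool(processes=workers) as pool:
        sizes = pool.map(len, chunks)  # blocks until every chunk is processed
    print(workers, sizes)
```

Capping the worker count at the number of tasks avoids starting processes that would have nothing to do.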

The other line is

result.append(pool.apply_async(generate_flow_field, (file_list[i: i+n],)))  # process each sub-list with the function defined above

On the process pool, apply_async() submits a call to generate_flow_field without waiting for it to finish. The argument passed in is file_list[i: i+n].

In effect, apply_async() lets all the worker processes run at the same time, which is why it is relatively fast.
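One thing worth knowing about apply_async() is that it returns an AsyncResult object, and an exception raised inside a worker only shows up when get() is called on that object. The script above stores these objects in result but never calls get(), so a failed batch would go unnoticed. The sketch below is an illustration rather than part of the original script: collect is a hypothetical helper, and math.sqrt stands in for the real work because it fails on negative input.

```python
import math
import multiprocessing


def collect(async_results):
    """Wait for every task and return (values, errors) so failures are not lost."""
    values, errors = [], []
    for res in async_results:
        try:
            values.append(res.get())  # blocks until this task is done
        except Exception as exc:  # get() re-raises what the worker raised
            errors.append(exc)
    return values, errors


if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=2)
    # math.sqrt(-1.0) raises ValueError inside the worker process.
    pending = [pool.apply_async(math.sqrt, (x,)) for x in (4.0, 9.0, -1.0)]
    pool.close()
    pool.join()
    values, errors = collect(pending)
    print(values)
    print(errors)
```

In the original script the same idea would mean looping over result after pool.join() and calling get() on each entry.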
