The Quest for Performance Part II : Perl vs Python-Python Tutorial-php.cn

The Quest for Performance Part II : Perl vs Python

WBOY

Release： 2024-07-31 12:37:22

Original

734 people have browsed it

The Quest for Performance Part II : Perl vs Python

Having run a toy performance example, we will now digress somewhat and contrast the performance against
a few Python implementations. First let's set up the stage for the calculations, and provide commandline
capabilities to the Python script.

import argparse
import time
import math
import numpy as np
import os
from numba import njit
from joblib import Parallel, delayed

parser = argparse.ArgumentParser()
parser.add_argument("--workers", type=int, default=8)
parser.add_argument("--arraysize", type=int, default=100_000_000)
args = parser.parse_args()
# Set the number of threads to 1 for different libraries
print("=" * 80)
print(
    f"\nStarting the benchmark for {args.arraysize} elements "
    f"using {args.workers} threads/workers\n"
)

# Generate the data structures for the benchmark
array0 = [np.random.rand() for _ in range(args.arraysize)]
array1 = array0.copy()
array2 = array0.copy()
array_in_np = np.array(array1)
array_in_np_copy = array_in_np.copy()

Copy after login

And here are our contestants:

Base Python

  for i in range(len(array0)):
    array0[i] = math.cos(math.sin(math.sqrt(array0[i])))

Copy after login

Numpy (Single threaded)

np.sqrt(array_in_np, out=array_in_np)
np.sin(array_in_np, out=array_in_np)
np.cos(array_in_np, out=array_in_np)

Copy after login

Joblib (note that this example is not a true in-place one, but I have not been able to make it run using the out arguments)

def compute_inplace_with_joblib(chunk):
    return np.cos(np.sin(np.sqrt(chunk))) #parallel function for joblib

chunks = np.array_split(array1, args.workers)  # Split the array into chunks
numresults = Parallel(n_jobs=args.workers)(
        delayed(compute_inplace_with_joblib)(chunk) for chunk in chunks
    )# Process each chunk in a separate thread
array1 = np.concatenate(numresults)  # Concatenate the results

Copy after login

Numba

@njit
def compute_inplace_with_numba(array):
    np.sqrt(array,array)
    np.sin(array,array)
    np.cos(array,array)
    ## njit will compile this function to machine code
compute_inplace_with_numba(array_in_np_copy)

Copy after login

And here are the timing results:

In place in (  base Python): 11.42 seconds
In place in (Python Joblib): 4.59 seconds
In place in ( Python Numba): 2.62 seconds
In place in ( Python Numpy): 0.92 seconds

Copy after login

The numba is surprisingly slower!? Could it be due to the overhead of compilation as pointed out by mohawk2 in an IRC exchange about this issue?
To test this, we should call compute_inplace_with_numba once before we execute the benchmark. Doing so, shows that Numba is now faster than Numpy.

In place in (  base Python): 11.89 seconds
In place in (Python Joblib): 4.42 seconds
In place in ( Python Numpy): 0.93 seconds
In place in ( Python Numba): 0.49 seconds

Copy after login

Finally, I decided to take base R for ride in the same example:

n<-50000000
x<-runif(n)
start_time <- Sys.time()
result <- cos(sin(sqrt(x)))
end_time <- Sys.time()

# Calculate the time taken
time_taken <- end_time - start_time

# Print the time taken
print(sprintf("Time in base R: %.2f seconds", time_taken))

Copy after login

which yielded the following timing result:

Time in base R: 1.30 seconds

Copy after login

Compared to the Perl results we note the following about this example:

Inplace operations in base Python were ~ 3.5 slower than Perl
Single threaded PDL and numpy gave nearly identical results, followed closely by base R
Failure to account for the compilation overhead of Numba yields the false impression that it is slower than Numpy. When accounting for the compilation overhead, Numba is x2 faster than Numpy
Parallelization with Joblib did improve upon base Python, but was still inferior to the single thread Perl implementation
Multi-threaded PDL (and OpenMP) crushed (not crashed!) every other implementation in all lanugages). Hopefully this post provides some food for thought about the language to use for your next data/compute intensive operation. The next part in this series will look into the same example using arrays in C. This final installment will (hopefully) provide some insights about the impact of memory locality and the overhead incurred by using dynamically typed languages.

The above is the detailed content of The Quest for Performance Part II : Perl vs Python. For more information, please follow other related articles on the PHP Chinese website!