Why is np.vectorize() Faster than df.apply() for Pandas Column Creation?

Susan Sarandon
Release: 2024-10-27 04:34:30

Performance Comparison of Pandas apply vs np.vectorize

np.vectorize() can be significantly faster than df.apply() when creating a new column from existing columns in a Pandas DataFrame. The difference comes from how each method feeds data to the user's function, not from any compilation of that function.

df.apply() vs Python-Level Loops

When called with axis=1, df.apply() runs a Python-level loop over the rows of the DataFrame, constructing a Series object for each row and calling the function on it. Other Python-level loops, such as list comprehensions and map, are also slow compared with true vectorised calculations, although they avoid the cost of building a Series per row.
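The row-wise loop can be illustrated with a minimal sketch. The divide function, the column names A and B, and the sample values are assumptions chosen for illustration, not taken from the article's benchmarks.

```python
import pandas as pd


def divide(a, b):
    """Hypothetical scalar function: a / b, or 0.0 when b is zero."""
    if b == 0:
        return 0.0
    return float(a) / b


# Small illustrative DataFrame (assumed data).
df = pd.DataFrame({"A": [1, 4, 9, 10], "B": [2, 0, 3, 4]})

# axis=1 calls the lambda once per row; each row is passed as a Series.
df["result_apply"] = df.apply(lambda row: divide(row["A"], row["B"]), axis=1)

# The same Python-level loop written as a list comprehension over two columns.
df["result_listcomp"] = [divide(a, b) for a, b in zip(df["A"], df["B"])]
```

Both columns hold the same values; the list comprehension simply skips the per-row Series construction that df.apply() performs.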

np.vectorize() vs df.apply()

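np.vectorize() wraps a user-defined function so that it accepts NumPy arrays and broadcasts over them in the manner of a universal function (ufunc). It does not compile the function: the NumPy documentation states that vectorize is provided primarily for convenience, not for performance, and that its implementation is essentially a for loop. Its advantage over df.apply() is that it passes plain array elements to the function, whereas df.apply() builds a Pandas Series for every row and incurs the additional overhead of index handling and type inference.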
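A minimal sketch of the np.vectorize() approach follows, using the same hypothetical divide function and assumed columns A and B. The function is still called once per element in Python; only the per-row Series overhead is gone.

```python
import numpy as np
import pandas as pd


def divide(a, b):
    """Hypothetical scalar function: a / b, or 0.0 when b is zero."""
    if b == 0:
        return 0.0
    return float(a) / b


# Small illustrative DataFrame (assumed data).
df = pd.DataFrame({"A": [1, 4, 9, 10], "B": [2, 0, 3, 4]})

# Wrap the scalar function so it broadcasts over arrays.
vectorized_divide = np.vectorize(divide)

# Pass the underlying NumPy arrays rather than the Series objects.
df["result"] = vectorized_divide(df["A"].to_numpy(), df["B"].to_numpy())
```

Note that np.vectorize() infers the output type from the first call unless otypes is given, so a function that returns an int for the first element and floats later may need an explicit otypes=[float].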

True Vectorisation: Optimal Performance

For truly efficient column creation, use vectorised operations that act on whole columns at once. Operations such as numpy.where and direct element-wise division with df["A"] / df["B"] run their loops in compiled code and avoid the overhead of calling a Python function for every element.
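As a sketch of the vectorised form, the same assumed calculation (A divided by B, with 0.0 where B is zero) can be written with numpy.where and column division. The column names and data are again illustrative.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame (assumed data).
df = pd.DataFrame({"A": [1, 4, 9, 10], "B": [2, 0, 3, 4]})

# np.where evaluates both branches for every row, then selects per element.
# The division therefore also runs where B == 0 (yielding inf there),
# but those positions are replaced by 0.0 in the result.
df["result"] = np.where(df["B"] == 0, 0.0, df["A"] / df["B"])
```

No Python function is called per element here; the comparison, the division, and the selection each run as a single array operation.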

Numba Optimisation

Where the logic cannot be expressed with vectorised operations, an explicit loop can be optimised with Numba, a just-in-time compiler that translates Python functions into optimised machine code using LLVM. In favourable cases Numba can reduce execution time to microseconds, significantly outperforming both df.apply() and np.vectorize().
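A compiled loop might look like the sketch below. It assumes Numba is installed; the fallback decorator is an addition for illustration so the example still runs (without any speed-up) when it is not. The function name divide_loop and the data are hypothetical.

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Assumed fallback: run as plain Python if Numba is unavailable.
    def njit(func):
        return func


@njit
def divide_loop(a, b):
    """Element-wise a / b over two 1-D arrays, with 0.0 where b is zero."""
    out = np.empty(a.shape[0], dtype=np.float64)
    for i in range(a.shape[0]):
        if b[i] == 0:
            out[i] = 0.0
        else:
            out[i] = a[i] / b[i]
    return out


# Small illustrative DataFrame (assumed data).
df = pd.DataFrame({"A": [1, 4, 9, 10], "B": [2, 0, 3, 4]})

# Numba works on NumPy arrays, not on Series, so convert first.
df["result"] = divide_loop(df["A"].to_numpy(), df["B"].to_numpy())
```

The first call includes compilation time, so any timing comparison should measure a second call.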

Conclusion

While np.vectorize() is usually faster than df.apply(), it is still a Python-level loop and not a substitute for vectorised calculations in NumPy. For maximum performance when creating new columns in Pandas DataFrames, use direct vectorised operations in NumPy or, where those do not fit, a loop compiled with Numba.


source:php.cn