Goodbye, Python loops! Vectorization is amazing

WBOY
Release: 2023-04-14 14:49:03

Loops are among the first things we learn in almost every programming language, so whenever an operation repeats, we reach for a loop by default. But with a large number of iterations (millions or billions of rows), loops become a real pain: you might wait for hours, only to realize later that the approach does not work. This is where vectorization in Python becomes critical.

What is vectorization?

Vectorization is a technique for expressing operations on a data set as (NumPy) array operations. Behind the scenes, the operation is applied to all elements of the array or Series in compiled code, rather than one row at a time as in a Python 'for' loop.
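To make the contrast concrete, here is a minimal sketch that doubles every element of an array, once with an explicit loop and once with a single NumPy expression. The array values and variable names are illustrative assumptions, not part of the original article.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])  # illustrative data

# Loop version: Python handles one element per iteration
doubled_loop = np.empty_like(values)
for i in range(len(values)):
    doubled_loop[i] = values[i] * 2

# Vectorized version: one expression applied to the whole array
doubled_vec = values * 2

print(doubled_loop.tolist())
print(doubled_vec.tolist())
```

Both versions produce the same values; the vectorized one simply moves the per-element work out of the Python interpreter.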

In this blog, we will look at some use cases where we can easily replace Python loops with vectorization. This will help you save time and become more proficient at coding.

Use Case 1: Finding the Sum of Numbers

First, let’s look at a basic example of finding the sum of numbers in Python using loops and vectors.

Using loops

import time

start = time.time()

# Sum computed by iterating
total = 0
# Iterate over 1.5 million numbers
for item in range(0, 1500000):
    total = total + item

print('sum is:' + str(total))
end = time.time()

print(end - start)

# 1124999250000
# 0.14 seconds

Using vectorization

import time
import numpy as np

start = time.time()

# Vectorized sum using NumPy
# np.arange creates the sequence of numbers from 0 to 1499999
print(np.sum(np.arange(1500000)))

end = time.time()
print(end - start)

# 1124999250000
# 0.008 seconds

The vectorized version runs about 18 times faster than the loop over range (0.008 seconds versus 0.14 seconds). The difference becomes even more apparent when working with a Pandas DataFrame.
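A single time.time() measurement can be noisy, so timings like these are easier to trust when repeated. The following is a minimal sketch, assuming the standard-library timeit module is an acceptable substitute for the manual timing used in this article. The helper names loop_sum and vectorized_sum are hypothetical, and the numbers it prints will vary by machine.

```python
import timeit
import numpy as np

n = 1_500_000  # same count as in this use case

def loop_sum(count):
    # Plain Python loop over range
    total = 0
    for item in range(count):
        total += item
    return total

def vectorized_sum(count):
    # dtype is set explicitly because NumPy's default integer width
    # can depend on the platform
    return int(np.arange(count, dtype=np.int64).sum())

# Both approaches must agree with the closed form n * (n - 1) / 2
assert loop_sum(n) == vectorized_sum(n) == n * (n - 1) // 2

# Take the best of three runs for each approach
loop_time = min(timeit.repeat(lambda: loop_sum(n), number=1, repeat=3))
vec_time = min(timeit.repeat(lambda: vectorized_sum(n), number=1, repeat=3))
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```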

Use Case 2: DataFrame Mathematical Operations

In data science, developers working with Pandas DataFrames often use loops to create new derived columns from mathematical operations.

In the example below, we can see that in such use cases, loops can easily be replaced by vectorization.

Create DataFrame

DataFrame is tabular data in the form of rows and columns.

We create a pandas DataFrame with 5 million rows and 4 columns, filled with random integers from 0 to 49 (the upper bound 50 passed to np.random.randint is exclusive).

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 50, size=(5000000, 4)),
                  columns=('a', 'b', 'c', 'd'))
df.shape
# (5000000, 4)
df.head()

We will create a new column 'ratio' to find the ratio of columns 'd' and 'c'.

Using loops

import time

start = time.time()

# Iterating through DataFrame using iterrows
for idx, row in df.iterrows():
    # creating a new column
    df.at[idx, 'ratio'] = 100 * (row["d"] / row["c"])

end = time.time()
print(end - start)
# 109 seconds

Using vectorization

start = time.time()
df["ratio"] = 100 * (df["d"] / df["c"])

end = time.time()
print(end - start)
### 0.12 seconds

The improvement on the DataFrame is substantial: compared with the Python loop, the vectorized version is roughly 900 times faster (0.12 seconds versus 109 seconds).
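Before relying on the faster version, it is worth confirming that both approaches produce the same column. The sketch below does that on a much smaller DataFrame. The name small, the seed and the 1,000-row size are assumptions for illustration. The random values here start at 1 rather than 0, because a zero in column 'c' makes the division yield inf or NaN, which would complicate the comparison.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
# Values start at 1 so that column 'c' never contains zero
small = pd.DataFrame(rng.integers(1, 50, size=(1000, 4)), columns=list("abcd"))

# Loop version, mirroring the iterrows approach
loop_ratio = []
for idx, row in small.iterrows():
    loop_ratio.append(100 * (row["d"] / row["c"]))

# Vectorized version
vec_ratio = 100 * (small["d"] / small["c"])

# True when both approaches agree on every row
print(np.allclose(loop_ratio, vec_ratio.to_numpy()))
```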

Use Case 3: If-else Statements on a DataFrame

Many operations require "if-else" style logic. This logic, too, can easily be replaced with vectorized operations in Python.

Have a look at the example below to understand it better (we will use the DataFrame created in use case 2).

Suppose we want to create a new column 'e' based on conditions on the existing column 'a'.

Using loops

import time

start = time.time()

# Iterating through DataFrame using iterrows
for idx, row in df.iterrows():
    if row.a == 0:
        df.at[idx, 'e'] = row.d
    elif (row.a <= 25) and (row.a > 0):
        df.at[idx, 'e'] = row.b - row.c
    else:
        df.at[idx, 'e'] = row.b + row.c

end = time.time()

print(end - start)
# Time taken: 177 seconds

Using vectorization

start = time.time()

# Default case first, then overwrite the rows that match narrower conditions
df['e'] = df['b'] + df['c']
df.loc[df['a'] <= 25, 'e'] = df['b'] - df['c']
df.loc[df['a'] == 0, 'e'] = df['d']

end = time.time()
print(end - start)
# 0.28007707595825195 seconds

Compared with the Python loop with if-else statements, the vectorized operations are about 600 times faster.
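The .loc assignments work because later assignments overwrite earlier ones. An alternative, which is not the approach used in this article, is np.select, which lists the conditions in priority order much like an if/elif/else chain. The sketch below assumes a small illustrative DataFrame (the name small, the seed and the row count are hypothetical) and cross-checks the result against the .loc version.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
small = pd.DataFrame(rng.integers(0, 50, size=(1000, 4)), columns=list("abcd"))

# Conditions are checked in order; the first match wins
conditions = [
    small["a"] == 0,
    small["a"] <= 25,  # reached only when a != 0, so this means 0 < a <= 25
]
choices = [
    small["d"],
    small["b"] - small["c"],
]
# default plays the role of the final else branch
small["e"] = np.select(conditions, choices, default=small["b"] + small["c"])

# Cross-check against the .loc approach
expected = small["b"] + small["c"]
expected.loc[small["a"] <= 25] = small["b"] - small["c"]
expected.loc[small["a"] == 0] = small["d"]
print((small["e"] == expected).all())
```

Whether np.select is faster than the .loc version was not measured here; its main appeal is that the branch order reads the same way as the original if-else chain.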

Use Case 4: Solving Machine Learning/Deep Learning Networks

Deep learning requires solving many complex equations, often over millions or billions of rows. Running Python loops to solve these equations is very slow, and vectorization is the better choice.

For example, suppose you want to calculate the y values for millions of rows in the following multiple linear regression equation.
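As a sketch, assuming the standard form of multiple linear regression with five input features and no intercept term (which is what the code in this use case computes), the equation for one row can be written as:

```latex
y = m_1 x_1 + m_2 x_2 + m_3 x_3 + m_4 x_4 + m_5 x_5
```

The number of terms and the missing intercept are assumptions inferred from the example code, not taken from the equation as originally published.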


We can use vectorization instead of looping.

The values of m1, m2, m3, ... are determined by solving the above equation using the millions of values corresponding to x1, x2, x3, ... (for simplicity, we look at only one simple multiplication step).

Create data

>>> import numpy as np
>>> # Set the initial values of m
>>> m = np.random.rand(1,5)
>>> m
array([[0.49976103, 0.33991827, 0.60596021, 0.78518515, 0.5540753]])
>>> # Input values for 5 million rows
>>> x = np.random.rand(5000000,5)

Using loops

import time
import numpy as np

m = np.random.rand(1, 5)
x = np.random.rand(5000000, 5)

# Array that receives one result per row
zer = np.zeros(5000000)

tic = time.process_time()

for i in range(0, 5000000):
    total = 0
    for j in range(0, 5):
        total = total + x[i][j] * m[0][j]

    zer[i] = total

toc = time.process_time()
print("Computation time = " + str(toc - tic) + " seconds")

# Computation time = 28.228 seconds

Using vectorization

NumPy implements matrix multiplication of vectors with vectorized code behind the scenes.

tic = time.process_time()

# dot product
np.dot(x, m.T)

toc = time.process_time()
print("Computation time = " + str(toc - tic) + " seconds")

# Computation time = 0.107 seconds

With the timings above (28.228 seconds versus 0.107 seconds), np.dot is roughly 260 times faster than the Python loops.
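The sketch below checks on a small input that the nested loops and np.dot compute the same values. The names loop_result, dot_result and matmul_result, the seed and the 1,000-row size are illustrative assumptions. It also shows the @ operator, which gives the same product as np.dot for 2-D arrays.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
m = rng.random((1, 5))
x = rng.random((1000, 5))  # far fewer rows than the article, so the loop finishes quickly

# Loop version: accumulate the five products for each row
loop_result = np.zeros(len(x))
for i in range(len(x)):
    total = 0.0
    for j in range(5):
        total += x[i][j] * m[0][j]
    loop_result[i] = total

dot_result = np.dot(x, m.T)  # shape (1000, 1)
matmul_result = x @ m.T      # same product via the @ operator

print(dot_result.shape)
print(np.allclose(loop_result, dot_result.ravel()))
print(np.allclose(dot_result, matmul_result))
```

Note that np.dot returns a column of shape (rows, 1) here, so it is flattened with ravel() before being compared with the 1-D loop result.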

Conclusion

Vectorization in Python is very fast. When working with very large data sets, prefer vectorization over loops. Over time, you will grow accustomed to thinking about your code in vectorized terms.

source:51cto.com