In my opinion, the Python community falls into three camps: Python 2.x, Python 3.x, and PyPy. The split basically comes down to library compatibility and speed. This article focuses on some common code optimization techniques and on the significant speedup you get by compiling to C. I will also give running times for all three Python flavors. My goal is not to prove that one is better than the others, only to give you an idea of how they compare on these specific examples in different contexts.
Using generators
A commonly overlooked memory optimization is the use of generators. A generator lets us write a function that yields one record at a time instead of returning all records at once. If you are using Python 2.x, this is why you use xrange instead of range, or ifilter instead of filter. A good example is building a large list of numbers and adding them together.
    import timeit
    import random

    def generate(num):
        while num:
            yield random.randrange(10)
            num -= 1

    def create_list(num):
        numbers = []
        while num:
            numbers.append(random.randrange(10))
            num -= 1
        return numbers

    print(timeit.timeit("sum(generate(999))", setup="from __main__ import generate", number=1000))
    # Python 2.7: 0.88098192215
    # Python 3.2: 1.416813850402832

    print(timeit.timeit("sum(create_list(999))", setup="from __main__ import create_list", number=1000))
    # Python 2.7: 0.924163103104
    # Python 3.2: 1.5026731491088867
Not only is this a little faster, it also prevents you from storing the entire list in memory!
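To make the memory point concrete, here is a minimal sketch (not part of the original benchmark) that compares the size of a list with the size of the equivalent generator object. The helper names `squares_list` and `squares_gen` are hypothetical, and `sys.getsizeof` reports only the container's own footprint, not the integers it refers to, so treat the numbers as a rough indication.

```python
import sys

def squares_list(n):
    # Builds every element up front and keeps them all in memory.
    return [i * i for i in range(n)]

def squares_gen(n):
    # Produces one element at a time; nothing is stored between steps.
    for i in range(n):
        yield i * i

list_size = sys.getsizeof(squares_list(100000))
gen_size = sys.getsizeof(squares_gen(100000))

# Both approaches give the same sum.
total_from_list = sum(squares_list(100000))
total_from_gen = sum(squares_gen(100000))

print("list object:      %d bytes" % list_size)
print("generator object: %d bytes" % gen_size)
```

The generator object stays the same small size no matter how large `n` gets, while the list grows with it.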
Introduction to ctypes
For performance-critical code, Python itself gives us a way to call C functions, mainly through ctypes. You can use ctypes without writing any C code of your own, because it can load the precompiled standard C library that is already on your system. Let's go back to the generator example and see how long it takes when the random numbers come from C via ctypes.
    import timeit
    from ctypes import cdll

    def generate_c(num):
        # Load the standard C library
        libc = cdll.LoadLibrary("libc.so.6")  # Linux
        # libc = cdll.msvcrt  # Windows
        while num:
            yield libc.rand() % 10
            num -= 1

    print(timeit.timeit("sum(generate_c(999))", setup="from __main__ import generate_c", number=1000))
    # Python 2.7: 0.434374809265
    # Python 3.2: 0.7084300518035889
Simply swapping in C's random function cut the running time roughly in half! Now if I told you we can do even better, would you believe it?
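Calling `libc.rand()` works without any declarations because it takes no arguments and returns an int, which is what ctypes assumes by default. For other functions it is usually safer to declare the signature first. The sketch below assumes a POSIX system where `ctypes.util.find_library("c")` can locate the C library (on Windows you would load `cdll.msvcrt` instead). `abs` and `strlen` are standard C functions chosen purely for illustration.

```python
import ctypes
import ctypes.util

# Assumption: POSIX system; find_library("c") resolves the C library name.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare argument and return types so ctypes converts values correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

abs_result = libc.abs(-42)
length = libc.strlen(b"generator")

print(abs_result)
print(length)
```

With `argtypes` set, ctypes raises an error when an argument of the wrong type is passed instead of handing bad data to C.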
Introduction to Cython
Cython is a superset of Python that lets us call C functions and declare C types for variables to improve performance. We need to install Cython before we can use it.
sudo pip install cython
Cython is essentially a fork of Pyrex, a similar project that is no longer under development. It compiles our Python-like code into a C library that we can call from a Python file. Use the .pyx suffix instead of .py for your Cython files. Let's look at how to run our generator code using Cython.
    # cython_generator.pyx
    import random

    def generate(num):
        while num:
            yield random.randrange(10)
            num -= 1
We need to create a setup.py so that we can get Cython to compile our function.
    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Distutils import build_ext

    setup(
        cmdclass={'build_ext': build_ext},
        ext_modules=[Extension("generator", ["cython_generator.pyx"])]
    )
Compile it with:
python setup.py build_ext --inplace
You should now see two new files: cython_generator.c and generator.so. We can test our program like this:
    import timeit

    print(timeit.timeit("sum(generator.generate(999))", setup="import generator", number=1000))
    # 0.835658073425
Not bad. Let's see if there's anything we can improve. First we can declare "num" as an integer, and then we can pull in the standard C library to handle our random function.
    # cython_generator.pyx
    cdef extern from "stdlib.h":
        int c_libc_rand "rand"()

    def generate(int num):
        while num:
            yield c_libc_rand() % 10
            num -= 1
If we compile and run again we will see this amazing number.
>>> 0.033586025238
Just a few changes brought a big improvement. However, making these changes can be tedious, so let's see what we can do with regular Python.
Introduction to PyPy
PyPy is an implementation of Python 2.7.3 with a just-in-time compiler. In layman's terms, this means it makes your code run faster. Quora uses PyPy in production. PyPy has installation instructions on its download page, but if you are using Ubuntu, you can install it via apt-get. It works out of the box, so there are no crazy bash scripts to run: just download and run. Let's see how our original generator code performs under PyPy.
    import timeit
    import random

    def generate(num):
        while num:
            yield random.randrange(10)
            num -= 1

    def create_list(num):
        numbers = []
        while num:
            numbers.append(random.randrange(10))
            num -= 1
        return numbers

    print(timeit.timeit("sum(generate(999))", setup="from __main__ import generate", number=1000))
    # PyPy 1.9: 0.115154981613
    # PyPy 2.0b1: 0.118431091309

    print(timeit.timeit("sum(create_list(999))", setup="from __main__ import create_list", number=1000))
    # PyPy 1.9: 0.140175104141
    # PyPy 2.0b1: 0.140514850616
Wow! Without modifying a single line of code, it runs nearly 8 times faster than under standard Python 2.7.
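Since the same script now runs under several interpreters, it helps to have each run report which one produced the timing. Here is a small sketch using the standard `platform` module. The `interpreter_label` helper is a hypothetical name, and the generator is a simplified, deterministic stand-in for the one above.

```python
import platform
import timeit

def interpreter_label():
    """Return the implementation name and language version, e.g. 'CPython 3.11.4'."""
    return "%s %s" % (platform.python_implementation(), platform.python_version())

def generate(num):
    # Deterministic stand-in for the random generator used in the article.
    while num:
        yield num % 10
        num -= 1

# timeit accepts a callable as well as a string of code.
elapsed = timeit.timeit(lambda: sum(generate(999)), number=100)
label = interpreter_label()

print("%s: %.6f s" % (label, elapsed))
```

Running the same file with `python` and with `pypy` then yields timing lines that identify their interpreter.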
Further testing
Why look any further? PyPy is the champ! Not entirely true. Although most programs run on PyPy, some libraries are not fully supported. Moreover, it can be easier to write a C extension for your project than to switch to a different interpreter. Let's dig a little deeper and see how ctypes lets us use libraries written in C. We will test the speed of merge sort and of computing Fibonacci numbers. The following is the C code (functions.c) we will use:
    /* functions.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* http://rosettacode.org/wiki/Sorting_algorithms/Merge_sort#C */
    /* static: in C99 and later a plain "inline" function may get no external
       definition, and the link fails if the compiler does not inline the call */
    static inline void
    merge (int *left, int l_len, int *right, int r_len, int *out)
    {
        int i, j, k;
        for (i = j = k = 0; i < l_len && j < r_len;)
            out[k++] = left[i] < right[j] ? left[i++] : right[j++];
        while (i < l_len) out[k++] = left[i++];
        while (j < r_len) out[k++] = right[j++];
    }

    /* inner recursion of merge sort */
    void
    recur (int *buf, int *tmp, int len)
    {
        int l = len / 2;
        if (len <= 1) return;
        /* note that buf and tmp are swapped */
        recur (tmp, buf, l);
        recur (tmp + l, buf + l, len - l);
        merge (tmp, l, tmp + l, len - l, buf);
    }

    /* preparation work before recursion */
    void
    merge_sort (int *buf, int len)
    {
        /* call alloc, copy and free only once */
        int *tmp = malloc (sizeof (int) * len);
        if (tmp == NULL) return;   /* allocation failed: leave buf untouched */
        memcpy (tmp, buf, sizeof (int) * len);
        recur (buf, tmp, len);
        free (tmp);
    }

    int
    fibRec (int n)
    {
        if (n < 2)
            return n;
        else
            return fibRec (n - 1) + fibRec (n - 2);
    }
On the Linux platform, we can compile it into a shared library using the following method:
gcc -Wall -fPIC -c functions.c
gcc -shared -o libfunctions.so functions.o
With ctypes, we can load the "libfunctions.so" shared library just as we loaded the standard C library earlier. Here we compare a Python implementation with the C implementation, starting with the Fibonacci calculation:
    # functions.py
    from ctypes import *
    import time

    libfunctions = cdll.LoadLibrary("./libfunctions.so")

    def fibRec(n):
        if n < 2:
            return n
        else:
            return fibRec(n-1) + fibRec(n-2)

    start = time.time()
    fibRec(32)
    finish = time.time()
    print("Python: " + str(finish - start))

    # C Fibonacci
    start = time.time()
    x = libfunctions.fibRec(32)
    finish = time.time()
    print("C: " + str(finish - start))
As we expected, C is faster than Python and PyPy. We can also compare merge sorts in the same way.
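A single pair of `time.time()` calls measures one run and can be noisy. One alternative, using only the standard library, is `timeit.repeat` with the minimum taken over several runs. The sketch below times only the pure-Python `fibRec`, with a smaller argument so it finishes quickly. The repeat counts are arbitrary choices, not values from the original benchmark.

```python
import timeit

def fibRec(n):
    if n < 2:
        return n
    return fibRec(n - 1) + fibRec(n - 2)

CALLS_PER_RUN = 5

# Three independent runs of five calls each; the minimum is the least
# disturbed by other activity on the machine.
runs = timeit.repeat(lambda: fibRec(20), number=CALLS_PER_RUN, repeat=3)
best = min(runs) / CALLS_PER_RUN

fib_20 = fibRec(20)
print("fibRec(20) = %d, best time %.6f s per call" % (fib_20, best))
```

The same pattern could wrap `libfunctions.fibRec` once the shared library is loaded.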
We haven't dug deep into the ctypes library yet, so these examples do not show what it is really capable of. ctypes offers only a limited set of standard types, such as int, char array, float, and bytes. There is no integer array type by default, but by multiplying c_int (the ctypes int type) by a length we can obtain one indirectly. This is what the line that creates c_numbers in the code below does: it builds a c_int array type as long as our list of numbers and packs the numbers into it.
The key point is that C cannot do this for you, and you would not want it to. Instead we use a pointer, and the function body modifies the array through it. To pass in our c_numbers array, we must pass it to the merge_sort function by reference. After merge_sort runs, the c_numbers array is sorted. I have appended the following code to my functions.py file.
    #Python Merge Sort
    from random import shuffle, sample

    #Generate 9999 random numbers between 0 and 100000
    numbers = sample(range(100000), 9999)
    shuffle(numbers)
    c_numbers = (c_int * len(numbers))(*numbers)

    from heapq import merge

    def merge_sort(m):
        if len(m) <= 1:
            return m
        middle = len(m) // 2
        left = m[:middle]
        right = m[middle:]
        left = merge_sort(left)
        right = merge_sort(right)
        return list(merge(left, right))

    start = time.time()
    numbers = merge_sort(numbers)
    finish = time.time()
    print("Python: " + str(finish - start))

    #C Merge Sort
    start = time.time()
    libfunctions.merge_sort(byref(c_numbers), len(numbers))
    finish = time.time()
    print("C: " + str(finish - start))

    Python: 0.190635919571 #Python 2.7
    Python: 0.11785483360290527 #Python 3.2
    Python: 0.266992092133 #PyPy 1.9
    Python: 0.265724897385 #PyPy 2.0b1
    C: 0.00201296806335 #Python 2.7 + ctypes
    C: 0.0019741058349609375 #Python 3.2 + ctypes
    C: 0.0029308795929 #PyPy 1.9 + ctypes
    C: 0.00287103652954 #PyPy 2.0b1 + ctypes
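The array-building step can also be tried without compiling anything. The sketch below assumes a POSIX system whose C library is found by `ctypes.util.find_library("c")`. It packs a Python list into a `c_int` array and lets the standard C `qsort` sort it in place through a pointer. `compare_ints` is a hypothetical callback name, and `qsort` stands in for the article's `merge_sort` only to keep the example self-contained.

```python
import ctypes
import ctypes.util
from ctypes import CFUNCTYPE, POINTER, c_int, c_size_t, sizeof

# Assumption: POSIX system; find_library("c") resolves the C library name.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Signature of the comparison callback that qsort expects.
CMPFUNC = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int))

def compare_ints(a, b):
    # a and b are pointers to the two elements being compared.
    return (a[0] > b[0]) - (a[0] < b[0])

libc.qsort.argtypes = [POINTER(c_int), c_size_t, c_size_t, CMPFUNC]
libc.qsort.restype = None

numbers = [5, 1, 7, 33, 99, 2]
# Multiplying c_int by a length creates an array type; calling it fills the array.
c_numbers = (c_int * len(numbers))(*numbers)

callback = CMPFUNC(compare_ints)  # keep a reference alive during the call
libc.qsort(c_numbers, len(c_numbers), sizeof(c_int), callback)

sorted_numbers = list(c_numbers)
print(sorted_numbers)
```

The C function receives only a pointer and writes through it, so the sorted values appear in `c_numbers` while the original Python list `numbers` is left unchanged.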
The table below compares the different results (times in seconds).
Implementation | Merge Sort | Fibonacci |
---|---|---|
Python 2.7 | 0.191 | 1.187 |
Python 2.7 + ctypes | 0.002 | 0.044 |
Python 3.2 | 0.118 | 1.272 |
Python 3.2 + ctypes | 0.002 | 0.046 |
PyPy 1.9 | 0.267 | 0.564 |
PyPy 1.9 + ctypes | 0.003 | 0.048 |
PyPy 2.0b1 | 0.266 | 0.567 |
PyPy 2.0b1 + ctypes | 0.003 | 0.046 |