Preface
In fact, the internal implementation of Python's list (list) is an array, which is a linear list. To find elements in a list, you can use the list.index() method, whose time complexity is O(n). For large amounts of data, binary search can be used for optimization.
Binary search requires that objects must be ordered. The basic principles are as follows:
1. Start from the middle element of the array. If the middle element happens to be the element to be searched, the search process ends;
2. If a specific element is greater than or less than the middle element, search in the half of the array that is greater than or less than the middle element, and start the comparison from the middle element as before.
3. If the array is empty at a certain step, it means it cannot be found.
Binary search also becomes a binary search. Each comparison of the algorithm reduces the search range by half, and its time complexity is O(logn).
We use recursion and looping to implement binary search respectively:
def binary_search_recursion(lst, value, low, high): if high < low: return None mid = (low + high) / 2 if lst[mid] > value: return binary_search_recursion(lst, value, low, mid-1) elif lst[mid] < value: return binary_search_recursion(lst, value, mid+1, high) else: return mid def binary_search_loop(lst,value): low, high = 0, len(lst)-1 while low <= high: mid = (low + high) / 2 if lst[mid] < value: low = mid + 1 elif lst[mid] > value: high = mid - 1 else: return mid return None
Then perform a performance test on these two implementations:
if __name__ == "__main__": import random lst = [random.randint(0, 10000) for _ in xrange(100000)] lst.sort() def test_recursion(): binary_search_recursion(lst, 999, 0, len(lst)-1) def test_loop(): binary_search_loop(lst, 999) import timeit t1 = timeit.Timer("test_recursion()", setup="from __main__ import test_recursion") t2 = timeit.Timer("test_loop()", setup="from __main__ import test_loop") print "Recursion:", t1.timeit() print "Loop:", t2.timeit()
The execution results are as follows:
Recursion: 3.12596702576 Loop: 2.08254289627
It can be seen that the loop method is more efficient than recursion.
bisect module
Python has a bisect module for maintaining ordered lists. The bisect module implements an algorithm for inserting elements into an ordered list. In some cases, this is more efficient than sorting the list repeatedly or constructing a large list and then sorting it. Bisect means bisection. The bisection method is used here to sort. It will insert an element into the appropriate position of an ordered list, which eliminates the need to call sort every time to maintain the ordered list.
The following is a simple usage example:
import bisect import random random.seed(1) print'New Pos Contents' print'--- --- --------' l = [] for i in range(1, 15): r = random.randint(1, 100) position = bisect.bisect(l, r) bisect.insort(l, r) print'%3d %3d' % (r, position), l
Output result:
New Pos Contents --- --- -------- 14 0 [14] 85 1 [14, 85] 77 1 [14, 77, 85] 26 1 [14, 26, 77, 85] 50 2 [14, 26, 50, 77, 85] 45 2 [14, 26, 45, 50, 77, 85] 66 4 [14, 26, 45, 50, 66, 77, 85] 79 6 [14, 26, 45, 50, 66, 77, 79, 85] 10 0 [10, 14, 26, 45, 50, 66, 77, 79, 85] 3 0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85] 84 9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85] 44 4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85] 77 9 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85] 1 0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
The functions provided by the Bisect module are:
bisect.bisect_left(a,x, lo=0, hi=len(a)):
Find the index of inserting x in the ordered list a. lo and hi are used to specify the range of the list, and the default is to use the entire list. If x already exists, insert it to the left. The return value is index.
bisect.bisect_right(a,x, lo=0, hi=len(a))
bisect.bisect(a, x,lo=0, hi=len( a)):
These two functions are similar to bisect_left, but if x already exists, insert it to the right.
bisect.insort_left(a,x, lo=0, hi=len(a)):
Insert x into the ordered list a. The effect is the same as a.insert(bisect.bisect_left(a,x, lo, hi), x).
bisect.insort_right(a,x, lo=0, hi=len(a))
bisect.insort(a, x,lo=0, hi=len(a)) :
Similar to insort_left, but if x already exists, insert it to the right.
The functions provided by the Bisect module can be divided into two categories: bisect* is only used to find the index and does not perform actual insertion; while insort* is used for actual insertion.
A typical application of this module is to calculate the score level:
def grade(score,breakpoints=[60, 70, 80, 90], grades='FDCBA'): i = bisect.bisect(breakpoints, score) return grades[i] print [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
Execution result:
['F', 'A', 'C', 'C', 'B', 'A', 'A']
Similarly, we can Use the bisect module to implement binary search:
def binary_search_bisect(lst, x): from bisect import bisect_left i = bisect_left(lst, x) if i != len(lst) and lst[i] == x: return i return None
Let’s test its performance with binary search implemented with recursion and loops:
Recursion: 4.00940990448 Loop: 2.6583480835 Bisect: 1.74922895432
Yes We see that it is slightly faster than the loop implementation and almost half as fast as the recursive implementation.
Python's famous data processing library numpy also has a function numpy.searchsorted for binary search. Its usage is basically the same as bisect, except that if you want to insert on the right, you need to set the parameter side='right', for example:
>>> import numpy as np >>> from bisect import bisect_left, bisect_right >>> data = [2, 4, 7, 9] >>> bisect_left(data, 4) 1 >>> np.searchsorted(data, 4) 1 >>> bisect_right(data, 4) 2 >>> np.searchsorted(data, 4, side='right') 2
Then, let’s compare the performance:
In [20]: %timeit -n 100 bisect_left(data, 99999) 100 loops, best of 3: 670 ns per loop In [21]: %timeit -n 100 np.searchsorted(data, 99999) 100 loops, best of 3: 56.9 ms per loop In [22]: %timeit -n 100 bisect_left(data, 8888) 100 loops, best of 3: 961 ns per loop In [23]: %timeit -n 100 np.searchsorted(data, 8888) 100 loops, best of 3: 57.6 ms per loop In [24]: %timeit -n 100 bisect_left(data, 777777) 100 loops, best of 3: 670 ns per loop In [25]: %timeit -n 100 np.searchsorted(data, 777777) 100 loops, best of 3: 58.4 ms per loop
It can be found that the efficiency of numpy.searchsorted is very low, and it is not the same as bisect at all On the order of magnitude. Therefore searchsorted is not suitable for searching ordinary arrays, but it is quite fast for searching numpy.ndarray:
In [30]: data_ndarray = np.arange(0, 1000000) In [31]: %timeit np.searchsorted(data_ndarray, 99999) The slowest run took 16.04 times longer than the fastest. This could mean that an intermediate result is being cached. 1000000 loops, best of 3: 996 ns per loop In [32]: %timeit np.searchsorted(data_ndarray, 8888) The slowest run took 18.22 times longer than the fastest. This could mean that an intermediate result is being cached. 1000000 loops, best of 3: 994 ns per loop In [33]: %timeit np.searchsorted(data_ndarray, 777777) The slowest run took 31.32 times longer than the fastest. This could mean that an intermediate result is being cached. 1000000 loops, best of 3: 990 ns per loop
numpy.searchsorted can search multiple values at the same time:
>>> np.searchsorted([1,2,3,4,5], 3) 2 >>> np.searchsorted([1,2,3,4,5], 3, side='right') 3 >>> np.searchsorted([1,2,3,4,5], [-10, 10, 2, 3]) array([0, 5, 1, 2])
Summary
The above is the entire content of this article. I hope the content of this article can be helpful to everyone in learning or using python. If you have any questions, you can leave a message to communicate. .
For more related articles on binary search and bisect module implementation in Python, please pay attention to the PHP Chinese website!