Detailed explanation of binary search and bisect module implemented in Python

高洛峰
Release: 2017-01-14 15:38:22
Original
1572 people have browsed it

Preface

In fact, the internal implementation of Python's list (list) is an array, which is a linear list. To find elements in a list, you can use the list.index() method, whose time complexity is O(n). For large amounts of data, binary search can be used for optimization.

Binary search requires that objects must be ordered. The basic principles are as follows:

1. Start from the middle element of the array. If the middle element happens to be the element to be searched, the search process ends;

2. If a specific element is greater than or less than the middle element, search in the half of the array that is greater than or less than the middle element, and start the comparison from the middle element as before.

3. If the array is empty at a certain step, it means it cannot be found.

Binary search also becomes a binary search. Each comparison of the algorithm reduces the search range by half, and its time complexity is O(logn).

We use recursion and looping to implement binary search respectively:

def binary_search_recursion(lst, value, low, high):
 if high < low:
 return None
 mid = (low + high) / 2
 if lst[mid] > value:
 return binary_search_recursion(lst, value, low, mid-1)
 elif lst[mid] < value:
 return binary_search_recursion(lst, value, mid+1, high)
 else:
 return mid
 
def binary_search_loop(lst,value):
 low, high = 0, len(lst)-1
 while low <= high:
 mid = (low + high) / 2
 if lst[mid] < value:
 low = mid + 1
 elif lst[mid] > value:
 high = mid - 1
 else:
 return mid
 return None
Copy after login

Then perform a performance test on these two implementations:

if __name__ == "__main__":
 import random
 lst = [random.randint(0, 10000) for _ in xrange(100000)]
 lst.sort()
 
 def test_recursion():
 binary_search_recursion(lst, 999, 0, len(lst)-1)
 
 def test_loop():
 binary_search_loop(lst, 999)
 
 import timeit
 t1 = timeit.Timer("test_recursion()", setup="from __main__ import test_recursion")
 t2 = timeit.Timer("test_loop()", setup="from __main__ import test_loop")
 
 print "Recursion:", t1.timeit()
 print "Loop:", t2.timeit()
Copy after login

The execution results are as follows:

Recursion: 3.12596702576
Loop: 2.08254289627
Copy after login

It can be seen that the loop method is more efficient than recursion.

bisect module

Python has a bisect module for maintaining ordered lists. The bisect module implements an algorithm for inserting elements into an ordered list. In some cases, this is more efficient than sorting the list repeatedly or constructing a large list and then sorting it. Bisect means bisection. The bisection method is used here to sort. It will insert an element into the appropriate position of an ordered list, which eliminates the need to call sort every time to maintain the ordered list.

The following is a simple usage example:

import bisect
import random
 
random.seed(1)
 
print&#39;New Pos Contents&#39;
print&#39;--- --- --------&#39;
 
l = []
for i in range(1, 15):
 r = random.randint(1, 100)
 position = bisect.bisect(l, r)
 bisect.insort(l, r)
 print&#39;%3d %3d&#39; % (r, position), l
Copy after login

Output result:

New Pos Contents
--- --- --------
 14 0 [14]
 85 1 [14, 85]
 77 1 [14, 77, 85]
 26 1 [14, 26, 77, 85]
 50 2 [14, 26, 50, 77, 85]
 45 2 [14, 26, 45, 50, 77, 85]
 66 4 [14, 26, 45, 50, 66, 77, 85]
 79 6 [14, 26, 45, 50, 66, 77, 79, 85]
 10 0 [10, 14, 26, 45, 50, 66, 77, 79, 85]
 3 0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85]
 84 9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85]
 44 4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85]
 77 9 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
 1 0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
Copy after login

The functions provided by the Bisect module are:

bisect.bisect_left(a,x, lo=0, hi=len(a)):

Find the index of inserting x in the ordered list a. lo and hi are used to specify the range of the list, and the default is to use the entire list. If x already exists, insert it to the left. The return value is index.

bisect.bisect_right(a,x, lo=0, hi=len(a))

bisect.bisect(a, x,lo=0, hi=len( a)):

These two functions are similar to bisect_left, but if x already exists, insert it to the right.

bisect.insort_left(a,x, lo=0, hi=len(a)):

Insert x into the ordered list a. The effect is the same as a.insert(bisect.bisect_left(a,x, lo, hi), x).

bisect.insort_right(a,x, lo=0, hi=len(a))

bisect.insort(a, x,lo=0, hi=len(a)) :

Similar to insort_left, but if x already exists, insert it to the right.

The functions provided by the Bisect module can be divided into two categories: bisect* is only used to find the index and does not perform actual insertion; while insort* is used for actual insertion.

A typical application of this module is to calculate the score level:

def grade(score,breakpoints=[60, 70, 80, 90], grades=&#39;FDCBA&#39;):
 i = bisect.bisect(breakpoints, score)
 return grades[i]
 
print [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]
Copy after login

Execution result:

[&#39;F&#39;, &#39;A&#39;, &#39;C&#39;, &#39;C&#39;, &#39;B&#39;, &#39;A&#39;, &#39;A&#39;]
Copy after login

Similarly, we can Use the bisect module to implement binary search:

def binary_search_bisect(lst, x):
 from bisect import bisect_left
 i = bisect_left(lst, x)
 if i != len(lst) and lst[i] == x:
 return i
 return None
Copy after login

Let’s test its performance with binary search implemented with recursion and loops:

Recursion: 4.00940990448
Loop: 2.6583480835
Bisect: 1.74922895432
Copy after login

Yes We see that it is slightly faster than the loop implementation and almost half as fast as the recursive implementation.

Python's famous data processing library numpy also has a function numpy.searchsorted for binary search. Its usage is basically the same as bisect, except that if you want to insert on the right, you need to set the parameter side='right', for example:

>>> import numpy as np
>>> from bisect import bisect_left, bisect_right
>>> data = [2, 4, 7, 9]
>>> bisect_left(data, 4)
1
>>> np.searchsorted(data, 4)
1
>>> bisect_right(data, 4)
2
>>> np.searchsorted(data, 4, side=&#39;right&#39;)
2
Copy after login

Then, let’s compare the performance:

In [20]: %timeit -n 100 bisect_left(data, 99999)
100 loops, best of 3: 670 ns per loop
 
In [21]: %timeit -n 100 np.searchsorted(data, 99999)
100 loops, best of 3: 56.9 ms per loop
 
In [22]: %timeit -n 100 bisect_left(data, 8888)
100 loops, best of 3: 961 ns per loop
 
In [23]: %timeit -n 100 np.searchsorted(data, 8888)
100 loops, best of 3: 57.6 ms per loop
 
In [24]: %timeit -n 100 bisect_left(data, 777777)
100 loops, best of 3: 670 ns per loop
 
In [25]: %timeit -n 100 np.searchsorted(data, 777777)
100 loops, best of 3: 58.4 ms per loop
Copy after login

It can be found that the efficiency of numpy.searchsorted is very low, and it is not the same as bisect at all On the order of magnitude. Therefore searchsorted is not suitable for searching ordinary arrays, but it is quite fast for searching numpy.ndarray:

In [30]: data_ndarray = np.arange(0, 1000000)
 
In [31]: %timeit np.searchsorted(data_ndarray, 99999)
The slowest run took 16.04 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 996 ns per loop
 
In [32]: %timeit np.searchsorted(data_ndarray, 8888)
The slowest run took 18.22 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 994 ns per loop
 
In [33]: %timeit np.searchsorted(data_ndarray, 777777)
The slowest run took 31.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 990 ns per loop
Copy after login

numpy.searchsorted can search multiple values ​​at the same time:

>>> np.searchsorted([1,2,3,4,5], 3)
2
>>> np.searchsorted([1,2,3,4,5], 3, side=&#39;right&#39;)
3
>>> np.searchsorted([1,2,3,4,5], [-10, 10, 2, 3])
array([0, 5, 1, 2])
Copy after login

Summary

The above is the entire content of this article. I hope the content of this article can be helpful to everyone in learning or using python. If you have any questions, you can leave a message to communicate. .

For more related articles on binary search and bisect module implementation in Python, please pay attention to the PHP Chinese website!

Related labels:
source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template