Python 이진 검색 및 이등분 모듈-파이썬 튜토리얼-php.cn

Python 목록의 내부 구현은 선형 목록인 배열입니다. 목록에서 요소를 찾으려면 O(n)의 시간 복잡도를 갖는 list.index() 메서드를 사용할 수 있습니다. 대용량 데이터의 경우 최적화를 위해 이진 검색을 사용할 수 있습니다. 이진 검색에서는 객체 순서를 지정해야 합니다. 기본 원칙은 다음과 같습니다.

1. 중간 요소가 검색할 요소인 경우 검색 프로세스가 종료됩니다. ;

2. 특정 요소가 중간 요소보다 크거나 작은 경우, 중간 요소보다 크거나 작은 배열의 절반을 검색하고 다음과 같이 중간 요소부터 비교를 시작합니다. 전에.

3. 특정 단계에서 배열이 비어 있으면 찾을 수 없다는 의미입니다.

이진 검색도 이진 검색이 됩니다. 알고리즘을 비교할 때마다 검색 범위가 절반으로 줄어들고 시간 복잡도는 O(logn)입니다.

우리는 재귀와 반복을 사용하여 각각 이진 검색을 구현합니다.

def binary_search_recursion(lst, value, low, high):  
    if high < low:  
        return None
    mid = (low + high) / 2  
    if lst[mid] > value:  
        return binary_search_recursion(lst, value, low, mid-1)  
    elif lst[mid] < value:  
        return binary_search_recursion(lst, value, mid+1, high)  
    else:  
        return mid  
 
def binary_search_loop(lst,value):  
    low, high = 0, len(lst)-1  
    while low <= high:  
        mid = (low + high) / 2  
        if lst[mid] < value:  
            low = mid + 1  
        elif lst[mid] > value:  
            high = mid - 1
        else:
            return mid  
    return None

로그인 후 복사

그런 다음 이 두 구현에 대한 성능 테스트를 수행합니다.

if __name__ == "__main__":
    import random
    lst = [random.randint(0, 10000) for _ in xrange(100000)]
    lst.sort()
 
    def test_recursion():
        binary_search_recursion(lst, 999, 0, len(lst)-1)
 
    def test_loop():
        binary_search_loop(lst, 999)
 
    import timeit
    t1 = timeit.Timer("test_recursion()", setup="from __main__ import test_recursion")
    t2 = timeit.Timer("test_loop()", setup="from __main__ import test_loop")
 
    print "Recursion:", t1.timeit()
    print "Loop:", t2.timeit()

로그인 후 복사

실행 결과는 다음과 같습니다.

Recursion: 3.12596702576
Loop: 2.08254289627

로그인 후 복사

재귀보다 루프 방식이 더 효율적이라는 것을 알 수 있습니다.

Python에는 순서가 지정된 목록을 유지하기 위한 bisect 모듈이 있습니다. bisect 모듈은 순서가 지정된 목록에 요소를 삽입하는 알고리즘을 구현합니다. 어떤 경우에는 목록을 반복적으로 정렬하거나 큰 목록을 구성한 후 정렬하는 것보다 이것이 더 효율적입니다. Bisect는 이등분을 의미합니다. 여기서는 정렬 방법을 사용하여 정렬된 목록의 적절한 위치에 요소를 삽입하므로 정렬된 목록을 유지하기 위해 매번 sort를 호출할 필요가 없습니다.

다음은 간단한 사용 예입니다.

import bisect
import random
 
random.seed(1)
 
print&#39;New  Pos Contents&#39;
print&#39;---  --- --------&#39;
 
l = []
for i in range(1, 15):
    r = random.randint(1, 100)
    position = bisect.bisect(l, r)
    bisect.insort(l, r)
    print&#39;%3d  %3d&#39; % (r, position), l

로그인 후 복사

출력 결과:

New  Pos Contents
---  --- --------
 14    0 [14]
 85    1 [14, 85]
 77    1 [14, 77, 85]
 26    1 [14, 26, 77, 85]
 50    2 [14, 26, 50, 77, 85]
 45    2 [14, 26, 45, 50, 77, 85]
 66    4 [14, 26, 45, 50, 66, 77, 85]
 79    6 [14, 26, 45, 50, 66, 77, 79, 85]
 10    0 [10, 14, 26, 45, 50, 66, 77, 79, 85]
  3    0 [3, 10, 14, 26, 45, 50, 66, 77, 79, 85]
 84    9 [3, 10, 14, 26, 45, 50, 66, 77, 79, 84, 85]
 44    4 [3, 10, 14, 26, 44, 45, 50, 66, 77, 79, 84, 85]
 77    9 [3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]
  1    0 [1, 3, 10, 14, 26, 44, 45, 50, 66, 77, 77, 79, 84, 85]

로그인 후 복사

Bisect 모듈에서 제공하는 기능은 다음과 같습니다.

bisect.bisect_left(a,x, lo=0, hi=len(a)):

순서 있는 목록 a에서 x를 삽입하는 인덱스를 찾습니다. lo와 hi는 목록의 범위를 지정하는 데 사용되며 기본값은 전체 목록을 사용하는 것입니다. x가 이미 존재하는 경우 왼쪽에 삽입합니다. 반환 값은 인덱스입니다.

bisect.bisect_right(a,x, lo=0, hi=len(a))

bisect.bisect(a, x,lo=0, hi=len(a)) :

이 두 함수는 bisect_left와 유사하지만 x가 이미 존재하는 경우 오른쪽에 삽입합니다.

bisect.insort_left(a,x, lo=0, hi=len(a)):

순서 목록에 x를 삽입합니다. 효과는 a.insert(bisect.bisect_left(a,x, lo, hi), x)와 동일합니다.

bisect.insort_right(a,x, lo=0, hi=len(a))

bisect.insort(a, x,lo=0, hi=len(a)) :

insort_left와 유사하지만 x가 이미 존재하는 경우 오른쪽에 삽입합니다.

Bisect 모듈에서 제공하는 기능은 두 가지 범주로 나눌 수 있습니다. bisect*는 인덱스를 찾는 데만 사용되며 실제 삽입을 수행하지 않는 반면, insort*는 실제 삽입에 사용됩니다. 이 모듈의 일반적인 응용 프로그램은 점수 수준을 계산하는 것입니다:

def grade(score,breakpoints=[60, 70, 80, 90], grades=&#39;FDCBA&#39;):
    i = bisect.bisect(breakpoints, score)
    return grades[i]
 
print [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]

로그인 후 복사

실행 결과:

[&#39;F&#39;, &#39;A&#39;, &#39;C&#39;, &#39;C&#39;, &#39;B&#39;, &#39;A&#39;, &#39;A&#39;]

로그인 후 복사

마찬가지로 bisect 모듈을 사용하여 이진 검색을 구현할 수 있습니다.

def binary_search_bisect(lst, x):
    from bisect import bisect_left
    i = bisect_left(lst, x)
    if i != len(lst) and lst[i] == x:
        return i
    return None

로그인 후 복사

재귀와 루프로 구현된 이진 검색으로 성능을 테스트해 보겠습니다.

Recursion: 4.00940990448
Loop: 2.6583480835
Bisect: 1.74922895432

로그인 후 복사

루프 구현보다 약간 빠르고 재귀 구현보다 거의 절반 빠른 것을 볼 수 있습니다. <… ='right', 예:

그런 다음 성능을 다시 비교해 보겠습니다.

>>> import numpy as np
>>> from bisect import bisect_left, bisect_right
>>> data = [2, 4, 7, 9]
>>> bisect_left(data, 4)
1
>>> np.searchsorted(data, 4)
1
>>> bisect_right(data, 4)
2
>>> np.searchsorted(data, 4, side=&#39;right&#39;)
2

로그인 후 복사

numpy.searchsorted의 효율성이 매우 낮다는 것을 알 수 있습니다. 이등분과 같은 크기의 순서로. 따라서 searchsorted는 일반 배열 검색에는 적합하지 않지만 numpy.ndarray 검색에는 상당히 빠릅니다:

In [20]: %timeit -n 100 bisect_left(data, 99999)
100 loops, best of 3: 670 ns per loop
 
In [21]: %timeit -n 100 np.searchsorted(data, 99999)
100 loops, best of 3: 56.9 ms per loop
 
In [22]: %timeit -n 100 bisect_left(data, 8888)
100 loops, best of 3: 961 ns per loop
 
In [23]: %timeit -n 100 np.searchsorted(data, 8888)
100 loops, best of 3: 57.6 ms per loop
 
In [24]: %timeit -n 100 bisect_left(data, 777777)
100 loops, best of 3: 670 ns per loop
 
In [25]: %timeit -n 100 np.searchsorted(data, 777777)
100 loops, best of 3: 58.4 ms per loop

로그인 후 복사

numpy.searchsorted는 동시에 여러 값을 검색할 수 있습니다:

In [30]: data_ndarray = np.arange(0, 1000000)
 
In [31]: %timeit np.searchsorted(data_ndarray, 99999)
The slowest run took 16.04 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 996 ns per loop
 
In [32]: %timeit np.searchsorted(data_ndarray, 8888)
The slowest run took 18.22 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 994 ns per loop
 
In [33]: %timeit np.searchsorted(data_ndarray, 777777)
The slowest run took 31.32 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 990 ns per loop

로그인 후 복사

>>> np.searchsorted([1,2,3,4,5], 3)
2
>>> np.searchsorted([1,2,3,4,5], 3, side=&#39;right&#39;)
3
>>> np.searchsorted([1,2,3,4,5], [-10, 10, 2, 3])
array([0, 5, 1, 2])

로그인 후 복사