Statement
Based on the Python2.7 language, this article gives various methods to determine whether the list is sorted, and compares and analyzes its performance on the author's Windows XP host (Pentium G630 2.7GHz main frequency 2GB memory).
1. Question raising
Haskell training teacher asked a question: How to determine whether the list has been sorted?
Sorting or not is actually just some binary relationship between adjacent elements, that is, a->a->Bool. So the first step is to find the list of tuples; the second step is to apply this function to each tuple, and then use the and operation. The implementation code given by the teacher is as follows:
pair lst = zip lst ( tail lst ) sorted lst predict = and [ predict x y | (x,y) <- pair lst]
In Haskell, the equal sign is preceded by the name and parameters of the function, followed by the function definition. The pair function displaces the list lst (tail removes the first element of the list), and then combines it with the original list to form a tuple list of adjacent elements under the action of zip. The predict function accepts two numbers and returns True or False depending on the size. and sums all elements in a list of type [Bool], and its semantics are similar to Python's all() function.
Then, the teacher asked everyone to think about performance issues. The author proposed that if the accuracy requirements are not high, three sets of random numbers can be generated and sorted as subscripts, and the corresponding three sets of elements of the original list can be extracted to form three new sublists ("sampling"). If it is judged that the three sublists follow the same sorting rules, the original list is considered to be sorted. When the list is large and has been sorted in the previous section, selecting an appropriate number of random numbers can significantly reduce the scale of the operation while ensuring a certain accuracy.
In addition, data sources should also be considered in practical applications. For example, the os.listdir() method of the Python language returns list entries in alphabetical order on Windows systems, but returns out-of-order entries on Linux systems. Therefore, if the system platform (os.platform) is judged to be win32, the entries must have been sorted.
In order to compare and verify the feasibility of the random sampling method, the author first collected existing methods for judging list sorting on the Internet, mainly referring to the answers to the "Pythonic way to check if a list is sorted or not" question on the Stack Overflow website, and Relevant code examples are given in the second section of this article. Note that the "sorting" described in this article does not require strict sorting, that is, adjacent elements are allowed to be equal. For example, [1,2,2,3] is regarded as ascending order, and [3,3,2,2] is regarded as descending order.
2. Code implementation
The function name format for determining list sorting in this section is IsListSorted_XXX(). For the sake of brevity, except for code snippets and their output, they are always referred to as _XXX().
2.1 guess
def IsListSorted_guess(lst): listLen = len(lst) if listLen <= 1: return True #由首个元素和末尾元素猜测可能的排序规则 if lst[0] == lst[-1]: #列表元素相同 for elem in lst: if elem != lst[0]: return False elif lst[0] < lst[-1]: #列表元素升序 for i, elem in enumerate(lst[1:]): if elem < lst[i]: return False else: #列表元素降序 for i, elem in enumerate(lst[1:]): if elem > lst[i]: return False return True
_guess() is the most general implementation and is almost language independent. It is worth noting that the possible collation of the given list is guessed within this function, so there is no need for the external caller to specify the collation.
2.2 sorted
def IsListSorted_sorted(lst): return sorted(lst) == lst or sorted(lst, reverse=True) == lst
_sorted() uses the Python built-in function sorted(). Since sorted() sorts an unsorted list, the _sorted() function is mainly suitable for sorted lists.
If you want to determine if the list is unsorted and then sort it, it is better to call the sort() method of the list directly, because this method will internally determine whether the list is sorted. For a sorted list, the time complexity of this method is linear order O(n) - judging is O(n) and sorting is O(nlgn).
2.3 for-loop
def IsListSorted_forloop(lst, key=lambda x, y: x <= y): for i, elem in enumerate(lst[1:]): #注意,enumerate默认迭代下标从0开始 if not key(lst[i], elem): #if elem > lst[i]更快,但通用性差 return False return True
Regardless of whether the list is sorted or not, the time complexity of this function is linear order O(n). Note that the key parameter indicates that the default sorting rule is ascending order.
2.4 all
def IsListSorted_allenumk(lst, key=lambda x, y: x <= y): return all(key(lst[i], elem) for i, elem in enumerate(lst[1:])) import operator def IsListSorted_allenumo(lst, oCmp=operator.le): return all(oCmp(lst[i], elem) for i, elem in enumerate(lst[1:])) def IsListSorted_allenumd(lst): return all((lst[i] <= elem) for i, elem in enumerate(lst[1:])) def IsListSorted_allxran(lst, key=lambda x,y: x <= y): return all(key(lst[i],lst[i+1]) for i in xrange(len(lst)-1)) def IsListSorted_allzip(lst, key=lambda x,y: x <= y): from itertools import izip #Python 3中zip返回生成器(generator),而izip被废弃 return all(key(a, b) for (a, b) in izip(lst[:-1],lst[1:]))
Lambda expressions and operator operators are equally fast. The former is simple and flexible, while the latter is slightly more efficient (actual measurement is not necessarily true). But neither is as fast as direct comparison of list elements (there may be calling overhead). That is, _allenumd() is faster than _allenumo() and _allenumk().
If you use lambda expressions to indicate sorting rules, you only need to change the comparison operator between x and y when changing the rules; if you use the operator module to indicate sorting rules, you need to change the object comparison method when changing the rules. Specifically, lt(x, y) is equivalent to x < y, le(x, y) is equivalent to x <= y, eq(x, y) is equivalent to x == y, ne(x, y) is equivalent to x != y, gt(x, y) is equivalent to x > y, ge(x, y) is equivalent to x >= y. For example, if the _allenumo() function wants strict ascending order, set oCmp=operator.lt.
In addition, according to the help information of the all() function, _allenumk() is actually the equivalent form of _forloop().
2.5 numpy
def IsListSorted_numpy(arr, key=lambda dif: dif >= 0): import numpy try: if arr.dtype.kind == 'u': #无符号整数数组执行np.diff时存在underflow风险 arr = numpy.int64(lst) except AttributeError: pass #无dtype属性,非数组 return (key(numpy.diff(arr))).all() #numpy.diff(x)返回相邻数组元素的差值构成的数组
NumPy is a basic Python package for scientific computing that can store and process large matrices. It contains a powerful N-dimensional array object that is much more efficient than Python's own nested list structure. The measured data in Section 3 shows that _numpy() performs very well when processing large lists.
在Windows系统中可通过pip install numpy命令安装NumPy包,不建议登录官网下载文件自行安装。
2.6 reduce
def IsListSorted_reduce(iterable, key=lambda x, y: x <= y): cmpFunc = lambda x, y: y if key(x, y) else float('inf') return reduce(cmpFunc, iterable, .0) < float('inf')
reduce实现是all实现的变体。累加器(accumulator)中仅存储最后一个检查的列表元素,或者Infinity(若任一元素小于前个元素值)。
前面2.1~2.5小节涉及下标操作的函数适用于列表等可迭代对象(Iterable)。对于通用迭代器(Iterator)对象,即可以作用于next()函数或方法的对象,可使用_reduce()及后面除_rand()外各小节的函数。迭代器的计算是惰性的,只有在需要返回下一个数据时才会计算,以避免不必要的计算。而且,迭代器方式无需像列表那样切片为两个迭代对象。
2.7 imap
def IsListSorted_itermap(iterable, key=lambda x, y: x <= y): from itertools import imap, tee a, b = tee(iterable) #为单个iterable创建两个独立的iterator next(b, None) return all(imap(key, a, b))
2.8 izip
def IsListSorted_iterzip(iterable, key=lambda x, y: x <= y): from itertools import tee, izip a, b = tee(iterable) next(b, None) return all(key(x, y) for x, y in izip(a, b)) def pairwise(iterable): from itertools import tee, izip a, b = tee(iterable) next(b, None) return izip(a, b) #"s -> (s0,s1), (s1,s2), (s2, s3), ..." def IsListSorted_iterzipf(iterable, key=lambda x, y: x <= y): return all(key(a, b) for a, b in pairwise(iterable))
第三节的实测数据表明,虽然存在外部函数调用,_iterzipf()却比_iterzip()略为高效。
2.9 fast
def IsListSorted_fastd(lst): it = iter(lst) try: prev = it.next() except StopIteration: return True for cur in it: if prev > cur: return False prev = cur return True def IsListSorted_fastk(lst, key=lambda x, y: x <= y): it = iter(lst) try: prev = it.next() except StopIteration: return True for cur in it: if not key(prev, cur): return False prev = cur return True
_fastd()和_fastk()是Stack Overflow网站回答里据称执行最快的。实测数据表明,在列表未排序时,它们的性能表现确实优异。
2.10 random
import random def IsListSorted_rand(lst, randNum=3, randLen=100): listLen = len(lst) if listLen <= 1: return True #由首个元素和末尾元素猜测可能的排序规则 if lst[0] < lst[-1]: #列表元素升序 key = lambda dif: dif >= 0 else: #列表元素降序或相等 key = lambda dif: dif <= 0 threshold, sortedFlag = 10000, True import numpy if listLen <= threshold or listLen <= randLen*2 or not randNum: return (key(numpy.diff(numpy.array(lst)))).all() from random import sample for i in range(randNum): sortedRandList = sorted(sample(xrange(listLen), randLen)) flag = (key(numpy.diff(numpy.array([lst[x] for x in sortedRandList])))).all() sortedFlag = sortedFlag and flag return sortedFlag
_rand()借助随机采样降低运算规模,并融入其他判断函数的优点。例如,猜测列表可能的排序规则,并在随机采样不适合时使用相对快速的判断方式,如NumPy。
通过line_profiler分析可知,第20行和第21行与randLen有关,但两者耗时接近。因此randLen应小于listLen的一半,以抵消sorted开销。除内部限制外,用户可以调节随机序列个数和长度,如定制单个但较长的序列。
注意,_rand()不适用于存在微量异常数据的长列表。因为这些数据很可能被随机采样遗漏,从而影响判断结果的准确性。
三. 性能分析
本节借助Python标准库random模块,生成各种随机列表,以对比和分析上节列表排序判断函数的性能。
可通过sample()、randint()等方法生成随机列表。例如:
>>>import random >>> random.sample(range(10), 10); random.sample(range(10), 5) [9, 1, 6, 3, 0, 8, 4, 2, 7, 5] [1, 2, 5, 6, 0] >>> rand = [random.randint(1,10) for i in range(10)]; rand [7, 3, 7, 5, 7, 2, 4, 4, 9, 8] >>> random.sample(rand, 5); random.sample(rand, 5) [4, 7, 7, 9, 8] [3, 9, 2, 5, 7] >>> randGen = (random.randint(1,10) for i in range(10)); randGen <generator object <genexpr> at 0x0192C878>
sample()方法从列表中随机选择数字,结合range()函数可生产不含重复元素的随机列表;而randint()方法结合列表解析生成的随机列表可能包含重复元素。Python文档规定,首次导入random模块时使用当前系统时间作为种子初始化随机数生成器。因此,本文并未显式地调用seed()方法设置种子。
为度量性能表现,定义如下计时装饰器:
from time import clock TimeList = [] def FuncTimer(repeats=1000): def decorator(func): def wrapper(*args, **kwargs): try: startTime = clock() for i in xrange(repeats): ret = func(*args, **kwargs) except Exception, e: print '%s() Error: %s!' %(func.__name__, e) ret = None finally: #当目标函数发生异常时,仍旧输出计时信息 endTime = clock() timeElasped = (endTime-startTime)*1000.0 msg = '%21s(): %s =>Time Elasped: %.3f msec, repeated %d time(s).' \ %(func.__name__, ret, timeElasped, repeats) global TimeList; TimeList.append([timeElasped, msg]) return ret return wrapper return decorator def ReportTimer(): global TimeList; TimeList.sort(key=lambda x:x[0]) for entry in TimeList: print entry[1] TimeList = [] #Flush
该装饰器允许对输出进行排序,以便更直观地观察性能。基于该装饰器,下文将分别测试不同的排序场景。注意,第二节各函数头部需添加FuncTimer()装饰。
3.1 列表前段乱序
测试代码如下:
def TestHeadUnorderedList(): TEST_NAME = 'HeadUnorderedList'; scale = int(1e5) List = random.sample(xrange(scale), scale) + range(scale) print 'Test 1: %s, list len: %d' %(TEST_NAME, len(List)) IsListSorted_guess(List) IsListSorted_sorted(List) IsListSorted_allenumk(List) IsListSorted_allenumo(List) IsListSorted_allenumd(List) IsListSorted_allxran(List) IsListSorted_allzip(List) IsListSorted_forloop(List) IsListSorted_itermap(List) IsListSorted_iterzipf(List) IsListSorted_iterzip(List) IsListSorted_reduce(List) IsListSorted_numpy(numpy.array(List)) #若不先转换为数组,则耗时骤增 IsListSorted_fastd(List) IsListSorted_fastk(List) ReportTimer()
运行输出如下:
Test 1: HeadUnorderedList, list len: 200 IsListSorted_fastd(): False =>Time Elasped: 0.757 msec, repeated 1000 time(s). IsListSorted_fastk(): False =>Time Elasped: 1.091 msec, repeated 1000 time(s). IsListSorted_forloop(): False =>Time Elasped: 2.080 msec, repeated 1000 time(s). IsListSorted_guess(): False =>Time Elasped: 2.123 msec, repeated 1000 time(s). IsListSorted_allxran(): False =>Time Elasped: 2.255 msec, repeated 1000 time(s). IsListSorted_allenumd(): False =>Time Elasped: 2.672 msec, repeated 1000 time(s). IsListSorted_allenumo(): False =>Time Elasped: 3.021 msec, repeated 1000 time(s). IsListSorted_allenumk(): False =>Time Elasped: 3.207 msec, repeated 1000 time(s). IsListSorted_itermap(): False =>Time Elasped: 5.845 msec, repeated 1000 time(s). IsListSorted_allzip(): False =>Time Elasped: 7.793 msec, repeated 1000 time(s). IsListSorted_iterzip(): False =>Time Elasped: 9.667 msec, repeated 1000 time(s). IsListSorted_iterzipf(): False =>Time Elasped: 9.969 msec, repeated 1000 time(s). IsListSorted_numpy(): False =>Time Elasped: 16.203 msec, repeated 1000 time(s). IsListSorted_sorted(): False =>Time Elasped: 45.742 msec, repeated 1000 time(s). IsListSorted_reduce(): False =>Time Elasped: 145.447 msec, repeated 1000 time(s). Test 1: HeadUnorderedList, list len: 200000 IsListSorted_fastd(): False =>Time Elasped: 0.717 msec, repeated 1000 time(s). IsListSorted_fastk(): False =>Time Elasped: 0.876 msec, repeated 1000 time(s). IsListSorted_allxran(): False =>Time Elasped: 2.104 msec, repeated 1000 time(s). IsListSorted_itermap(): False =>Time Elasped: 6.062 msec, repeated 1000 time(s). IsListSorted_iterzip(): False =>Time Elasped: 7.244 msec, repeated 1000 time(s). IsListSorted_iterzipf(): False =>Time Elasped: 8.491 msec, repeated 1000 time(s). IsListSorted_numpy(): False =>Time Elasped: 801.916 msec, repeated 1000 time(s). IsListSorted_forloop(): False =>Time Elasped: 2924.755 msec, repeated 1000 time(s). IsListSorted_guess(): False =>Time Elasped: 2991.756 msec, repeated 1000 time(s). IsListSorted_allenumo(): False =>Time Elasped: 3025.864 msec, repeated 1000 time(s). IsListSorted_allenumk(): False =>Time Elasped: 3062.792 msec, repeated 1000 time(s). IsListSorted_allenumd(): False =>Time Elasped: 3190.896 msec, repeated 1000 time(s). IsListSorted_allzip(): False =>Time Elasped: 6586.183 msec, repeated 1000 time(s). IsListSorted_sorted(): False =>Time Elasped: 119974.955 msec, repeated 1000 time(s). IsListSorted_reduce(): False =>Time Elasped: 154747.659 msec, repeated 1000 time(s).
可见,对于前段乱序的列表,无论其长短_fastd()和_fastk()的表现均为最佳。对于未排序列表,_sorted()需要进行排序,故性能非常差。然而,_reduce()性能最差。
实际上除_guess()和_sorted()外,其他函数都按升序检查列表。为安全起见,可仿照_guess()实现,先猜测排序方式,再进一步检查。
因为短列表耗时差异大多可以忽略,后续测试将不再包含短列表(但作者确实测试过),仅关注长列表。除非特别说明,列表长度为10万级,重复计时1000次。
3.2 列表中段乱序
测试代码及结果如下:
def TestMiddUnorderedList(): TEST_NAME = 'MiddUnorderedList'; scale = int(1e5) List = range(scale) + random.sample(xrange(scale), scale) + range(scale) print 'Test 2: %s, list len: %d' %(TEST_NAME, len(List)) IsListSorted_numpy(numpy.array(List)) # 1572.295 msec IsListSorted_guess(List) # 14540.637 msec IsListSorted_itermap(List) # 21013.096 msec IsListSorted_fastk(List) # 23840.582 msec IsListSorted_allxran(List) # 31014.215 msec IsListSorted_iterzip(List) # 33386.059 msec IsListSorted_forloop(List) # 34228.006 msec IsListSorted_allenumk(List) # 34416.802 msec IsListSorted_allzip(List) # 42370.528 msec IsListSorted_sorted(List) # 142592.756 msec IsListSorted_reduce(List) # 187514.967 msec ReportTimer()
为节省篇幅,已根据运行输出调整函数的调用顺序。下文也作如此处理。
3.3 列表后段乱序
测试代码及结果如下:
def TestTailUnorderedList(): TEST_NAME = 'TailUnorderedList'; scale = int(1e5) List = range(scale, 0, -1) + random.sample(xrange(scale), scale) print 'Test 3: %s, list len: %d' %(TEST_NAME, len(List)) IsListSorted_numpy(numpy.array(List), key=lambda dif: dif <= 0) # 980.789 msec IsListSorted_guess(List) # 13273.862 msec IsListSorted_itermap(List, key=lambda x, y: x >= y) # 21603.315 msec IsListSorted_fastk(List, key=lambda x, y: x >= y) # 24183.548 msec IsListSorted_allxran(List, key=lambda x, y: x >= y) # 32850.254 msec IsListSorted_forloop(List, key=lambda x, y: x >= y) # 33918.848 msec IsListSorted_iterzip(List, key=lambda x, y: x >= y) # 34505.809 msec IsListSorted_allenumk(List, key=lambda x, y: x >= y) # 35631.625 msec IsListSorted_allzip(List, key=lambda x, y: x >= y) # 40076.330 msec ReportTimer()
3.4 列表完全乱序
测试代码及结果如下:
def TestUnorderedList(): TEST_NAME = 'UnorderedList'; scale = int(1e5) List = random.sample(xrange(scale), scale) print 'Test 4: %s, list len: %d' %(TEST_NAME, len(List)) IsListSorted_fastk(List) # 0.856 msec IsListSorted_allxran(List) # 3.438 msec IsListSorted_iterzip(List) # 7.233 msec IsListSorted_itermap(List) # 7.595 msec IsListSorted_numpy(numpy.array(List)) # 207.222 msec IsListSorted_allenumk(List) # 953.423 msec IsListSorted_guess(List) # 1023.575 msec IsListSorted_forloop(List) # 1076.986 msec IsListSorted_allzip(List) # 2062.821 msec ReportTimer()
3.5 列表元素相同
测试代码及结果如下:
```python def TestSameElemList(): TEST_NAME = 'SameElemList'; scale = int(1e5) List = [5]*scale print 'Test 5: %s, list len: %d' %(TEST_NAME, len(List)) IsListSorted_numpy(numpy.array(List)) # 209.324 msec IsListSorted_sorted(List) # 2760.139 msec IsListSorted_guess(List) # 5843.942 msec IsListSorted_itermap(List) # 20609.704 msec IsListSorted_fastk(List) # 23035.760 msec IsListSorted_forloop(List) # 29043.206 msec IsListSorted_allenumk(List) # 29553.716 msec IsListSorted_allxran(List) # 30348.549 msec IsListSorted_iterzip(List) # 32806.217 msec ReportTimer()
3.6 列表升序
测试代码及结果如下:
def TestAscendingList(): TEST_NAME = 'AscendingList'; scale = int(1e5) List = range(scale) print 'Test 6: %s, list len: %d' %(TEST_NAME, len(List)) IsListSorted_numpy(numpy.array(List)) # 209.217 msec IsListSorted_sorted(List) # 2845.166 msec IsListSorted_fastd(List) # 5977.520 msec IsListSorted_guess(List) # 10408.204 msec IsListSorted_allenumd(List) # 15812.754 msec IsListSorted_itermap(List) # 21244.476 msec IsListSorted_fastk(List) # 23900.196 msec IsListSorted_allenumo(List) # 28607.724 msec IsListSorted_forloop(List) # 30075.538 msec IsListSorted_allenumk(List) # 30274.017 msec IsListSorted_allxran(List) # 31126.404 msec IsListSorted_reduce(List) # 32940.108 msec IsListSorted_iterzip(List) # 34188.312 msec IsListSorted_iterzipf(List) # 34425.097 msec IsListSorted_allzip(List) # 37967.447 msec ReportTimer()
可见,列表已排序时,_sorted()的性能较占优势。
3.7 列表降序
剔除不支持降序的函数,测试代码及结果如下:
def TestDescendingList(): TEST_NAME = 'DescendingList'; scale = int(1e2) List = range(scale, 0, -1) print 'Test 7: %s, list len: %d' %(TEST_NAME, len(List)) IsListSorted_numpy(numpy.array(List), key=lambda dif: dif <= 0) # 209.318 msec IsListSorted_sorted(List) # 5707.067 msec IsListSorted_guess(List) # 10549.928 msec IsListSorted_itermap(List, key=lambda x, y: x >= y) # 21529.547 msec IsListSorted_fastk(List, key=lambda x, y: x >= y) # 24264.465 msec import operator; IsListSorted_allenumo(List, oCmp=operator.ge) # 28093.035 msec IsListSorted_forloop(List, key=lambda x, y: x >= y) # 30745.943 msec IsListSorted_allenumk(List, key=lambda x, y: x >= y) # 32696.205 msec IsListSorted_allxran(List, key=lambda x, y: x >= y) # 33431.473 msec IsListSorted_allzip(List, key=lambda x, y: x >= y) # 34837.019 msec IsListSorted_iterzip(List, key=lambda x, y: x >= y) # 35237.475 msec IsListSorted_reduce(List, key=lambda x, y: x >= y) # 37035.270 msec ReportTimer()
3.8 迭代器测试
参数为列表的函数,需要先将列表���过iter()函数转换为迭代器。注意,当iterable参数为iterator时,只能计时一次,因为该次执行将用尽迭代器。
测试代码如下:
def TestIter(): TEST_NAME = 'Iter'; scale = int(1e7) List = range(scale) #升序 # List = random.sample(xrange(scale), scale) #乱序 print 'Test 8: %s, list len: %d' %(TEST_NAME, len(List)) iterL = iter(List); IsListSorted_guess(list(iterL)) iterL = iter(List); IsListSorted_sorted(iterL) iterL = iter(List); IsListSorted_itermap(iterL) iterL = iter(List); IsListSorted_iterzipf(iterL) iterL = iter(List); IsListSorted_iterzip(iterL) iterL = iter(List); IsListSorted_reduce(iterL) iterL = iter(List); IsListSorted_fastd(iterL) iterL = iter(List); IsListSorted_fastk(iterL, key=lambda x, y: x >= y) ReportTimer()
运行结果如下:
Test 8: Iter, list len: 10000000 ---升序 IsListSorted_fastd(): True =>Time Elasped: 611.028 msec, repeated 1 time(s). IsListSorted_sorted(): False =>Time Elasped: 721.751 msec, repeated 1 time(s). IsListSorted_guess(): True =>Time Elasped: 1142.065 msec, repeated 1 time(s). IsListSorted_itermap(): True =>Time Elasped: 2097.696 msec, repeated 1 time(s). IsListSorted_fastk(): True =>Time Elasped: 2337.233 msec, repeated 1 time(s). IsListSorted_reduce(): True =>Time Elasped: 3307.361 msec, repeated 1 time(s). IsListSorted_iterzipf(): True =>Time Elasped: 3354.034 msec, repeated 1 time(s). IsListSorted_iterzip(): True =>Time Elasped: 3442.520 msec, repeated 1 time(s). Test 8: Iter, list len: 10000000 ---乱序 IsListSorted_fastk(): False =>Time Elasped: 0.004 msec, repeated 1 time(s). IsListSorted_fastd(): False =>Time Elasped: 0.010 msec, repeated 1 time(s). IsListSorted_iterzip(): False =>Time Elasped: 0.015 msec, repeated 1 time(s). IsListSorted_itermap(): False =>Time Elasped: 0.055 msec, repeated 1 time(s). IsListSorted_iterzipf(): False =>Time Elasped: 0.062 msec, repeated 1 time(s). IsListSorted_guess(): False =>Time Elasped: 736.810 msec, repeated 1 time(s). IsListSorted_reduce(): False =>Time Elasped: 8919.611 msec, repeated 1 time(s). IsListSorted_sorted(): False =>Time Elasped: 12273.018 msec, repeated 1 time(s).
其中,_itermap()、_iterzip()、_iterzipf()、_reduce()、_fastd()、_fastk()可直接判断迭代器是否已排序。其他函数需将迭代器转换为列表后才能处理。_sorted()虽然接受迭代器参数,但sorted()返回列表,故无法正确判断迭代器顺序。
3.9 随机采样测试
综合以上测试,可知_fastk()和_numpy()性能较为突出,而且_rand()内置numpy方式。因此,以_fastk()和_numpy()为参照对象,测试代码如下(计时1次):
def TestRandList(): scale = int(1e6) List = random.sample(xrange(scale), scale) + range(scale) print 'Test 1: %s, list len: %d' %('HeadUnorderedList', len(List)) IsListSorted_fastk(List) IsListSorted_numpy(numpy.array(List)) IsListSorted_rand(List, randNum=1) ReportTimer() List = range(scale) + random.sample(xrange(scale), scale) + range(scale) print 'Test 2: %s, list len: %d' %('MiddUnorderedList', len(List)) IsListSorted_fastk(List) IsListSorted_numpy(numpy.array(List)) IsListSorted_rand(List, randNum=1) ReportTimer() List = range(scale, 0, -1) + random.sample(xrange(scale), scale) print 'Test 3: %s, list len: %d' %('TailUnorderedList', len(List)) IsListSorted_fastk(List, key=lambda x, y: x >= y) IsListSorted_numpy(numpy.array(List), key=lambda dif: dif <= 0) IsListSorted_rand(List, randNum=1) ReportTimer() List = [random.randint(1,scale) for i in xrange(scale)] #random.sample(xrange(scale), scale) print 'Test 4: %s, list len: %d' %('UnorderedList', len(List)) IsListSorted_fastk(List) IsListSorted_numpy(numpy.array(List)) IsListSorted_rand(List, randNum=1) ReportTimer() List = [5]*scale print 'Test 5: %s, list len: %d' %('SameElemList', len(List)) IsListSorted_fastk(List) IsListSorted_numpy(numpy.array(List)) IsListSorted_rand(List, randNum=1) ReportTimer() List = range(scale) print 'Test 6: %s, list len: %d' %('AscendingList', len(List)) IsListSorted_fastk(List) IsListSorted_numpy(numpy.array(List)) IsListSorted_rand(List, randNum=1) ReportTimer() List = range(scale, 0, -1) print 'Test 7: %s, list len: %d' %('DescendingList', len(List)) IsListSorted_fastk(List, key=lambda x, y: x >= y) IsListSorted_numpy(numpy.array(List), key=lambda dif: dif <= 0) IsListSorted_rand(List, randNum=1) ReportTimer() List = range(scale, 0, -1); List[scale/2]=0 print 'Test 8: %s, list len: %d' %('MiddleNotchList', len(List)) IsListSorted_fastk(List, key=lambda x, y: x >= y) IsListSorted_numpy(numpy.array(List), key=lambda dif: dif <= 0) IsListSorted_rand(List, randNum=1) IsListSorted_rand(List, randNum=1, randLen=scale/2) ReportTimer()
运行输出如下:
Test 1: HeadUnorderedList, list len: 2000000 IsListSorted_fastk(): False =>Time Elasped: 0.095 msec, repeated 1 time(s). IsListSorted_rand(): False =>Time Elasped: 0.347 msec, repeated 1 time(s). IsListSorted_numpy(): False =>Time Elasped: 7.893 msec, repeated 1 time(s). Test 2: MiddUnorderedList, list len: 3000000 IsListSorted_rand(): False =>Time Elasped: 0.427 msec, repeated 1 time(s). IsListSorted_numpy(): False =>Time Elasped: 11.868 msec, repeated 1 time(s). IsListSorted_fastk(): False =>Time Elasped: 210.842 msec, repeated 1 time(s). Test 3: TailUnorderedList, list len: 2000000 IsListSorted_rand(): False =>Time Elasped: 0.355 msec, repeated 1 time(s). IsListSorted_numpy(): False =>Time Elasped: 7.548 msec, repeated 1 time(s). IsListSorted_fastk(): False =>Time Elasped: 280.416 msec, repeated 1 time(s). Test 4: UnorderedList, list len: 1000000 IsListSorted_fastk(): False =>Time Elasped: 0.074 msec, repeated 1 time(s). IsListSorted_rand(): False =>Time Elasped: 0.388 msec, repeated 1 time(s). IsListSorted_numpy(): False =>Time Elasped: 3.757 msec, repeated 1 time(s). Test 5: SameElemList, list len: 1000000 IsListSorted_rand(): True =>Time Elasped: 0.304 msec, repeated 1 time(s). IsListSorted_numpy(): True =>Time Elasped: 3.955 msec, repeated 1 time(s). IsListSorted_fastk(): True =>Time Elasped: 210.977 msec, repeated 1 time(s). Test 6: AscendingList, list len: 1000000 IsListSorted_rand(): True =>Time Elasped: 0.299 msec, repeated 1 time(s). IsListSorted_numpy(): True =>Time Elasped: 4.822 msec, repeated 1 time(s). IsListSorted_fastk(): True =>Time Elasped: 214.171 msec, repeated 1 time(s). Test 7: DescendingList, list len: 1000000 IsListSorted_rand(): True =>Time Elasped: 0.336 msec, repeated 1 time(s). IsListSorted_numpy(): True =>Time Elasped: 3.867 msec, repeated 1 time(s). IsListSorted_fastk(): True =>Time Elasped: 279.322 msec, repeated 1 time(s). Test 8: MiddleNotchList, list len: 1000000 IsListSorted_rand(): True =>Time Elasped: 0.387 msec, repeated 1 time(s). IsListSorted_numpy(): False =>Time Elasped: 4.792 msec, repeated 1 time(s). IsListSorted_rand(): False =>Time Elasped: 78.903 msec, repeated 1 time(s). IsListSorted_fastk(): False =>Time Elasped: 110.444 msec, repeated 1 time(s).
可见,在绝大部分测试场景中,_rand()性能均为最佳,且不失正确率。注意测试8,当降序列表中间某个元素被置0(开槽)时,随机采样很容易遗漏该元素,导致误判。然而,这种场景在实际使用中非常罕见。