Faster Array Item Ranking in Python/NumPy without Double Sorting
In data analysis and machine learning, it's often necessary to rank items in an array based on their values. Double sorting the array for ranking can be time-consuming.
Initial Approach
A common method is exemplified below:
<code class="python">import numpy as np

# Sample array
array = np.array([4, 2, 7, 1])

# First sort: indices that would sort the array
temp = array.argsort()

# Second sort: argsort of the ordering gives each element's rank
ranks = np.arange(len(array))[temp.argsort()]</code>
This approach sorts twice: once to order the elements and once more to invert that ordering, so it pays the O(n log n) sorting cost two times.
Optimized Solution
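To avoid the second sort, call argsort() once and build the ranks by inverting the resulting permutation with an indexed assignment:
<code class="python">import numpy as np

array = np.array([4, 2, 7, 1])

# Single argsort: order[i] is the index of the i-th smallest element
order = array.argsort()

# Invert the permutation: the element at index order[i] receives rank i
ranks = np.empty_like(order)
ranks[order] = np.arange(len(array))
# ranks is now [2, 1, 3, 0]</code>
Explanation
argsort() returns the indices that would sort the array, so order[i] is the position of the i-th smallest element. Assigning np.arange(len(array)) into ranks at those positions writes rank i to the element at index order[i]. The assignment is a single linear pass, so it replaces the second O(n log n) sort with O(n) work while producing the same 0-based ranks.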
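As a sanity check, the sketch below compares ranking with two argsort() calls against a single argsort() followed by an inverse-permutation assignment. The function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def rank_double_argsort(values):
    """Rank a 1-D array using two argsort calls (two sorts)."""
    return values.argsort().argsort()


def rank_single_argsort(values):
    """Rank a 1-D array using one argsort and an indexed assignment."""
    order = values.argsort()
    ranks = np.empty_like(order)
    # The element at index order[i] is the i-th smallest, so it gets rank i.
    ranks[order] = np.arange(len(values))
    return ranks


sample = np.array([4, 2, 7, 1])
single = rank_single_argsort(sample)
double = rank_double_argsort(sample)
print(single)  # [2 1 3 0]

# Both approaches should agree on a larger array of distinct values.
shuffled = np.random.default_rng(0).permutation(1000)
same_on_large = np.array_equal(
    rank_single_argsort(shuffled), rank_double_argsort(shuffled)
)
```

Both functions return 0-based ranks, where 0 marks the smallest element.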
Conclusion
This technique speeds up array ranking by avoiding the second sort. For multi-dimensional arrays, pass the axis argument to argsort to choose the axis along which the elements are ranked.
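For a multi-dimensional array, one possible way to rank along an axis with a single sort is to combine argsort(axis=...) with np.put_along_axis. This is a minimal sketch under that assumption; the helper name rank_along_axis is hypothetical.

```python
import numpy as np


def rank_along_axis(values, axis=-1):
    """Return 0-based ranks of `values` along `axis` using one argsort."""
    order = np.argsort(values, axis=axis)

    # Positions 0..n-1 along `axis`, shaped so they broadcast against `order`.
    shape = [1] * values.ndim
    shape[axis] = values.shape[axis]
    positions = np.arange(values.shape[axis]).reshape(shape)

    # Scatter each position to the index named by `order` along `axis`.
    ranks = np.empty_like(order)
    np.put_along_axis(ranks, order, positions, axis=axis)
    return ranks


matrix = np.array([[4, 2, 7, 1],
                   [10, 30, 20, 0]])

row_ranks = rank_along_axis(matrix, axis=1)
print(row_ranks)
# [[2 1 3 0]
#  [1 3 2 0]]

col_ranks = rank_along_axis(matrix, axis=0)
```

With axis=1 each row is ranked independently; with axis=0 each column is.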