Efficient Simple Random Sampling in MySQL
Many applications need to draw a simple random sample from a large database table. The seemingly intuitive query SELECT * FROM table ORDER BY RAND() LIMIT 10000 can be prohibitively slow on tables with millions of rows, because MySQL evaluates RAND() for every row and then has to sort on those values before it can apply the limit.
Faster Solution
A more efficient approach is to use the rand() function to assign a random number to each row, then filter the table based on this number:
SELECT * FROM table WHERE rand() <= 0.3
How It Works
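RAND() returns a value of at least 0 and less than 1, and it is evaluated once for each row. A row is kept when its value is less than or equal to 0.3, so each row has a 30% chance of being selected, independently of the others. The result therefore contains about 30% of the table, but the exact number of rows varies from run to run.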
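The per-row filter can be imitated outside the database to see how the sample size behaves. The following is a minimal Python sketch, not MySQL code: the function name bernoulli_sample, the list of integers standing in for table rows, and the fixed seed are all assumptions made for illustration.

```python
import random


def bernoulli_sample(rows, fraction, rng):
    """Keep each row independently with probability `fraction`.

    This mirrors WHERE RAND() <= fraction, where a fresh random
    number is drawn for every row (hypothetical helper name).
    """
    return [row for row in rows if rng.random() <= fraction]


# Stand-in for a table with 100,000 rows (illustrative data only).
rows = list(range(100_000))

# A fixed seed is used only so the sketch is repeatable.
rng = random.Random(42)

sample = bernoulli_sample(rows, 0.3, rng)

# The size lands near 30,000 but is generally not exactly 30,000.
print(len(sample))
```

Running the sketch with different seeds shows the sample size clustering around 30% of the rows rather than hitting a fixed count, which is the main trade-off of this method.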
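Advantages
The filter needs only a single pass over the table and no sort, so its cost grows linearly with the number of rows.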
Improved Version
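When a fixed sample size is needed, first use an index to narrow the table down to roughly 2-5x the desired sample size, then sort only those candidate rows randomly and trim the result to the exact size. The query below assumes the table has an indexed column, frozen_rand, that holds a random number between 0 and 1 assigned to each row in advance: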
SELECT COUNT(*) FROM table; -- Use this to determine rand_low and rand_high
SELECT * FROM table WHERE frozen_rand BETWEEN %(rand_low)s AND %(rand_high)s ORDER BY RAND() LIMIT 1000
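Because the BETWEEN condition can be answered with a range scan on the frozen_rand index, only the candidate rows inside the range are sorted by RAND(), not the whole table. This keeps the query fast even on large tables.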
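One way to turn the row count into rand_low and rand_high is sketched below in Python, the language suggested by the %(name)s placeholders. This is an illustration under stated assumptions, not the original author's code: the helper name frozen_rand_window, the oversample parameter, and the choice to place the range at a random position are hypothetical, and frozen_rand is assumed to be uniformly distributed between 0 and 1.

```python
import random

# The article's query; `table` is a placeholder for the real table name.
QUERY = (
    "SELECT * FROM table "
    "WHERE frozen_rand BETWEEN %(rand_low)s AND %(rand_high)s "
    "ORDER BY RAND() LIMIT 1000"
)


def frozen_rand_window(total_rows, sample_size, oversample=3.0, rng=random):
    """Return a dict with rand_low and rand_high for the query above.

    The range is sized to cover about oversample * sample_size rows,
    assuming frozen_rand is uniform on [0, 1). Placing the range at a
    random position is an assumption of this sketch.
    """
    if total_rows <= 0 or sample_size <= 0:
        raise ValueError("total_rows and sample_size must be positive")
    # Fraction of the table the range should cover, capped at the whole table.
    width = min(1.0, oversample * sample_size / total_rows)
    rand_low = rng.uniform(0.0, 1.0 - width)
    rand_high = min(1.0, rand_low + width)
    return {"rand_low": rand_low, "rand_high": rand_high}


# total_rows would come from SELECT COUNT(*); 1,000,000 is an example value.
params = frozen_rand_window(total_rows=1_000_000, sample_size=1000)
print(params)

# With a DB-API driver that accepts pyformat parameters, the call would
# look like: cursor.execute(QUERY, params)
```

With 1,000,000 rows, a sample of 1000, and 3x oversampling, the range spans 0.003 of the frozen_rand values, so about 3000 rows are expected to reach the sort. If the range happens to return fewer rows than the limit, the query can be repeated with a larger oversample value.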