I have a MySQL table with region and latitude/longitude location columns. Each area has many locations, such as 20.000
. Is there a way to select just a few, like 100
, that would appear to be evenly distributed on the map?
Distribution does not have to be perfect, query speed is more important. If this is not possible using MySQL directly, a very fast algorithm can be used to select evenly distributed locations.
Thanks in advance.
EDIT: Answering some requests in the comments. The data doesn't have anything to work with, it's just the area and location coordinates, for example:
+-------+--------------+----------+-----------+------------+--------+--------+ | id | area | postcode | lat | lon | colour | size | +-------+--------------+----------+-----------+------------+--------+--------+ | 16895 | Athens | 10431 | 37.983917 | 23.7293599 | red | big | | 16995 | Athens | 11523 | 37.883917 | 23.8293599 | green | medium | | 16996 | Athens | 10432 | 37.783917 | 23.7293599 | yellow | small | | 17000 | Thessaloniki | 54453 | 40.783917 | 22.7293599 | green | small | +-------+--------------+----------+-----------+------------+--------+--------+
There are also some columns with characteristics, but these are only used for filtering.
I did try to get the nth
rows at the same time and it seemed to work, albeit a bit slow
SET @a = 0; select * from `locations` where (@a := @a + 1) % 200 = 0
Using random()
can also be used, but it is also a bit slow.
Edit 2: It turns out that adding zip codes to a form is easy. With this, grouping by zip code seems to give a pleasing result. The only problem is, there are very large areas, around 3000 different postcodes, and getting just 100 of them might end up with many of them showing up in one place, so that might require further processing in PHP.
Edit3, answer @RickJames' questions in the comments so they are in one place:
AUTO_INCRMENT
id and there can be gaps1000
, we want to display only a random 100
. If 1000
or less we can just show everything
This is a method that satisfies the goal.
As for "duplicates", think of this as a rough way to discover that two items fall into the same position:
"5" and "3" can be adjusted up (or down) to retain more (or fewer) ids. Because of the latitude/longitude arrangement, "5" and "3" are different; this ratio is probably true for most temperate regions. (Use the same amounts near the equator, use larger rations at higher latitudes.)
There is a small flaw...two very close projects may cross the boundary created by
ROUNDs
.How many rows does the original table have? How many rows did the above query generate? (
SELECT COUNT(*) FROM ( ... ) x;
)