There are 29,938,766 rows in my VISITS table, which looks like this:

| USER_ID (INT) | VISITED_IN (DATETIME) |
|---|---|
| 65 | 2020-08-26 07:57:43 |
| 1182 | 2019-03-15 02:46:48 |
| 1564 | 2015-07-04 10:59:44 |
| 73 | 2021-03-18 00:25:08 |
| 3791 | 2017-10-17 12:22:45 |
| 51 | 2022-05-02 19:11:09 |
| 917 | 2017-11-20 15:32:06 |
| 3 | 2019-12-29 15:15:51 |
| 51 | 2015-02-08 17:48:30 |
| 1531 | 2020-08-05 08:44:55 |
| etc... | etc... |
When running this query, it takes 17-20 seconds and returns 63,514 (the user has 63,514 visits):

```sql
SELECT COUNT(*) FROM VISITS WHERE USER_ID = 917;
```

When running this query, it takes 17-20 seconds and returns 193 (the user has 193 visits):

```sql
SELECT COUNT(*) FROM VISITS WHERE USER_ID = 716;
```
The problem is that counting over these 29,938,766 rows always takes 17-20 seconds, whether the user has 3, 50, 70, or 1,000,000 visits.

I think this is because the query is looping through all the rows. I expected the query for the user with fewer visits to be faster, since the time should depend on the number of matching rows, but both queries take the same time!

Do you have any suggestions for avoiding this problem?
Table Structure
Update: Here is a new suggested scenario:

When a user opens his own or someone else's profile, he can see the number of profile visits and can filter the visits like this:

- Last 24 hours: `SELECT COUNT(*) FROM VISITS WHERE USER_ID = 5 AND VISITED_IN >= DATE_SUB(NOW(), INTERVAL 1 DAY);`
- Last 7 days: `SELECT COUNT(*) FROM VISITS WHERE USER_ID = 5 AND VISITED_IN >= DATE_SUB(NOW(), INTERVAL 7 DAY);`
- Last 30 days: `SELECT COUNT(*) FROM VISITS WHERE USER_ID = 5 AND VISITED_IN >= DATE_SUB(NOW(), INTERVAL 30 DAY);`
- All time: `SELECT VISITS FROM USERS WHERE USER_ID = 5;`
Additionally, I will create a recurring event that executes this command every day:

```sql
DELETE FROM VISITS WHERE VISITED_IN <= DATE_SUB(NOW(), INTERVAL 30 DAY);
```

Also, when adding a new row to the VISITS table, I will make sure to increment the VISITS column:

```sql
UPDATE USERS SET VISITS = VISITS + 1 WHERE ID = 5;
```
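(A sketch of doing both steps atomically, so the counter cannot drift from the actual rows if one statement fails; table and column names are taken from the question, and user id 5 is just an example:)

```sql
START TRANSACTION;
  -- Record the visit itself
  INSERT INTO VISITS (USER_ID, VISITED_IN) VALUES (5, NOW());
  -- Keep the all-time counter in USERS in sync with the new row
  UPDATE USERS SET VISITS = VISITS + 1 WHERE ID = 5;
COMMIT;
```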
A composite `INDEX(USER_ID, VISITED_IN)` will speed up all the `SELECT`s you mentioned. They will only have to scan a chunk of that index; they won't have to "scan the entire table".

The `DELETE` requires `INDEX(VISITED_IN)`. But if you don't run it often enough, problems can arise, because deleting thousands of rows at once can itself be a problem; consider running the delete at least once every hour. If the table is very large, consider "time series" partitioning, since `DROP PARTITION` is much faster than a big `DELETE`.

Any caching service will return a slightly stale count, but it may be faster.
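A rough sketch of both suggestions, using the table and column names from the question (the index names are made up, the partition boundaries are illustrative, and in practice a script would add a new partition each day and drop the oldest):

```sql
-- Composite index so a per-user count scans only that user's slice of the index
ALTER TABLE VISITS ADD INDEX user_time_idx (USER_ID, VISITED_IN);

-- Index to make the daily purge efficient
ALTER TABLE VISITS ADD INDEX time_idx (VISITED_IN);

-- Alternatively, "time series" partitioning by day; dropping a whole partition
-- is far cheaper than DELETEing its rows one by one.
-- (Requires that any PRIMARY/UNIQUE key on the table include VISITED_IN.)
ALTER TABLE VISITS PARTITION BY RANGE (TO_DAYS(VISITED_IN)) (
  PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
  PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
  PARTITION pfuture   VALUES LESS THAN (MAXVALUE)
);

-- The daily purge then becomes:
ALTER TABLE VISITS DROP PARTITION p20240101;
```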
"The database can be hit every time someone opens the page", but only if the query is efficient enough, so do add the index.
In my answer to your other question, I explained how summary tables can speed things up even more. However, that assumes "last N days" is measured from midnight to midnight. Your current queries use `NOW() - INTERVAL N DAY`, which is messier to implement than midnight boundaries. Would you be willing to change the meaning of "last N days"?

(Some INDEX basics...)
An important purpose of any index is to be able to quickly find rows based on certain column(s).

An `INDEX` is a list of keys mapped to rows. A `UNIQUE INDEX` is an `INDEX` plus a uniqueness constraint, meaning no two rows in the index have the same value. A `PRIMARY KEY` is a `UNIQUE INDEX` designated to uniquely identify each row of the table. "Key" and "index" are synonyms.
Indexes (in MySQL's InnoDB engine) are implemented as BTrees (actually B+Trees; see Wikipedia). In the case of the PK, the rest of the row's columns sit in the leaf nodes with the PK value. For "secondary" keys, the "value" part of the BTree entry is the PK column(s).
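(As a sketch of how to verify this on your table: assuming a secondary index such as `INDEX(USER_ID, VISITED_IN)` exists, `EXPLAIN` should show the count being answered from the index alone, without touching the table rows:)

```sql
EXPLAIN SELECT COUNT(*) FROM VISITS WHERE USER_ID = 917;
-- With such an index present, the plan should show type=ref, key=<that index>,
-- and "Using index" in the Extra column (a covering read), meaning only the
-- index entries for user 917 are scanned, not all ~30M rows.
```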
Any index can contain one or more columns; a multi-column index is called "composite". `INDEX(lastname)` is unlikely to be unique. `INDEX(lastname, firstname)` is still unlikely to be unique, but it is "composite".
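(To make that concrete, a sketch with a hypothetical `people` table; the table and index names are made up for illustration:)

```sql
-- Hypothetical table illustrating a composite (multi-column) secondary index
CREATE TABLE people (
  id INT AUTO_INCREMENT PRIMARY KEY,   -- the clustered PK (InnoDB)
  lastname VARCHAR(50) NOT NULL,
  firstname VARCHAR(50) NOT NULL,
  INDEX name_idx (lastname, firstname) -- composite secondary index
);

-- This lookup can be satisfied by scanning only the name_idx BTree;
-- the "value" it yields for each match is the PK (id).
SELECT id FROM people WHERE lastname = 'Smith' AND firstname = 'Ann';
```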