Efficiently find the last timestamp within a given time range
P粉232409069
P粉232409069 2024-03-22 13:22:08
0
1
709

The problem is with a hypothetical online store; I will simplify the data for my specific problem.

Suppose there are two tables: one holds the registration date of each customer

CREATE TABLE customers (
    customerID int,
    register DATE,
);

, the second one saves all the customer’s online visits

CREATE TABLE visits (
    customerID int,
    visit DATE,
);

For each user, I want to find the last day the user visited the online store within 7 days of registration, in the most efficient way. If the user has not visited the store since registration, I should return NULL as the last visit date within 7 days of registration.

Of course I can LEFT JOIN customers access by Customer ID and filter the access date to between the registration day and Add seven, and finally get the maximum date of each customer. However, the initial join results in a huge table and I want to avoid this computationally expensive operation.

Thank you very much for your thoughts!

P粉232409069
P粉232409069

reply all(1)
P粉514458863

This is probably the most efficient:

SELECT  c.customerID, c.register,
        ( SELECT MAX(v.visit)
            FROM visits AS v
            WHERE v.customerID = c.customerID
              AND v.visit < c.register + INTERVAL 7 DAY
        ) AS last_date
    FROM customers AS c;

index:

customers:  PRIMARY KEY(customerId)  -- is this what  you have?
visits:     INDEX(customerID, visit)  -- or perhaps PRIMARY KEY

Your concerns about "computationally expensive operations":

  • The cost of retrieving rows is higher than the cost of evaluating expressions.
  • My recommendation for accessing the index is to "overwrite" it, so it only looks at the index.
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template