I have a list of drivers, orders and dates for the period January 1, 2022 to January 15, 2022 (15 days) in a table named all_data
as shown below:
driver_id | order_id | Order Date |
---|---|---|
1 | one | 2022-01-01 |
1 | b | 2022-01-02 |
2 | c | 2022-01-01 |
2 | d | 2022-01-03 |
Within these 15 days, how do I find the number of continuously active drivers who have completed at least one order per day as of that date? The output should be a table like this:
Order Date | active_drivers |
---|---|
2022-01-01 | 30 |
2022-01-02 | 27 |
2022-01-03 | 25 |
For example, on January 1, 2022, 30 different drivers completed at least one order that day. On January 2, 2022, we have to find the number of unique drivers who fulfilled at least one order on January 1, 2022, and January 2, 2022. On January 3, 2022, we must count drivers who completed at least one order on January 1, 2022, January 2, 2022, and January 3, 2022.
What I tried
I found a similar solution in MySQL (below), but it is not allowed in bigquery because of the error "Unsupported subquery with table in join predicate".
MySQL SELECT order_date, (SELECT COUNT(distinct s1.driver_id) as num_hackers FROM all_data s2 join all_data s1 on s2. order_date = s1. order_date and (SELECT COUNT(distinct s3. order_date) FROM all_data s3 WHERE s3.driver_id = s2.driver_id AND s3. order_date < s1. order_date) = datediff(s1. order_date, date('2022-01-01'), day) )) from all_data
I've also read this Google BigQuery: Rolling Count Distinct question, but that's for a fixed 45 days, whereas the number of days here is a date-based variable. How to write a query in BigQuerySQL to find the rolling number of drivers that are continuously active per day?
First find all combinations of date and driver, then get the count of all drivers for each date. Try this:
Consider the following