question
There is currently a system that reads unique identifiers (id
) and their attached data sets from a SQL table into an unordered map. These datasets used to start with id
1, but adding and removing datasets took about 10 milliseconds. Note that not all datasets are always loaded into RAM.
When the program starts, it reads SELECT MAX(id)
from the database and proceeds to add 1 to the counter variable which will be used as the id# for any added dataset ##. The
ids of the deleted dataset are no longer used anywhere. This will inevitably lead to gaps in the
id sequence, which will one day overflow.
id values. Also consistent with only partially loaded SQL tables and program memory. The data in memory may take several minutes or immediately to be saved to the SQL table.
idea
One solution I came up with was to do an expensive query on each created dataset at runtime to find the smallest gap in the SQL table, checking if this id exists as a value in the unordered map , and then again use the variable from the counter as a backup to avoid endless queries for the free
id. This works perfectly for quantities of 1
id. Then it remains free in SQL, but is fetched in the unordered map until memory is saved again.
ids for new datasets until the vector is empty and then (or often) doing new queries for more IDs . But I can't think of a query to find X number of gaps in a table that may or may not have an
id column starting with 1.
mo and
mi .
userdata that contains the
id and
dataset columns, both 32-bit signed INTs. How to find a series of gaps in the
id column? For example, when the
ids in the table are 1,6,7,9, I want the query to return 2,3,4,5,8.
If there is a database change every 10 milliseconds, then there are 100 changes per second. A signed
int
can hold approximately 2,147,483,648 values, or 21,474,846 seconds, which is approximately 8 months. After this, it is not possible to have a new ID available.The first solution is to use the
64bit
type instead ofint
. This gives you about 13,600 years (for signed 64b), which seems enough :)Other solution is to have a vector containing all possible IDs. Vector storage
bool
(ID used/unused). Requesting a new ID is done by moving the vector to the first position marked as unused.This vector uses a lot of RAM, although there is a version of std::vector specifically for bool that requires less RAM.
The third solution is to deal with storing a linked list (possibly doubly linked) of deleted (read: reusable) IDs.
When a new ID is requested, the list provides its header, or the size of the table if the list is empty.
When a dataset is deleted, its ID is correctly inserted into the list, so the list is always sorted.
When an ID is reused, it is removed from the list.
Deleting the last record in the table may also delete the last nodes in the list because they are useless (case ID > table size). That's why I recommend using a doubly linked list so that the last node can be removed quickly.
So the list uses "new" and "delete" on its nodes quickly, and also runs up and down (for dual links) frequently to insert new nodes.
This is a bit slow, but I hope the list isn't too big and then the time required isn't bad.
Also note that this list gives you the array of gaps you need.