I have a large table with millions of records in MySQL incident_archive
and I want to sort the rows by the created
column and keep the first X rows and delete the rest OK, what's the most efficient way.
So far I've come up with this solution in Python:
def do_delete_archive(rowsToKeep): if rowsToKeep > 0: db_name = find_environment_value('DB_NAME', False, "dbname") db_host = find_environment_value('DB_HOST', False, "host") db_user = find_environment_value('DB_USER', False, "username") db_pass = find_environment_value('DB_PASS', False, "password") db = MySQLdb.connect(host=db_host,user=db_user,passwd=db_pass,db=db_name) cursor = db.cursor() sql = f"""DELETE FROM `incident_archive` WHERE incident_id NOT IN ( SELECT incident_id FROM ( SELECT incident_id FROM `incident_archive` ORDER BY created DESC LIMIT {rowsToKeep}) foo) LIMIT 10000;""" try: rowcount = rowsToKeep+ 1 while rowcount > rowsToKeep: cursor.execute(sql) db.commit() rowcount = cursor.rowcount print(f"--- Affected Rows: {rowcount} ---") except: db.rollback()
The problem I have here is that this method will not work if the value of rowsToKeep
is greater than or equal to 10000
, what is a better way of doing this process?
**Note: The rowsToKeep value is dynamic, which means it can change.
I came up with the following solution:
Note: Threshold is the variable containing the maximum number of records we wish to keep at 1000 in our example