Home Database Mysql Tutorial True Alphanumeric / natural sorting in MySQL - why is the answer always recursion?

True Alphanumeric / natural sorting in MySQL - why is the answer always recursion?

Nov 23, 2024 pm 12:55 PM

True Alphanumeric / natural sorting in MySQL - why is the answer always recursion?

Yesterday I attempted to solve alphanumeric sorting in MySQL and failed. (read that article here)

I did get close though and had the right concept, just wrong execution.

Today, I woke up and had an epiphany...recursion.

The problem with recursion is that you have to understand recursion to be able to do recursion...and I don't understand recursion enough to do recursion in MySQL.

However with a bit of Chat Gippity back and forth (by which I mean getting it to write what I asked for, getting back about 25% of what I asked for, fixing it and feeding it into a new chat so it doesn't keep repeating itself for about 2 hours) I got a working answer!

To the point

May I present to you my swan song, my masterpiece, the answer to life itself (ok fine, the only working solution to true alphanumeric sorting in MySQL I have seen).

WITH RECURSIVE process_numbers AS (
    SELECT 
        data_value,
        data_value AS remaining_data,
        CAST('' AS CHAR(20000)) AS processed_data,
        1 AS iteration
    FROM test_data

    UNION ALL

    SELECT
        data_value,
        CASE 
            WHEN LOCATE(REGEXP_SUBSTR(remaining_data, '[0-9]+'), remaining_data) > 0 THEN
                SUBSTRING(
                    remaining_data,
                    LOCATE(REGEXP_SUBSTR(remaining_data, '[0-9]+'), remaining_data)
                    + LENGTH(REGEXP_SUBSTR(remaining_data, '[0-9]+'))
                )
            ELSE '' 
        END AS remaining_data,

        CONCAT(
            processed_data,
            CASE 
                WHEN LOCATE(REGEXP_SUBSTR(remaining_data, '[0-9]+'), remaining_data) > 0 THEN
                    LEFT(remaining_data, LOCATE(REGEXP_SUBSTR(remaining_data, '[0-9]+'), remaining_data) - 1)
                ELSE remaining_data
            END,
            CASE
                WHEN REGEXP_SUBSTR(remaining_data, '[0-9]+') IS NOT NULL THEN
                    RIGHT(CONCAT('0000000000', REGEXP_SUBSTR(remaining_data, '[0-9]+')), 10)
                ELSE ''
            END
        ) AS processed_data,

        iteration + 1
    FROM process_numbers
    WHERE LENGTH(remaining_data) > 0
          AND iteration < 100
)


SELECT 
    data_value,
    CONCAT(processed_data, remaining_data) AS sort_key
FROM process_numbers
WHERE remaining_data = ""
ORDER BY sort_key;
Copy after login

And if you want to try it out (and try to break it) you can play with this DB fiddle

So how does this work?

It does what I originally wanted to do, take every group of numbers and pads them out to 10 digits total.

So obviously if you feed this a couple of strings with 11 consecutive numeric digits it won't work without adjustment, but other than that it works fine!

You see, MySQL can sort numbers correctly, even in lexicographical ordering mode, but it has one flaw.

It counts "11" as smaller than "2" because of the fact it does sorting one character at a time (effectively). So "2" is bigger than "1" so it comes first. Then it checks the next character, by which point the sorting is incorrect (for numbers at least).

To understand this better, imagine if 1 was actually the letter "b" and 2 was the letter "c".

That is how MySQL "sees" numbers, they are just another character.

So if I had "bb" and "c" you would expect "bb" to come before "c". Now swap the numbers back in and you can see why "11" comes before "2".

So this is a hack?

Yes, we remove the issue by moving the numbers "back" through padding.

Going back to our example, if we padded "11" and "2" to 3 in length and use "a" as 0, this is what happens:

011 = abb
002 = aac 
Copy after login

notice how now the sorting would go:

  • character 1: is "a" bigger than "a" - no, they are the same.
  • character 2: is "b" bigger than "a" - yes, put the "a" before the "b"
  • character 3: is now irrelevant and we have already found an occurence earlier that was different and larger.

So by that logic we now have:

002 = aac (the second "a" comes before the second "b" in the next row)
011 = abb
Copy after login

And that is how it works!

Are you going to explain the recursion thing?

Kind of. I have been "round the houses" with this one and my knowledge is surface level, but I will give it a go.

The problem came with how RegEx works in MySQL. REGEX_SUBSTR will only ever find one match and then keep returning that for every other match it finds. So that was why my solution from yesterday was not working correctly.

But REGEX_REPLACE has its own issues where it doesn't seem to correctly expose the string length of a match (so we can't LPAD with it correctly)

That is why I thought about recursion as the answer.

I can use REGEX_SUBSTR to get the correct padding behaviour, and as each loop through the RegEx is essentially a new function call it doesn't "remember" the previous match, so it solves that problem.

And if you want a brief step through of the logic it is actually not as scary as it looks!

  • We loop over a given string, looking for any numbers (the entire number, not just a single character).
  • We then remove that from remaining_data so we don't match it again.
  • We take that number we just matched and pad it out to be 10 digits long total.
  • We then search for the next numeric part in the string and repeat the process, building up processed_data as our final string.
  • finally once we have no more numbers to process, we add any remaining letters to the end of processed_data to complete the transformation and we return this as sort_key.

Then we can use this sort_key in our query to order the column correctly.

And the iteration part is purely a safeguarding tool, to make sure it doesn't completely run the MySQL server out of memory or crash the query if a sufficiently complex string is processed (or there is an error in the logic that means it would recurse forever).

That's a wrap!

Isn't it funny how sleeping on things brings new perspective?

Perhaps I should try out polyphasic sleep so I can sleep on problems 2-3 times more often each day and become a 10x developer? haha.

Anyway there you have it, a reasonably robust true alphanumeric sort.

Oh and in reality you should probably convert the sort_key to a stored column on your database using GENERATE or a stored procedure. Sadly the playground I use doesn't seem to support that and it is a Sunday so I will leave that to you, dear viewer!

Have a great rest of your weekend and a great week.

The above is the detailed content of True Alphanumeric / natural sorting in MySQL - why is the answer always recursion?. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

When might a full table scan be faster than using an index in MySQL? When might a full table scan be faster than using an index in MySQL? Apr 09, 2025 am 12:05 AM

Full table scanning may be faster in MySQL than using indexes. Specific cases include: 1) the data volume is small; 2) when the query returns a large amount of data; 3) when the index column is not highly selective; 4) when the complex query. By analyzing query plans, optimizing indexes, avoiding over-index and regularly maintaining tables, you can make the best choices in practical applications.

Can I install mysql on Windows 7 Can I install mysql on Windows 7 Apr 08, 2025 pm 03:21 PM

Yes, MySQL can be installed on Windows 7, and although Microsoft has stopped supporting Windows 7, MySQL is still compatible with it. However, the following points should be noted during the installation process: Download the MySQL installer for Windows. Select the appropriate version of MySQL (community or enterprise). Select the appropriate installation directory and character set during the installation process. Set the root user password and keep it properly. Connect to the database for testing. Note the compatibility and security issues on Windows 7, and it is recommended to upgrade to a supported operating system.

Explain InnoDB Full-Text Search capabilities. Explain InnoDB Full-Text Search capabilities. Apr 02, 2025 pm 06:09 PM

InnoDB's full-text search capabilities are very powerful, which can significantly improve database query efficiency and ability to process large amounts of text data. 1) InnoDB implements full-text search through inverted indexing, supporting basic and advanced search queries. 2) Use MATCH and AGAINST keywords to search, support Boolean mode and phrase search. 3) Optimization methods include using word segmentation technology, periodic rebuilding of indexes and adjusting cache size to improve performance and accuracy.

Difference between clustered index and non-clustered index (secondary index) in InnoDB. Difference between clustered index and non-clustered index (secondary index) in InnoDB. Apr 02, 2025 pm 06:25 PM

The difference between clustered index and non-clustered index is: 1. Clustered index stores data rows in the index structure, which is suitable for querying by primary key and range. 2. The non-clustered index stores index key values ​​and pointers to data rows, and is suitable for non-primary key column queries.

MySQL: Simple Concepts for Easy Learning MySQL: Simple Concepts for Easy Learning Apr 10, 2025 am 09:29 AM

MySQL is an open source relational database management system. 1) Create database and tables: Use the CREATEDATABASE and CREATETABLE commands. 2) Basic operations: INSERT, UPDATE, DELETE and SELECT. 3) Advanced operations: JOIN, subquery and transaction processing. 4) Debugging skills: Check syntax, data type and permissions. 5) Optimization suggestions: Use indexes, avoid SELECT* and use transactions.

The relationship between mysql user and database The relationship between mysql user and database Apr 08, 2025 pm 07:15 PM

In MySQL database, the relationship between the user and the database is defined by permissions and tables. The user has a username and password to access the database. Permissions are granted through the GRANT command, while the table is created by the CREATE TABLE command. To establish a relationship between a user and a database, you need to create a database, create a user, and then grant permissions.

Explain different types of MySQL indexes (B-Tree, Hash, Full-text, Spatial). Explain different types of MySQL indexes (B-Tree, Hash, Full-text, Spatial). Apr 02, 2025 pm 07:05 PM

MySQL supports four index types: B-Tree, Hash, Full-text, and Spatial. 1.B-Tree index is suitable for equal value search, range query and sorting. 2. Hash index is suitable for equal value searches, but does not support range query and sorting. 3. Full-text index is used for full-text search and is suitable for processing large amounts of text data. 4. Spatial index is used for geospatial data query and is suitable for GIS applications.

Can mysql and mariadb coexist Can mysql and mariadb coexist Apr 08, 2025 pm 02:27 PM

MySQL and MariaDB can coexist, but need to be configured with caution. The key is to allocate different port numbers and data directories to each database, and adjust parameters such as memory allocation and cache size. Connection pooling, application configuration, and version differences also need to be considered and need to be carefully tested and planned to avoid pitfalls. Running two databases simultaneously can cause performance problems in situations where resources are limited.

See all articles