Natural Sort. MySQL. Sorting a 2D Array By Multiple Columns in PHP Naturally. Array_MultiSort with Natural Sort

MySQL has a biggest missing feature: It does not sort naturally. This has been an opened request since 2003, and MySQL team has no plan to fix this in a near future. I have looked for a solution to solve this problem for a while. Unfortunately, I found nothing useful. Here is a summary:

The + 0 Trick

Many developers suggest this solution:

SELECT names FROM your_table ORDER BY age + 0 ASC

This is not a good solution because it only works in some very specific situations. This query fails in most general cases.

Drupal Patch

Some developers suggest to use the patch from Drupal. Forget it, I could not even install it in my system.

My solution: Let PHP to handle the dirty work

So I decide to move the process to PHP. Before you say something about the performance issue, please keep reading my story first. I will talk about it in the later section.

MySQL is a relational database, it stores everything in a table. If we express that in an array, it will look something like this:

Array
(
    [1] => Array
        (
            [Name] => John
            [Identification] => 10
            [Address] => 100 Michigan Ave.
        )

    [2] => Array
        (
            [Name] => Peter
            [Identification] => 1000
            [Address] => 1 Michigan Ave.
        )

    [3] => Array
        (
            [Name] => Tom
            [Identification] => 2
            [Address] => 10 Michigan Ave.
        )

    [4] => Array
        (
            [Name] => Paul
            [Identification] => 1
            [Address] => 10 Michigan Ave.
        )

)

where the array key represents the record ID in the table.

Now, let say I want to sort the contents in the following orders:

  1. Sort by Identification: Ascending
  2. Then by Address: Descending
  3. Then by Name: Ascending

I specify the orders in an array:

$order = array('Identification' => 'ASC',
			   'Address' => 'DESC',
			   'Name' => 'ASC');

And include the following function in my code:

//array_multiSort with natural sort
function naturalSort2D(&$array){

   if (!function_exists(naturalSort2DCompare)){
	function naturalSort2DCompare($a, $b){
		global $order;

		foreach($order as $key => $value){
			
			if (!isset($a[$key])) continue;

			unset($compareResult);

                        //Case insensitive string comparisons using a "natural order" algorithm
			$compareResult = strnatcasecmp($a[$key], $b[$key]);
			
			if ($compareResult === 0) continue;
			
			$value = strtoupper(trim($value));
			
			if ($value === 'DESC'){
				$compareResult = $compareResult*-1;
			}
			
			return $compareResult;
		}
		
		return 0;
	}
   }
	
   //Maintain index association
   //Preserve array keys
   //Use usort if you don't care about key index
   uasort($array, 'naturalSort2DCompare');
	
   return true;
}

Now I sort the array:

naturalSort2D($array);

And I get my result like the following:

Array
(
    [4] => Array
        (
            [Name] => Paul
            [Identification] => 1
            [Address] => 10 Michigan Ave.
        )

    [3] => Array
        (
            [Name] => Tom
            [Identification] => 2
            [Address] => 10 Michigan Ave.
        )

    [1] => Array
        (
            [Name] => John
            [Identification] => 10
            [Address] => 100 Michigan Ave.
        )

    [2] => Array
        (
            [Name] => Peter
            [Identification] => 1000
            [Address] => 1 Michigan Ave.
        )

)

Performance

Before we talk about the performance issue, I like to talk about my database design first. In my application, the records are stored in a table. It is a multiple user system. Each user can access their own records only. Each user has about 3000 records, and we have 1000 users. Therefore we have about 3 millions of records in one table.

Here is a data structure:

#Information input by users
Name
Identification
Address

#Information input by system
ID
UserID

So the SQL quere is nothing more than something like:

Select * FROM `database`.`table` WHERE `UserID` = '1'

Previously, I let MySQL to handle the filter and sorting, i.e.,

Select * FROM `database`.`table` WHERE (`UserID` = '1') AND (`Address` LIKE `%Michigan%`) ORDER BY `Identification` ASC LIMIT 5, 10

After moving to PHP, I just let MySQL to handle the filter,

Select * FROM `database`.`table` WHERE (`UserID` = '1') AND (`Address` LIKE `%Michigan%`)

Notice that I will get all qualified records, could be few thousands. Then I let PHP to do the sorting:

$order = array('Identification' => 'ASC');
naturalSort2D($array);

Now I have a sorted array. Next I need to slice the records:

$array = array_slice($array, 5, 10);

Sounds like lots of work huh? Now let’s talk about the performance impact. Guess what, I see performance improvement for about 15%!

In the new approach, MySQL does lesser work. PHP does more work. Apparently, PHP works more effectively than MySQL. However, this may not true for all database model. Remember that in my situation, the system needs to handle at most few thousands records at a time? So I think PHP is doing better in this range. If your application needs to handle millions of records at a time, e.g., bookstore, then you better let MySQL to do the work.

By the way, the new approach allows to sort the dynamic generated content, which is a big plus in my situation. That is something I cannot do in MySQL approach.

Have fun.

–Derrick

Our sponsors:

Leave a Reply

Your email address will not be published. Required fields are marked *