I recently came across one of my old flights of fancy -- a one-line bubble sort.
Awhile back, in the midst of working on some sorting routines, I had the
brilliant (?) idea of seeing how far I could compress an implementation
of bubble sort in C. The resulting function, called 'lilbub' (little bubble sort),
contains a single line, only 65 characters long:
void
lilbub (int *a, int *b)
{
int *p=b,t;for(;b>(p-a?a:(p=b,++a));*p>t?(p[1]=*p,*p=t):0)t=*p--;
}

It kind of looks like a submission for the obfuscated C contest.
Does it work? Heck, yeah! Just pass pointers to the first and last elements
of the array to be sorted. That is, for an array `a[]` of size `n`,
do this: `lilbub(a, a+n-1);`. It works that way, not so much for efficiency, but
because I found I could reduce the code by a few characters.
In fact, it's a very inefficient sort routine, but it is small!
Anyone out there think they can reduce it even further?

**What kind of geek am I?**

The kind of folks who are attracted to programming and computer science seem to have
a knack for obsessing on certain 'ideal' problems. For example, how many academic papers
have been written on efficient methods for determining whether an integer is prime?
How many prime seive functions have been implemented in all the different languages?
Seems to be totally out of proportion to the importance of the problem.
But nerds love to play with primes.

And not just primes. How about searching for the perfect hash algorithm?
Computing a billion digits of pi? Trying to find a new and better sort algorithm?
We just can't help ourselves. I'm prey to all of the temptations listed above -- and others --
but I'm particularly prone to playing around with sorting routines. Hence my stupid sort trick above.

But it's not all in vain. Sorting techniques do have important real world uses.
I'm not about to attempt to cover the wide range of sort algorithms available
(that would be a four volume book, in itself), but will look at a couple of basic ones that you
ought to know.

**Selection sort**

Just about the simplest sort is selection sort.
It scans thru an array, looking for the smallest of all remaining elements, then moves it to the front.
Each pass places one item into its correct, final position.

Consider an array with indexes `a[0]..a[N]`.
The first pass searches the entire array for the smallest element. When found, it is swapped with `a[0]`.
It is where it belongs. The next pass searches `a[1]..a[N]` for the smallest element. When found,
it is swapped with `a[1]`. It is now where it belongs. And so on until every element is properly placed:

void
selsort (int *a, int num_elems)
{
// selection sort
int lo = 0;
int hi = num_elems - 1;
int tmp;
for (; lo < hi; ++lo)
{
int min = lo;
int i = lo+1;
for (; i <= hi; ++i)
if (a[i] < a[min])
min = i;
tmp = a[lo], a[lo] = a[min], a[min] = tmp;
}
}

This kind of simple sort routine is very effective for small arrays.
This is because the more complicated sort algorithms have more overhead.
When the number of items to sort is large, the overhead is well worth the effort.
But as the size decreases, it becomes a case of diminishing returns -- it's generally faster to spin
thru a simple algorithm for a small set of data.

**Quicksort**

The king of all sorting routines is quicksort. It's not the only fast sorting technique available, but,
it is as efficient as they get -- no other routine can improve upon it in the general case.
Sometimes, in specific instances, knowing certain information about the data to be sorted may allow
you to use a different algorithm that will be faster. But for the general case (especially with
more or less random data), it can't be beat.

Over the years, I've read many explanations of how quicksort works.
I've seen it described in various ways ranging from very straightforward to incredibly obfuscated.
In reality, quicksort is very easy to understand, altho a bit trickier to implement.
The basic algorithm is quite intuitive, but this can be hard to see by looking at various source
code implementations where the layers of optimization tricks shroud the basic simplicity of what
is going on.

Quicksort works somewhat like the selection sort above in that each pass goes about the business of
placing one array item in its final position. However it does not seek to find the minimum (or maximum)
value, but instead, grabs an element at random and determines its correct ending position.
This element, called the pivot, is used to arrange the remainder of the array: all items
smaller than the pivot are moved to the left end of the array, and all the larger items are moved to
the right. Consider this small array of numbers:

5 8 23 9 17 12 4 16 2 11 3 20

If we choose `12` as the pivot, then after the first pass of quicksort, we would arrive at the following
array:
2 8 3 9 11 5 4 [12] 16 17 23 20

The pivot item, `12`, is in its final, correct location. It divides the original array into two sub-arrays
that now need sorting. So, quicksort is called again for the left and right sub-arrays. This subdivision
continues recursively until the sub-arrays are so small that they cannot be subdivided any further.
One of the best implementations of quicksort to look at, if you are trying to learn about it, is the
following definition (which I have slightly modified) from the excellent book,
The Practice of Programming.

void
swap (int *a, int i, int j)
{
int temp;
temp = a[i], a[i] = a[j], a[j] = temp;
}
void
quicksort (int *a, int n)
{
int i;
int pivot;
int last = 0;
if (n <= 1)
return;
swap (a, 0, rand () % n);
pivot = a[0];
for (i = 1; i < n; ++i)
{
if (a[i] < pivot)
swap (a, ++last, i);
}
swap (a, 0, last);
quicksort (a, last);
quicksort (a+last+1, n-last-1);
}

This is not the most efficient implementation of quicksort, by any means, but it is simple to follow and makes the
basic algorithm fairly clear. The `rand()` call grabs a random element which is then placed
at the beginning of the array. The for-loop does the partitioning, which puts all the elements
smaller than the pivot on the left side. The final swap puts the pivot where it belongs.
The left and right sub-arrays can then be run thru quicksort themselves.

**Qsort**

By covering just selection sort and quicksort, I've omitted quite a few alternative sorting algorithms.
Ah yes, there are probably as many different sorting techniques out there as stars twinkling in the
night sky. You can find many online resources about these: NIST Data Structures and Algorithms
is a good one.

But my own experience leads me to the following conclusion: you either have a small number of items to sort, or you don't.
If the number of items is small, use selection sort (or some other similarly simple algorithm). Otherwise, use quicksort.
This is pretty simplistic, but, for most real world scenarios, it's really all you need. Study various other sorting techniques,
by all means -- it is interesting and educational. But for day-to-day business, the advice above will nearly always suffice.

Actually, this advice can be made even simpler for C programmers: just use qsort(). There is a library function called qsort() that is part
of the Standard C Library (declared in `stdlib.h`). It has been designed to handle sorting in a very general fashion.
This function will probably handle 99% of your sorting needs in normal programs.

Interestingly, despite the name, qsort() is not guaranteed by the C standard to be an implementation of quicksort.
A library writer can use any algorithm they choose. However, I have never actually encountered an implementation of qsort()
that did not use the quicksort algorithm -- it's mighty hard to beat.

The qsort() function has a generalized calling interface which allows you to sort objects of any type and size.
By passing in your own comparison function, and the size (in bytes) of the array elements, you can use qsort() to
sort just about any type of data. For example, the following sorts an ordinary integer array:

int
int_cmp (const void *p, const void *q)
{
int a = *(int *)p;
int b = *(int *)q;
if (a < b) return -1;
else if (a == b) return 0;
else return 1;
}
int
main ()
{
int a[] = {5, 8, 23, 9, 17, 12, 4, 16, 2, 11, 3, 20};
int n = sizeof(a) / sizeof(a[0]);
qsort (a, n, sizeof(a[0]), int_cmp);
//
// arg1: the array to be sorted
// arg2: the number of elements in the array
// arg3: the size (in bytes) of each array element
// arg4: a pointer to a comparison function
}

The comparison function must always look like the one above. That is, pointers to the two elements to be compared
are passed as arguments (declared as `void *` in the parameter list), and an int value returned.
The return value should be {-1, 0, 1} depending on whether the first element is {less than, equal to, or greater than}
the second element.
The general nature of qsort() is even more useful when sorting arrays whose elements are structures. For example:

typedef struct
{
char * name;
int age;
}
person_info;
int
name_cmp (const void *p, const void *q)
{
person_info *p1 = (person_info *)p;
person_info *p2 = (person_info *)q;
return strcmp (p1->name, p2->name);
}
int
age_cmp (const void *p, const void *q)
{
person_info *p1 = (person_info *)p;
person_info *p2 = (person_info *)q;
int a = p1->age;
int b = p2->age;
if (a < b) return -1;
else if (a == b) return 0;
else return 1;
}
int
main ()
{
person_info a[] =
{
{"fred", 22},
{"adam", 26},
{"bert", 19},
{"larry", 38},
{"zack", 16},
{"jimmy", 8}
};
int n = sizeof(a) / sizeof(a[0]);
char ansr[100];
printf ("sort by name or age (n/a)? ");
fgets (ansr, 100, stdin);
switch (ansr[0])
{
case 'n':
qsort (a, n, sizeof(a[0]), name_cmp);
break;
case 'a':
qsort (a, n, sizeof(a[0]), age_cmp);
break;
}
{
int i;
for (i = 0; i < n; ++i)
printf ("\n%12s, %2d", a[i].name, a[i].age);
}
printf ("\n\n");
return 0;
}

**A lean and mean integer quicksort**

Alright, I've just stipulated that 99% of the time, you can use qsort() to handle
all of your sorting needs. What about the other 1%? Maybe you need a super-duper efficient sorting
routine for a quite special case scenario. Or maybe you are just the maverick type who really wants
to implement his own quicksort routine (IOW, like me!)

Well, a couple of years ago (about the same time I was writing the silly 'lilbub' function),
I set about to write the fastest quicksort function that I could. Since the qsort() function
already handles general data types, I decided to concentrate on sorting integers -- a basic
and very common activity, and something that might truly benefit from a special purpose, fast-as-lightning implementation.

I'm pretty satisfied with what I came up with -- altho I won't guarantee that it couldn't be
improved upon (which always seems to be the case). I will say that, after testing it fairly
thoroughly, it appears to be faster than the library qsort() for the intended domain of sorting integer arrays.
It's a little unfair tho -- the library function has to call an external routine for every comparison,
while my comparisons are built-in. But that's the whole point to writing a special purpose sorting routine -- to
carefully tune it for a specific set of needs.

The source code is heavily commented so that it may help with understanding the basic workings of quicksort
and the partitioning loop. Here's my version:

void
int_qsort (int *lo, int *hi)
{
// an efficient, non-recursive quicksort for integer arrays
// the args should be pointers to the first and last elements
const int minimum_section_length = 12;
int *p;
int *q;
int pivot;
int tmp;
int num_elems;
uint rv = 1;
int *stack[2 * sizeof(int) * CHAR_BIT];
int **top = stack;
int **bot = stack;
while (lo < hi || top > bot)
{
// get valid [lo, hi] for this iteration
// -------------------------------------
if (lo >= hi)
hi = *top--, lo = *top--; // pop [lo, hi]
num_elems = hi-lo+1;
// run selection sort to finish off small sections
// ------------------------------------------------
if (num_elems <= minimum_section_length)
{
for (; lo < hi; ++lo)
{
// find minimum
q = lo;
for (p = lo+1; p <= hi; ++p)
if (*p < *q)
q = p;
// put minimum into low element
tmp = *lo, *lo = *q, *q = tmp;
}
// [lo, hi] is now sorted, go to next section
lo = hi+1;
continue;
}
// check for already sorted section
// --------------------------------
// find first unordered element
for (p = lo; p < hi; ++p)
if (*p < *(p+1))
break;
if (p == hi)
{
// [lo, hi] already sorted, go to next section
lo = hi+1;
continue;
}
// partition into two sections
// ---------------------------
// select a pivot
rv = ((rv ^ *lo) << 1) ^ *hi; // hash a random value
p = lo + (rv % num_elems); // and use as pivot index
pivot = *p;
// move pivot into low element
tmp = *lo, *lo = *p, *p = tmp;
// = = = = = = = = = = = = = = = = = = = = =
// loop entry:
//
// lo lo+1 hi
// [pivot|{q..........................p}]
// unexamined
// = = = = = = = = = = = = = = = = = = = = =
// partition loop
for (q = lo+1, p = hi; q <= p;)
{
// given: {q.............p}
while (*q < pivot)
if (++q > p) break; // inc left ==> [q........p}
while (*p >= pivot)
if (--p < q) break; // dec right [q...p] <==
if (q < p)
// swap: yields new {q...p}
tmp = *q, *q++ = *p, *p-- = tmp;
// = = = = = = = = = = = = = = = = = = = = =
// loop invariant:
//
// lo lo+1 hi
// [pivot| < pivot {q........p} >= pivot]
// smaller unexamined larger
// = = = = = = = = = = = = = = = = = = = = =
}
// = = = = = = = = = = = = = = = = = = = =
// loop exit:
//
// lo lo+1 p+1 hi
// [pivot| < pivot p q >= pivot]
// smaller ^ larger
// |
// insert pivot here (at p)
// = = = = = = = = = = = = = = = = = = = =
// move pivot (at lo) into final position (at p)
tmp = *lo, *lo = *p, *p = tmp;
// = = = = = = = = = = = = = = = = = = = =
// after placing pivot:
//
// lo p-1 p p+1 hi
// [{ < pivot } |pivot| { >= pivot }]
// new left new right
// = = = = = = = = = = = = = = = = = = = =
// stack larger section, iterate on smaller section
// ------------------------------------------------
if (p-lo > hi-p)
{
*++top = lo, *++top = p-1; // push [lo, p-1]
lo = p+1; // iterate [p+1, hi]
}
else
{
*++top = p+1, *++top = hi; // push [p+1, hi]
hi = p-1; // iterate [lo, p-1]
}
}
}

The trickiest part to implementing quicksort is to get the partitioning loop
correct. It's not that hard to get right, but it's very easy to get subtly wrong.
Once you have a proper working partitioning loop, the rest of the routine falls into
place naturally.

One particular sneeky item (in my version) is an optimization within the partitioning loop.
Just as the two out-of-order elements are found and are being swapped to different
sides, the q pointer is incremented and the p pointer decremented. This is a micro-optimization
that speeds up the loop just a bit, but is almost hidden because the inc/dec is performed
right there in the swap line. A bit too much code compression, perhaps, but I couldn't resist.

**So, what's better about it?**

Using the simple quicksort function given near the beginning of the article as a point
of reference, in what ways did my implementation make improvements? Well, there are several:

*** selection sort used for small sections**

The naturally recursive nature of quicksort, where each array is broken down into two sub-arrays,
is mostly a strength, but also a minor weakness of the algorithm. At a certain point,
when the sub-array sections become small enough, it becomes wasteful to further sub-divide them.
It's quicker to just run a simple-minded algorithm over the small section and be done with it.

Since it's hard to beat selection sort for sorting tiny arrays, my
quicksort uses it for sorting an array who size is below a certain threshold. What that threshold should be is
hard to say in general (and could, indeed, depend on several factors including the performance characterstics
of a particular computer). In my case, I simply tested various values and found that for a dozen or less items,
using selection sort improved the overall time.

*** no calls to rand()**

There are a number of methods to use for selecting the pivot
(use the first element, use the last, use the middle one, pick a random one, etc.), but I stuck with selecting a random item.
But, since the pivot selection is part of the critical partitioning code that is run repeatedly in an inner loop,
I decided that avoiding a function call to rand() would be beneficial. So, instead, I hash a random value by using the
values of the current first and last element of the array section. It seems to work quite well in practice.

*** recusion eliminated**

A standard optimization trick is to eliminate the recursion. The recursive calls are not too expensive,
and yet they are called so often within the inner loop, that getting rid of the calls definitely improves the speed.
Eliminating the recursion is done by keeping and maintaining a local stack.

This has an extra benefit too. In a worst case scenario -- having an already sorted array combined with selecting
the first array element as the pivot -- the run-time stack would need to be as large as the array itself.
This is a guaranteed stack overflow for big arrays! By managing a local stack, you can avoid this entirely
by pushing the bounds of the larger section on the stack and iterating on the smaller section.

We can even determine the maximum depth that the local stack can reach. Since the sort routine only compares integer
sized values at a time, there are only so many subdivisions possible. Specifically, for a 4-byte integer, which is 32 bits,
you can only subdivide up to 32 times. It's like a game of "guess the number" where someone tries to guess the value and
you say "higher" or "lower". In the case of 32-bit integers, after you've compared each bit, you're finished.
Another way of saying this is that for 32-bit integers, a binary tree could hold all possible 4GB values without needing
a depth of more than 32 levels.

Therefore, by iterating on the smaller sub-array sections, we minimize the depth needed for the stack. The worst case
scenario mentioned above leads to a 0-element left sub-array and (N-1)-element right sub-array.
The iteration on the 0-element section finishes immediately, and the bounds of the larger section are popped from the stack.
And then again, for the next pass, etc.
Thus, the stack never goes deeper than one level.
Oddly, what was the worst stack allocation scenario for standard function calls becomes the minimal stack requirements
for doing the {iterate-smaller, stack-larger} technique (altho it will still be the slowest scenario).

For this improved technique, the maximum stack depth occurs when the array is truly subdivided into basically equal parts each time
thru. Still, in this instance (which results in the fastest sort times), the depth can never go more than 32, the maximum
number of subdivisions for a 32-bit domain. Thus, the stack need not be larger than 32 elements. Actually, I only use one stack
for both lower and upper bounds (pushed/popped in pairs), so the stack is actually dimensioned as 64 ints.

*** check for already sorted sections**

The last optimization that I couldn't resist is checking to see if the sub-array being worked on is already in sorted order.
This completely removes the worst-case scenario for the standard implementation of quicksort, but at a small price.
Whenever the routine is sorting fairly random data, the extra checks for sorted order will be mostly wasted cycles.
However, the cost, judging by timing tests, seems quite small. So I leave it in,
knowing that it gives the function more predictable run times and completely avoids the horrid worst-case scenario.