## Run-time limits of comparison-based sorting algorithms

Posted by Diego Assencio on 2017.10.01 under Computer science (Algorithms)

Let $T(n)$ be the run-time of some comparison-based sorting algorithm for a sequence of length $n$. By "comparison-based", we mean a sorting algorithm which accesses input array elements only via comparisons as is the case for general-purpose sorting algorithms such as merge sort, quicksort, and heapsort. Such an algorithm does not know in advance which values the input array can store (as is for instance the case with bucket sort), but simply orders them using a predefined "less than" comparison function. In this post, we will show that regardless of how well designed an algorithm of this type is, its worst-case running time cannot be faster than $c n\log_2(n)$ as $n$ becomes large (for some constant $c \gt 0$). In more technical words, there exist constants $n_0 \geq 0$ and $c \gt 0$ such that $T(n) \geq cn\log_2(n)$ for $n \geq n_0$, and therefore $T(n) = \Omega(n\log_2(n))$.

This post will only deal with deterministic algorithms, but the results extend to randomized algorithms as well. The main idea of the proof is to consider only input arrays which are permutations of $S_n = \{1, 2, \ldots, n\}$. The total number of distinct arrays which represent permutations of $S_n$ is $n!$.

For a given deterministic, comparison-based sorting algorithm, let $k$ be the largest number of comparisons within which it can sort any of the $n!$ arrays mentioned above. Because the execution of the algorithm is based on the results of the value comparisons it performs, there are at most $2^k$ possible distinct execution paths for it. To clarify, the algorithm sorts an input array by repeatedly comparing pairs of values and deciding on how to proceed depending on which value is greater according to the comparison function. Each comparison generates two possible branches of execution, and since we know the total number of comparisons performed cannot exceed $k$, there can be at most $2^k$ distinct execution paths for the arrays we are considering.

Suppose now that $2^k \lt n!$. If that is the case, then the number of possible execution paths of the algorithm is smaller than the total number of input arrays we are considering. Therefore, there must be at least two distinct input arrays for which the algorithm executes following the exact same path (this is referred to as the pigeonhole principle in mathematics). This implies that when the algorithm is applied to these arrays, the same transformations (e.g. elements swaps) are applied to each of them, with the final result being the same: the values of $S_n$ ordered according to the predefined comparison function. This is not possible since the algorithm is deterministic: by inverting its steps from the final (sorted) result, we cannot arrive at two distinct initial inputs. Therefore $2^k \geq n!$.

The final step in the proof needs a simple fact about $n!$: $$\displaystyle n! = n(n-1)(n-2)\ldots\frac{n}{2}\left(\frac{n}{2} - 1\right) \ldots1 \geq \left(\frac{n}{2}\right)^{\frac{n}{2}}$$ Hence, since $2^k \geq n!$: $$\log_2(2^k) \geq \log_2(n!) \Longrightarrow k \geq \frac{n}{2}\log\left(\frac{n}{2}\right) \Longrightarrow k = \Omega(n\log_2(n))$$ Therefore, the best lower bound for the worst-case running time of the sorting algorithm is $T(n) = \Omega(n\log_2(n))$.