Sorting C++ arrays of C-style strings


Posted by Diego Assencio on 2016.09.02 under Programming (C/C++)

Suppose you have an array of C-style strings, i.e., an array of char* or const char*, and wish to sort the array by ordering the strings in lexicographical order. One could naively think that the following would work (but it will not):

#include <vector>
#include <iostream>
#include <algorithm>

int main()
{
	std::vector<const char*> names = { "Mary", "Bob", "John", "Roy" };

	std::sort(names.begin(), names.end());

	for (const char* name : names)
	{
		std::cout << name << " ";
	}
	std::cout << std::endl;

	return 0;
}

Unfortunately, this will print the names in the original order, which is not what we desire:

Mary Bob John Roy

This happens because a C-style string is represented as a pointer to its first character (const char* above). By naively sorting the elements of names, we end up comparing pointer values which define the memory addresses at which our strings are located, but not the strings themselves. In other words, we compare const char* values using the < operator since that is the comparison method which std::sort uses by default. The addresses of the strings in names are determined at compile time, and in this case the compiler decided to place "Mary" at an address which is smaller than the address of "Bob", which in turn was placed at an address smaller than the address of "John" and so on.

So how do we achieve what we want? The easiest way is by providing a comparison method for std::sort which orders the C-style strings lexicographically instead of by memory addresses:

#include <vector>
#include <iostream>
#include <cstring>
#include <algorithm>

int main()
{
	std::vector<const char*> names = { "Mary", "Bob", "John", "Roy" };

	/* compares C-style strings lexicographically, not by address */
	auto cstr_compare = [](const char* s1, const char* s2) {
		return strcmp(s1,s2) < 0;
	};

	std::sort(names.begin(), names.end(), cstr_compare);

	for (const char* name : names)
	{
		std::cout << name << " ";
	}
	std::cout << std::endl;

	return 0;
}

The output is now what we wish it to be:

Bob John Mary Roy

Our solution is simple: use strcmp to compare strings lexicographically: the expression strcmp(s1,s2) < 0 evaluates to true if and only if s1 is lexicographically smaller than s2.

Bonus: ordered associative containers of C-style strings

Ordered associative containers such as std::map and std::set usually store their elements in balanced binary search trees which use the ordering criterion for the keys to perform fast insertion and lookup. However, as we just saw above, the ordering criterion for const char* does not take the contents of the string it points to into account, so we again need to provide a comparison method which enforces lexicographical order on the container elements:

#include <set>
#include <iostream>
#include <cstring>

struct NameCompare
{
	bool operator()(const char* s1, const char* s2) const
	{
		return strcmp(s1,s2) < 0;
	}
};

int main()
{
	/* not what we want: strings ordered by memory addresses */
	std::set<const char*> names_1 = { "Mary", "Bob", "John", "Roy" };

	std::cout << "names_1: ";
	for (const char* name : names_1)
	{
		std::cout << name << " ";
	}
	std::cout << std::endl;

	/* use NameCompare as an ordering criterion for the names */
	std::set<const char*, NameCompare> names_2 = { "Mary", "Bob",
	                                               "John", "Roy" };

	std::cout << "names_2: ";
	for (const char* name : names_2)
	{
		std::cout << name << " ";
	}
	std::cout << std::endl;

	return 0;
}

The output of this program is:

names_1: Mary Bob John Roy
names_2: Bob John Mary Roy

When iterating over the elements of an std::set or std::map using a range-based for loop, the elements of the container are traversed according to the ordering criterion for their keys, from "smallest" to "largest". In the example above, we can see that the names in names_1 are printed in the order of their memory addresses, but the names in names_2 are printed in lexicographical order since that is the order enforced by NameCompare on C-style strings.

For the curious, the default ordering criterion for std::set and std::map is defined by the function object std::less, which defines an order for a type T using the < operator for this type:

template<typename T>
struct less
{
	bool operator() (const T& x, const T& y)
	{
		return x < y;
	}
};

Not surprisingly, std::less is what std::sort uses by default to compare elements.

Comments

No comments posted yet.

Leave a reply

NOTE: A name and a comment (max. 1024 characters) must be provided; all other fields are optional. Equations will be processed if surrounded with dollar signs (as in LaTeX). You can post up to 5 comments per day.