Posts on Programming

The nature of the "this" pointer in C++


Posted by Diego Assencio on 2017.04.01 under Programming (C/C++)

Whenever you call a non-static member function of a class, you call it through an existing object of that class type. Inside the definition of such a member function, you can refer to this object through the this pointer. Unless there is a need to disambiguate the use of a certain variable name (for instance, if a class data member has the same name as a local variable of the member function), the this pointer is often not used by developers to explicitly refer to class data members. This is almost always not a problem, but as I will discuss in this post, there are situations which require special care in order to avoid certain pitfalls.

To start, consider the following piece of code:

class AddNumber
{
public:

	...

	int add(const int other) const;

private:
	int number;
};

int AddNumber::add(const int other) const
{
	return number + other;
}

When the compiler parses the code above, it understands that, in the definition of AddNumber::add, number refers to the class data member with that name, i.e., the code above is equivalent to this:

class AddNumber
{
public:

	...

	int add(const int other) const;

private:
	int number;
};

int AddNumber::add(const int other) const
{
	return this->number + other;
}

However, if we change the name of the parameter other of AddNumber::add to number, the compiler will interpret any occurrence of number inside its definition as the function parameter number instead of the data member this->number:

class AddNumber
{
public:

	...

	int add(const int number) const;

private:
	int number;
};

int AddNumber::add(const int number) const
{
	return number + number; /* here number is not this->number! */
}

To fix this ambiguity, we can use the this pointer to indicate to the compiler that the first occurrence of number actually refers to the class data member instead of the function parameter:

class AddNumber
{
public:

	...

	int add(const int number) const;

private:
	int number;
};

int AddNumber::add(const int number) const
{
	return this->number + number; /* this is what we originally had */
}

I hope nothing discussed so far was new to you, so let's move on to more interesting things.

One could argue that classes as we see them don't really exist: they are purely syntactic sugar which spares us from explicitly passing object pointers around as we do in C programs. To clarify this idea, take a look at the code below: it is conceptually equivalent to the one above except for the absence of the private access specifier. To avoid any confusion up front: the code below is not valid C++; its purpose is merely to illustrate the concepts we are about to discuss:

struct AddNumber
{
	...

	int number;
};

int AddNumber::add(const AddNumber* this, const int number)
{
	return this->number + number;
}

Why is the code above not valid? Well, for two reasons: AddNumber::add is not a valid function name in this context (it is not a member of AddNumber), and this, being a reserved keyword, cannot be used as a parameter name. While in the original version, AddNumber::add is called through an existing object of type AddNumber:

AddNumber my_adder;

...

my_adder.add(3);

in our (invalid) non-class version, AddNumber::add is called with a pointer to an object as argument:

AddNumber my_adder;

...

AddNumber::add(&my_adder, 3);

Were it not invalid, the non-class version would do exactly the same as the original one. But in any case, it better represents how the compiler actually interprets things. Indeed, it makes it obvious that if we remove the this-> prefix from the first occurrence of number, we will end up with the problem discussed earlier: number will be interpreted exclusively as the function parameter. But don't take my word for it, see it for yourself:

struct AddNumber
{
	...

	int number;
};

int AddNumber::add(const AddNumber* this, const int number)
{
	return number + number; /* this pointer not used, return 2*number */
}

This brings us to the first lesson of this post: whenever you see a non-static member function, try to always read it as a stand-alone (i.e., non-member) function containing a parameter called this which is a pointer to the object the function is doing its work for.
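The reading suggested above can be tried out in valid C++ if we give the explicit parameter a legal name. The snippet below is only a sketch: the friend function add_explicit, its parameter self, the constructor and the helper compare_calls are hypothetical names introduced for illustration and are not part of the original AddNumber class.

```cpp
#include <cassert>

class AddNumber
{
public:
	explicit AddNumber(const int number): number(number) {}

	/* ordinary member function: "this" is passed implicitly */
	int add(const int number) const
	{
		return this->number + number;
	}

	/* hypothetical free function which is allowed to read number */
	friend int add_explicit(const AddNumber* self, const int number);

private:
	int number;
};

/* same logic, but the object pointer is an explicit parameter */
int add_explicit(const AddNumber* self, const int number)
{
	return self->number + number;
}

/* both calls do the same work for the same object */
void compare_calls()
{
	const AddNumber my_adder(4);

	assert(my_adder.add(3) == 7);
	assert(add_explicit(&my_adder, 3) == 7);
}
```

Calling compare_calls() from main should pass silently: the member call and the free-function call differ only in how the object reaches the function.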

One question which must be asked at this point is: what about static member functions? Do they also implicitly contain a this pointer? The answer is no, they don't. If they did, they would inevitably be associated with some existing object of the class, but static member functions, like static data members, belong to the class itself and can be invoked directly, i.e., without the need for an existing class object. In this regard, a static member function is in no way special: the compiler will neither implicitly add a this parameter to its declaration nor introduce this-> prefixes anywhere in its definition.

Static member functions have, however, access to the internals of a class like any other member or friend function, provided they are given a pointer to a class object. This means the following code is valid:

class AddNumber
{
public:

	...

	static int add(const AddNumber* adder, const int number);

private:
	int number;
};

int AddNumber::add(const AddNumber* adder, const int number)
{
	return adder->number + number;
}
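One way to see the difference between the two kinds of member functions is to look at the pointer types needed to store them. The sketch below assumes a variant of AddNumber with both a non-static add and a static add_static; the constructor, the type aliases PlainFunction and MemberFunction, and the helper call_through_pointers are hypothetical names used only for this illustration.

```cpp
#include <cassert>

class AddNumber
{
public:
	explicit AddNumber(const int number): number(number) {}

	int add(const int other) const
	{
		return number + other;
	}

	static int add_static(const AddNumber* adder, const int other)
	{
		return adder->number + other;
	}

private:
	int number;
};

/* a static member function fits in a plain function pointer... */
using PlainFunction = int (*)(const AddNumber*, int);

/* ...but a non-static one needs a pointer-to-member-function type */
using MemberFunction = int (AddNumber::*)(int) const;

void call_through_pointers()
{
	const AddNumber my_adder(4);

	PlainFunction f = &AddNumber::add_static;
	MemberFunction g = &AddNumber::add;

	/* the object is passed explicitly as an argument */
	assert(f(&my_adder, 3) == 7);

	/* the object is supplied through the .* operator */
	assert((my_adder.*g)(3) == 7);
}
```

The non-static function cannot be called through g without naming an object, which is just another view of its implicit this parameter.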

There is one type of situation in which the implicit presence of the this pointer in non-static member functions can cause the unsuspecting developer a lot of headaches. Here it is, in its full "glory":

/* a global array of callable warning objects */
std::vector<std::function<void()>> warnings;

class WarningManager
{
public:

	...

	void add_warning(const std::string& message) const;

private:
	std::string name;
};

void WarningManager::add_warning(const std::string& message) const
{
	warnings.emplace_back([=]() {
		std::cout << name << ": " << message << "\n";
	});
}

The purpose of the code above is simple: WarningManager::add_warning populates the global array warnings with lambda functions which print some warning message when invoked. Regardless of how silly the purpose of this code may seem, scenarios like this do happen in practice. That being so, do you see what the problem is here?

If the problem is unclear to you, consider the advice given earlier: read the member function WarningManager::add_warning as a non-member function which takes a pointer called this to a WarningManager object:

/* a global array of callable warning objects */
std::vector<std::function<void()>> warnings;

struct WarningManager
{
	...

	std::string name;
};

void WarningManager::add_warning(const WarningManager* this,
                                 const std::string& message)
{
	warnings.emplace_back([=]() {
		std::cout << this->name << ": " << message << "\n";
	});
}

You may be puzzled by the fact that name in the original version of the code was replaced by this->name in the (remember, invalid) second version. Perhaps you are asking yourself: "isn't name itself copied by the capture list of the lambda function?" The answer is no. A "capture all by value" capture list (i.e., [=]) captures only the non-static local variables of the enclosing scope which the lambda actually uses, and nothing else. Function parameters fall into this category, but class data members don't: name is reached through this, and it is the this pointer which gets captured. Therefore, the code above is conceptually identical to the following one:

/* a global array of callable warning objects */
std::vector<std::function<void()>> warnings;

struct WarningManager
{
	...

	std::string name;
};

void WarningManager::add_warning(const WarningManager* this,
                                 const std::string& message)
{
	warnings.emplace_back([this, message]() {
		std::cout << this->name << ": " << message << "\n";
	});
}

The problem is now easier to spot: in the original example, the name data member is not being captured directly by value, but is instead accessed through a copy of the this pointer to the WarningManager object for which WarningManager::add_warning is called. Since the lambda may be invoked at a point at which that object may no longer exist, the code above is a recipe for disaster. The lifetime of the lambda is independent from the lifetime of the WarningManager object which creates it, and the implicit replacement of name by this->name on the definition of the lambda means we can find ourselves debugging an obscure program crash.
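It is possible to observe the captured this pointer at work without invoking any undefined behavior. The sketch below is a simplified variant of the example, under a few assumptions: the lambdas return the warning text instead of printing it, and the constructor, the rename member function, the array warning_texts and the helper rename_before_invoking are hypothetical additions. The capture list is written as [this, message] because, as discussed above, that is what [=] amounts to here (and relying on [=] to capture this implicitly is deprecated as of C++20).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

/* a global array of callables which build warning texts */
std::vector<std::function<std::string()>> warning_texts;

class WarningManager
{
public:
	explicit WarningManager(std::string name): name(std::move(name)) {}

	void rename(std::string new_name)
	{
		name = std::move(new_name);
	}

	void add_warning(const std::string& message) const
	{
		/* name below really means this->name */
		warning_texts.emplace_back([this, message]() {
			return name + ": " + message;
		});
	}

private:
	std::string name;
};

std::string rename_before_invoking()
{
	WarningManager manager("disk");

	manager.add_warning("almost full");
	manager.rename("network");

	/* manager is still alive here, so using this->name is valid */
	const std::string text = warning_texts.back()();

	/* drop the lambda before manager (and its this pointer) dies */
	warning_texts.pop_back();

	return text;
}
```

The returned text is "network: almost full": the lambda sees the new name because it holds a pointer to the manager, not a copy of its name. Had the lambda been invoked after manager was destroyed, that pointer would have been dangling.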

A simple way to fix the problem just discussed is by being explicit about what we want: we want to capture name by value, so let's go ahead and make that very clear to everyone:

/* a global array of callable warning objects */
std::vector<std::function<void()>> warnings;

class WarningManager
{
public:

	...

	void add_warning(const std::string& message) const;

private:
	std::string name;
};

void WarningManager::add_warning(const std::string& message) const
{
	const std::string& manager_name = this->name;

	warnings.emplace_back([manager_name, message]() {
		std::cout << manager_name << ": " << message << "\n";
	});
}

Inside the capture list, the string this->name will be copied through its reference manager_name, and the lambda will therefore own a copy of this->name under the name manager_name. In C++14, this code can be simplified using the init capture capability which was added to lambda functions:

/* a global array of callable warning objects */
std::vector<std::function<void()>> warnings;

class WarningManager
{
public:

	...

	void add_warning(const std::string& message) const;

private:
	std::string name;
};

void WarningManager::add_warning(const std::string& message) const
{
	warnings.emplace_back([manager_name = this->name, message]() {
		std::cout << manager_name << ": " << message << "\n";
	});
}

In this case, we are explicitly copying this->name into a string called manager_name which is then accessible inside the lambda function. As discussed in a previous post, lambda functions are equivalent to functor classes, and in this case, manager_name is a data member of such a class which is initialized as a copy of this->name.
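To check that the lambda really owns its copy, we can invoke it after the WarningManager object is gone. This is a minimal sketch under the same assumptions as before: the lambdas return the text instead of printing it, and the constructor, the array warning_texts and the helper invoke_after_destruction are hypothetical names introduced for the illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

/* a global array of callables which build warning texts */
std::vector<std::function<std::string()>> warning_texts;

class WarningManager
{
public:
	explicit WarningManager(std::string name): name(std::move(name)) {}

	void add_warning(const std::string& message) const
	{
		/* the lambda stores its own copy of this->name */
		warning_texts.emplace_back([manager_name = this->name, message]() {
			return manager_name + ": " + message;
		});
	}

private:
	std::string name;
};

std::string invoke_after_destruction()
{
	{
		WarningManager manager("disk");
		manager.add_warning("almost full");
	}	/* manager is destroyed here */

	/* safe: the lambda never touches the destroyed object */
	return warning_texts.back()();
}
```

Here invoke_after_destruction() returns "disk: almost full" even though the manager no longer exists when the lambda runs.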

To close this post, I strongly recommend you read the Zen of Python. Look at the second guiding principle: "Explicit is better than implicit". After reading this post, I hope you can better appreciate what a wise statement that is! :-)


How are virtual function table pointers initialized?


Posted by Diego Assencio on 2017.03.07 under Programming (C/C++)

A class declaring or inheriting at least one virtual function contains a virtual function table (or vtable, for short). Such a class is said to be a polymorphic class. An object of a polymorphic class type contains a special data member (a "vtable pointer") which points to the vtable of this class. This pointer is an implementation detail and cannot be accessed directly by the programmer (at least not without resorting to some low-level trick). In this post, I will assume the reader is familiar with vtables on at least a basic level (for the uninitiated, here is a good place to learn about this topic).

As you hopefully know, to make use of polymorphism, you need to access objects of derived types through pointers or references to a base type. For example, consider the code below:

#include <iostream>

struct Fruit
{
	virtual const char* name() const
	{
		return "Fruit";
	}
};

struct Apple: public Fruit
{
	virtual const char* name() const override
	{
		return "Apple";
	}
};

struct Banana: public Fruit
{
	virtual const char* name() const override
	{
		return "Banana";
	}
};

void analyze_fruit(const Fruit& f)
{
	std::cout << f.name() << "\n";
}

int main()
{
	Apple a;
	Banana b;

	analyze_fruit(a);   /* prints "Apple" */
	analyze_fruit(b);   /* prints "Banana" */

	return 0;
}

So far, no surprises here. But what will happen if instead of taking a reference to a Fruit object on analyze_fruit, we take a Fruit object by value?

Any experienced C++ developer will immediately see the word "slicing" written in front of their eyes. Indeed, taking a Fruit object by value means that inside analyze_fruit, the object f is truly a Fruit, and never an Apple, a Banana or any other derived type:

/* same code as before... */

void analyze_fruit(Fruit f)
{
	std::cout << f.name() << "\n";
}

int main()
{
	Apple a;
	Banana b;

	analyze_fruit(a);   /* prints "Fruit" */
	analyze_fruit(b);   /* prints "Fruit" */

	return 0;
}

This situation is worth analyzing in further detail, even if it seems trivial at first. On the calls to analyze_fruit, we pass objects of type Apple and Banana as arguments which are used to initialize its parameter f (of type Fruit). This is a copy initialization, i.e., the initialization of f in both of these cases is no different from the way f is initialized on the code fragment below:

Apple a;
Fruit f(a);

Even though Fruit does not define a copy constructor, one is provided by the compiler. This default copy constructor merely copies each data member of the source Fruit object into the corresponding data member of the Fruit object being created. In our case, Fruit has no data members, but it still has a vtable pointer. How is this pointer initialized? Is it copied directly from the input Fruit object? Before we answer these questions, let us look at what the compiler-generated copy constructor of Fruit looks like:

struct Fruit
{
	/* compiler-generated copy constructor */
	Fruit(const Fruit& sf): vptr(/* what goes in here? */)
	{
		/* nothing happens here */
	}

	virtual const char* name() const
	{
		return "Fruit";
	}
};

The signature of the Fruit copy constructor shows that it takes a reference to a source Fruit object, which means that if we pass an Apple object to the copy constructor of Fruit, the vtable pointer of sf (for "source fruit") will really point to the vtable of the Apple class. In other words, if this vtable pointer were directly copied into the vtable pointer of the Fruit object being constructed (represented under the name vptr in the code above), this object would behave like an Apple whenever any of its virtual functions is called!

But as we mentioned on the second code example above (the one in which analyze_fruit takes a Fruit object by value), the Fruit parameter f always behaves as a Fruit, and never as an Apple or as a Banana.

This brings us to the main lesson of this post: vtable pointers are not common data members which are directly copied or moved by copy and move constructors respectively. Instead, they are always initialized by any constructor used to build an object of a polymorphic class type T with the address of the vtable for the T class. Also, assignment operators will never touch the values stored by vtable pointers. In the context of our classes, the vtable pointer of a Fruit object will be initialized by any constructor of Fruit with the address of the vtable for the Fruit class and will retain this value throughout the entire lifetime of the object.
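Both claims (constructors set the vtable pointer, assignment operators leave it alone) can be checked with a few lines of code. The sketch below reuses Fruit and Apple from above; the helpers dynamic_name and check_vtable_pointers are hypothetical names, and the virtual call goes through a reference so that the dynamic type of the object decides which function runs.

```cpp
#include <cassert>
#include <string>

struct Fruit
{
	virtual const char* name() const
	{
		return "Fruit";
	}
};

struct Apple: public Fruit
{
	const char* name() const override
	{
		return "Apple";
	}
};

/* virtual dispatch on a reference: the dynamic type of f decides */
std::string dynamic_name(const Fruit& f)
{
	return f.name();
}

void check_vtable_pointers()
{
	Apple a;

	Fruit copied(a);	/* copy constructor of Fruit */

	Fruit assigned;
	assigned = a;		/* copy assignment operator of Fruit */

	assert(dynamic_name(a) == "Apple");
	assert(dynamic_name(copied) == "Fruit");
	assert(dynamic_name(assigned) == "Fruit");
}
```

Neither copying from an Apple nor assigning an Apple turns a Fruit object into anything other than a Fruit.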


Avoid using floating-point numbers as hash table keys


Posted by Diego Assencio on 2017.02.11 under Programming (General)

Suppose you have an array of floating-point numbers and wish to count how many times each unique number occurs in the array. As an expert programmer, you may do as shown in the code below (important note: I will use Python to exemplify the concepts in this post, but everything mentioned here applies to associative arrays implemented as hash tables in any programming language, e.g. std::unordered_map in C++):

numbers = [ 1.4142, 2.7182, 2.7182, 3.1415, 2.7182, 1.4142 ]

# counters is a dictionary which maps each number in the numbers
# array to its number of occurrences
counters = { x: numbers.count(x) for x in numbers }

for (x, count) in counters.items():
    print("%f: %d" % (x, count))

The output of this program will not surprise you. The order of the lines below may be different on your system, but the overall results should be the same:

1.414200: 2
3.141500: 1
2.718200: 3

Everything is fine, right? Unfortunately, no. A Python dictionary is implemented as a hash table. Hash tables are commonly implemented as follows: to insert a pair (k,v) into the table, where k is a key and v is its associated value, we first hash k to find a bucket into which (k,v) must be inserted, then we insert (k,v) into this bucket; if a pair with key k already exists in the bucket, it is replaced by this new pair. But why is this a problem? The following example will illustrate it for you:

a = 0.123456
b = 0.567890

# don't be fooled: numbers != [ a, b, a, b ]
numbers = [ a, b, (a/b)*b, (b/a)*a ]

counters = { x: numbers.count(x) for x in numbers }

for (x, count) in counters.items():
    print("%f: %d" % (x, count))

The output now is not what we expect:

0.123456: 1
0.123456: 1
0.567890: 2

What went wrong here? To find out, let us modify the code above to get additional information on what is happening:

a = 0.123456
b = 0.567890

# don't be fooled: numbers != [ a, b, a, b ]
numbers = [ a, b, (a/b)*b, (b/a)*a ]

print("numbers = %s\n" % numbers)

counters = { x: numbers.count(x) for x in numbers }

for (x, count) in counters.items():
    print("%f: %d" % (x, count))

I hope you will now be able to see what the problem is:

numbers = [0.123456, 0.56789, 0.12345599999999998, 0.56789]

0.123456: 1
0.123456: 1
0.567890: 2

Aha! It is floating-point arithmetic that has come back to bite us. As you likely know, naively checking whether floating-point numbers are equal, i.e., comparing them as a == b, is a very bad idea because such numbers are stored with finite precision. Arithmetic operations involving floating-point numbers are carried out with finite precision as well. Indeed, given two nonzero floating-point numbers a and b, there is no guarantee that a == (a/b)*b. In general, when you hash two different floating-point numbers, you will, with very high probability, obtain two different hash values even if these numbers are very close to each other. If that happens, the numbers will, again with high probability, be placed in different buckets of a hash table.

On the example above, we were lucky to have (b/a)*a being equal to b, but not so lucky with (a/b)*b and a: these two are unfortunately not equal to each other and are therefore treated as distinct numbers by the dictionary (i.e., by its underlying hash table).
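The same behavior can be reproduced with std::unordered_map in C++. The sketch below is an illustration rather than a translation of the Python code: the functions count_occurrences and count_close_numbers are hypothetical names, and instead of a and b it uses the classic pair 0.3 and 0.1 + 0.2, which are distinct values in IEEE 754 double precision (the assumption being that double is such a type and is not evaluated with extended precision).

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

/* maps each number in the numbers array to its number of occurrences */
std::unordered_map<double, int> count_occurrences(
	const std::vector<double>& numbers)
{
	std::unordered_map<double, int> counters;

	for (const double x: numbers)
	{
		++counters[x];
	}

	return counters;
}

void count_close_numbers()
{
	/* don't be fooled: 0.1 + 0.2 != 0.3 in double precision */
	const std::vector<double> numbers = { 0.3, 0.1 + 0.2, 0.3 };

	const auto counters = count_occurrences(numbers);

	/* two distinct keys instead of one */
	assert(counters.size() == 2);
	assert(counters.at(0.3) == 2);
	assert(counters.at(0.1 + 0.2) == 1);
}
```

As with the Python dictionary, the two nearly identical values end up as separate keys with separate counters.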

At this point, it should be clear that hashing floating-point numbers is as dangerous as directly checking for their equality using the == operator. Even tiny differences in their values may throw them into different buckets of an associated hash table.

If you really need to use floating-point numbers as keys for a hash table, and depending on the numbers you will be hashing, you may be able to alleviate the problems described above with some special trick. As an example, if all numbers fall within the interval $[0,1)$, and if two decimal places are enough to represent these numbers accurately, we can truncate each number to its first two decimal digits before passing it to the hash function: this will effectively divide the interval $[0,1)$ into $100$ intervals of length $0.01$, with all numbers in the same interval being considered equal. Here is the code which implements this idea:

# keep only the first two decimal digits of x for x in [0,1)
def truncate(x):
	return int(x / 0.01) * 0.01

a = 0.123456
b = 0.567890

# truncate each number before adding it to the numbers array
numbers = [ truncate(x) for x in [ a, b, (a/b)*b, (b/a)*a ] ]

print("numbers = %s\n" % numbers)

counters = { x: numbers.count(x) for x in numbers }

for (x, count) in counters.items():
    print("%f: %d" % (x, count))

The output of this program shows it works as we wish it to:

numbers = [0.12, 0.56, 0.12, 0.56]

0.560000: 2
0.120000: 2

Notice that a and (a/b)*b are no longer considered to be two distinct numbers by the hash table since they are truncated before being used as keys, and that b and (b/a)*a are still treated as equal.
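A variant of this trick, sketched below in C++, avoids floating-point keys altogether by using the index of each interval as an integer key. This swaps the truncated value used above for an interval index, so it is a related technique rather than the same code; bucket_index, count_per_bucket and count_truncated_numbers are hypothetical names.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

/* index of the interval of length 0.01 containing x, for x in [0,1) */
int bucket_index(const double x)
{
	return static_cast<int>(x / 0.01);
}

/* maps each interval index to the number of values falling in it */
std::unordered_map<int, int> count_per_bucket(
	const std::vector<double>& numbers)
{
	std::unordered_map<int, int> counters;

	for (const double x: numbers)
	{
		++counters[bucket_index(x)];
	}

	return counters;
}

void count_truncated_numbers()
{
	const double a = 0.123456;
	const double b = 0.567890;

	const auto counters = count_per_bucket({ a, b, (a/b)*b, (b/a)*a });

	assert(counters.size() == 2);
	assert(counters.at(12) == 2);	/* a and (a/b)*b */
	assert(counters.at(56) == 2);	/* b and (b/a)*a */
}
```

Since the keys are integers, equality between keys is exact; the floating-point uncertainty is confined to the computation of the interval index.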

However, as I said above, the problem is only alleviated by the trick just presented. Indeed, the original problem (numbers extremely close to each other being considered distinct by the hash function) has not been entirely addressed: we divided the interval $[0,1)$ into intervals of width $0.01$ over which all numbers are considered equal, but on the shared boundaries of each interval, i.e., for numbers in the form $0.01N$ for $N = 1, 2, \ldots, 99$, the problem still persists since minor deviations around these numbers will fall into different intervals. As a concrete example, for a very small $\delta \gt 0$, $0.50 - \delta$ will fall within $[0.49, 0.50)$ while $0.50 + \delta$ will fall within $[0.50, 0.51)$; therefore, these two numbers will be treated differently by the hash function even though they are very close to each other.
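The boundary effect is easy to reproduce. In the sketch below, written in C++ with interval indices in place of truncated values, bucket_index, same_bucket and check_interval_boundary are hypothetical names and the value of delta is an arbitrary choice.

```cpp
#include <cassert>

/* index of the interval of length 0.01 containing x, for x in [0,1) */
int bucket_index(const double x)
{
	return static_cast<int>(x / 0.01);
}

bool same_bucket(const double x, const double y)
{
	return bucket_index(x) == bucket_index(y);
}

void check_interval_boundary()
{
	const double delta = 1e-9;	/* arbitrary small deviation */

	/* extremely close to each other, yet in different intervals */
	assert(bucket_index(0.50 - delta) == 49);
	assert(bucket_index(0.50 + delta) == 50);
	assert(!same_bucket(0.50 - delta, 0.50 + delta));

	/* much farther apart, yet in the same interval */
	assert(same_bucket(0.501, 0.509));
}
```

Two numbers which differ by 2e-9 are treated as distinct keys, while two numbers which differ by 0.008 are treated as equal.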

To summarize: creating hash tables using floating-point numbers as keys is a tricky task as hashing them is similar to comparing them for equality using the == operator, and the result may cause your application to behave in unexpected ways. My recommendation: avoid doing this, if you can.
