The nature of the "this" pointer in C++


Posted by Diego Assencio on 2017.04.01 under Programming (C/C++)

Whenever you call a non-static member function of a class, you call it through an existing object of that class type. Inside the definition of such a member function, you can refer to this object through the this pointer. Unless a variable name needs to be disambiguated (for instance, when a class data member has the same name as a local variable or parameter of the member function), developers usually do not use the this pointer explicitly to refer to class data members. This is rarely a problem, but, as I will discuss in this post, there are situations that require special care to avoid certain pitfalls.

To start, consider the following piece of code:

class AddNumber
{
public:

	...

	int add(const int other) const;

private:
	int number;
};

int AddNumber::add(const int other) const
{
	return number + other;
}

When the compiler parses the code above, it understands that, in the definition of AddNumber::add, number refers to the class data member with that name. In other words, the code above is equivalent to this:

class AddNumber
{
public:

	...

	int add(const int other) const;

private:
	int number;
};

int AddNumber::add(const int other) const
{
	return this->number + other;
}
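The equivalence can be checked with a small compilable sketch. Two additions are made for illustration only and are not part of the original class: the constructor AddNumber(int) and the second member function add_explicit. Both member functions should compute the same result for any argument.

```cpp
// Self-contained variant of AddNumber. The constructor is hypothetical:
// it stands in for the part of the class elided by "..." in the post.
class AddNumber
{
public:
	explicit AddNumber(const int n) : number(n) {}

	// Unqualified name: the compiler reads this as this->number + other.
	int add(const int other) const { return number + other; }

	// Hypothetical helper: the same computation with this-> written out.
	int add_explicit(const int other) const { return this->number + other; }

private:
	int number;
};
```

With AddNumber adder(10);, both adder.add(5) and adder.add_explicit(5) evaluate to 15.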

However, if we change the name of the parameter other of AddNumber::add to number, the compiler will interpret any occurrence of number inside its definition as the function parameter number instead of the data member this->number:

class AddNumber
{
public:

	...

	int add(const int number) const;

private:
	int number;
};

int AddNumber::add(const int number) const
{
	return number + number; /* here number is not this->number! */
}

To fix this, we can use the this pointer to tell the compiler that the first occurrence of number refers to the class data member rather than to the function parameter:

class AddNumber
{
public:

	...

	int add(const int number) const;

private:
	int number;
};

int AddNumber::add(const int number) const
{
	return this->number + number; /* this is what we originally had */
}
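The bug and its fix can be compared side by side in one compilable sketch. The constructor and the member function names add_shadowed and add_fixed are hypothetical and were chosen only so that both versions can coexist in the same class.

```cpp
// The constructor is a hypothetical stand-in for the elided "...".
class AddNumber
{
public:
	explicit AddNumber(const int n) : number(n) {}

	// Buggy: the parameter "number" hides the data member, so both
	// operands refer to the parameter and the result is 2 * number.
	int add_shadowed(const int number) const { return number + number; }

	// Fixed: this->number explicitly names the data member.
	int add_fixed(const int number) const { return this->number + number; }

private:
	int number;
};
```

With AddNumber adder(100);, adder.add_shadowed(3) yields 6 and ignores the data member entirely, while adder.add_fixed(3) yields 103. Compiler warnings can help here: GCC's -Wshadow option, for example, warns when a declaration shadows a class member.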

I hope nothing discussed so far was new to you, so let's move on to more interesting things.

One could argue that classes, as we see them, don't really exist: they are essentially syntactic sugar that spares us from explicitly passing object pointers around, as we do in C programs. To clarify this idea, take a look at the code below. It is conceptually equivalent to the code above, except for the absence of the private access specifier. Before anyone despairs: the code below is not valid C++; its only purpose is to illustrate the concepts we are about to discuss:

struct AddNumber
{
	...

	int number;
};

int AddNumber::add(const AddNumber* this, const int number)
{
	return this->number + number;
}

Why is the code above not valid? For two reasons: AddNumber::add is not a valid function name in this context (add is not declared as a member of AddNumber), and this, being a reserved keyword, cannot be used as a parameter name. While in the original version AddNumber::add is called through an existing object of type AddNumber:

AddNumber my_adder;

...

my_adder.add(3);

in our (invalid) non-class version, AddNumber::add is called with a pointer to the object as an argument:

AddNumber my_adder;

...

AddNumber::add(&my_adder, 3);

Were it valid, the non-class version would do exactly the same as the original one. In any case, it better represents how the compiler actually interprets things. It makes it obvious that if we remove the this-> prefix from the first occurrence of number, we end up with the problem discussed earlier, because number will then be interpreted exclusively as the function parameter. But don't take my word for it; see for yourself:

struct AddNumber
{
	...

	int number;
};

int AddNumber::add(const AddNumber* this, const int number)
{
	return number + number; /* this pointer not used, return 2*number */
}
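The same idea can be expressed in valid C++ with two small changes: add becomes a free function, and the reserved keyword this is replaced by a parameter named self. The name self is chosen here purely for illustration, and add_buggy is a hypothetical name for the variant without the prefix.

```cpp
// "Desugared" AddNumber: a plain struct plus free functions that receive
// the object explicitly. "self" is an illustrative stand-in for "this".
struct AddNumber
{
	int number;
};

// Counterpart of the member function AddNumber::add from the post.
int add(const AddNumber* self, const int number)
{
	return self->number + number;
}

// Counterpart of the buggy version: without "self->", only the
// parameter is used, so the object is ignored and 2 * number is returned.
int add_buggy(const AddNumber* /* self, unused */, const int number)
{
	return number + number;
}
```

With AddNumber my_adder{10};, add(&my_adder, 3) yields 13, while add_buggy(&my_adder, 3) yields 6.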

This brings us to the first lesson of this post: whenever you see a non-static member function, try to read it as a stand-alone (i.e., non-member) function with an extra parameter called this, which points to the object the function is doing its work for.
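The standard library reflects this view. std::mem_fn (C++11) turns a pointer to a member function into a callable whose first argument is the object, or a pointer to it, on which the member function runs. In the sketch below, the constructor and the helper call_with_explicit_object are hypothetical additions used for illustration.

```cpp
#include <functional>

class AddNumber
{
public:
	explicit AddNumber(const int n) : number(n) {}  // hypothetical constructor

	int add(const int other) const { return number + other; }

private:
	int number;
};

// Hypothetical helper: computes adder.add(value), but passes the object
// explicitly as the first argument, much like the "this" parameter above.
int call_with_explicit_object(const AddNumber& adder, const int value)
{
	auto add = std::mem_fn(&AddNumber::add);
	return add(&adder, value);  // equivalent to adder.add(value)
}
```

For AddNumber adder(10);, call_with_explicit_object(adder, 5) and adder.add(5) both return 15.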

One question which must be asked at this point is: what about static member functions? Do they also implicitly contain a this pointer? The answer is no, they don't. If they did, they would inevitably be associated with some existing object of the class. Static member functions, however, belong to the class itself, like static data members, and can be invoked directly, i.e., without an existing class object. In this regard, a static member function is no different from a non-member function: the compiler will neither implicitly add a this parameter to its declaration nor introduce this-> prefixes anywhere in its definition.

Static member functions do, however, have access to the private internals of a class, just like any other member function or friend function, as long as they are given a pointer (or reference) to a class object to work with. This means the following code is valid:

class AddNumber
{
public:

	...

	static int add(const AddNumber* adder, const int number);

private:
	int number;
};

int AddNumber::add(const AddNumber* adder, const int number)
{
	return adder->number + number;
}
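Here is a compilable version of the static member function, with a hypothetical constructor filling in the elided part. The call syntax AddNumber::add(&my_adder, 3) is exactly the one used earlier for the invalid non-class version. For a static member function, that syntax is perfectly legal.

```cpp
class AddNumber
{
public:
	explicit AddNumber(const int n) : number(n) {}  // hypothetical constructor

	// No implicit this: the object must be passed in explicitly.
	static int add(const AddNumber* adder, const int number)
	{
		// Accessing the private member through the pointer is allowed
		// because add is a member of AddNumber.
		return adder->number + number;
	}

private:
	int number;
};
```

With AddNumber my_adder(10);, the call AddNumber::add(&my_adder, 3) returns 13.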

There is one type of situation in which the implicit presence of the this pointer in non-static member functions can give the unsuspecting developer a lot of headaches. Here it is, in its full "glory":

#include <functional>
#include <iostream>
#include <string>
#include <vector>

/* a global array of callable warning objects */
std::vector<std::function<void()>> warnings;

class WarningManager
{
public:

	...

	void add_warning(const std::string& message) const;

private:
	std::string name;
};

void WarningManager::add_warning(const std::string& message) const
{
	warnings.emplace_back([=]() {
		std::cout << name << ": " << message << "\n";
	});
}

The purpose of the code above is simple: WarningManager::add_warning populates the global vector warnings with lambda functions which print a warning message when invoked. However silly this code may seem, scenarios like this do happen in practice. With that in mind, can you see the problem here?

If the problem is unclear to you, consider the advice given earlier: read the member function WarningManager::add_warning as a non-member function which takes a pointer called this to a WarningManager object:

/* a global array of callable warning objects */
std::vector<std::function<void()>> warnings;

struct WarningManager
{
	...

	std::string name;
};

void WarningManager::add_warning(const WarningManager* this,
                                 const std::string& message)
{
	warnings.emplace_back([=]() {
		std::cout << this->name << ": " << message << "\n";
	});
}

You may be puzzled by the fact that name in the original version of the code was replaced by this->name in the (remember, invalid) second version. Perhaps you are asking yourself: "isn't name itself copied by the lambda's capture list?" The answer is no. A "capture all by value" capture default (i.e., [=]) copies only the local variables that are actually used in the body of the lambda, and function parameters count as local variables here. Class data members are not local variables, so they are not captured. Instead, using name inside the lambda makes [=] implicitly capture the this pointer; this implicit capture of this through [=] was deprecated in C++20. Therefore, the code above is conceptually identical to the following one:

/* a global array of callable warning objects */
std::vector<std::function<void()>> warnings;

struct WarningManager
{
	...

	std::string name;
};

void WarningManager::add_warning(const WarningManager* this,
                                 const std::string& message)
{
	warnings.emplace_back([this, message]() {
		std::cout << this->name << ": " << message << "\n";
	});
}

The problem is now easier to spot. In the original example, the name data member is not captured by value. Instead, it is accessed through a copy of the this pointer to the WarningManager object for which WarningManager::add_warning is called. The lambda may be invoked after that object has been destroyed, and accessing this->name through the resulting dangling pointer is undefined behavior, so the code above is a recipe for disaster. The lifetime of the lambda is independent of the lifetime of the WarningManager object which creates it, and the implicit replacement of name by this->name in the definition of the lambda means we may find ourselves debugging an obscure program crash.
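The claim that [=] copies the this pointer, and not name, can be observed without triggering undefined behavior. Keep the manager alive, change its name after registering a warning, and the stored lambda reports the new name. The sketch below is a simplified, hypothetical variant of the class. Its warnings return their text instead of printing it, and it adds a constructor and a rename member function that are not in the original. Compiling it as C++20 or later may produce a deprecation warning for the implicit capture of this.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Simplified variant: warnings return their text instead of printing it.
std::vector<std::function<std::string()>> warnings;

class WarningManager
{
public:
	explicit WarningManager(std::string n) : name(std::move(n)) {}

	// Hypothetical helper used only to expose what the lambda captured.
	void rename(std::string n) { name = std::move(n); }

	void add_warning(const std::string& message) const
	{
		// Same capture as in the post: [=] copies message, but name is
		// reached through an implicitly captured copy of this.
		warnings.emplace_back([=]() { return name + ": " + message; });
	}

private:
	std::string name;
};
```

Suppose a manager named "disk" registers the warning "almost full" and is then renamed to "storage". Invoking the stored lambda then yields "storage: almost full", because the lambda reads this->name at call time. Had the manager been destroyed instead, the same call would read through a dangling pointer.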

A simple way to fix this problem is to be explicit about what we want: we want to capture name by value, so let's make that very clear to everyone:

/* a global array of callable warning objects */
std::vector<std::function<void()>> warnings;

class WarningManager
{
public:

	...

	void add_warning(const std::string& message) const;

private:
	std::string name;
};

void WarningManager::add_warning(const std::string& message) const
{
	const std::string& manager_name = this->name;

	warnings.emplace_back([manager_name, message]() {
		std::cout << manager_name << ": " << message << "\n";
	});
}

Inside the capture list, manager_name, which is a reference to this->name, is captured by value. The lambda therefore stores its own copy of the string this->name under the name manager_name. C++14 added init captures to lambda expressions, and these allow this code to be simplified:

/* a global array of callable warning objects */
std::vector<std::function<void()>> warnings;

class WarningManager
{
public:

	...

	void add_warning(const std::string& message) const;

private:
	std::string name;
};

void WarningManager::add_warning(const std::string& message) const
{
	warnings.emplace_back([manager_name = this->name, message]() {
		std::cout << manager_name << ": " << message << "\n";
	});
}

In this case, we are explicitly copying this->name into a string called manager_name, which is then accessible inside the lambda function. As discussed in a previous post, lambda functions are equivalent to functor classes, and in this case, manager_name is a data member of such a class which is initialized as a copy of this->name.
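To make the functor analogy concrete, here is a sketch of a hand-written class that plays the role of the closure type of [manager_name = this->name, message]. The name WarningClosure is hypothetical, since real closure types are unnamed. Its call operator returns the text instead of printing it so the result can be checked. Because both strings are copied when the closure is created, the stored warning remains valid after the WarningManager is destroyed.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the unnamed closure type of the init-capture lambda.
class WarningClosure
{
public:
	WarningClosure(std::string name_copy, std::string message_copy)
		: manager_name(std::move(name_copy)), message(std::move(message_copy)) {}

	// Lambdas are not mutable by default, so the call operator is const.
	std::string operator()() const { return manager_name + ": " + message; }

private:
	std::string manager_name;  // copy of this->name made at creation time
	std::string message;       // copy of the message parameter
};

std::vector<std::function<std::string()>> warnings;

class WarningManager
{
public:
	explicit WarningManager(std::string n) : name(std::move(n)) {}

	void add_warning(const std::string& message) const
	{
		// Plays the role of emplacing the init-capture lambda.
		warnings.emplace_back(WarningClosure(this->name, message));
	}

private:
	std::string name;
};
```

A manager named "disk" can register the warning "almost full" and then go out of scope. Calling the stored warning afterwards still yields "disk: almost full", because no pointer to the manager was kept.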

To close this post, I strongly recommend you read the Zen of Python. Look at its second guiding principle: "Explicit is better than implicit". After reading this post, I hope you can better appreciate how wise that statement is! :-)
