Using Ref-qualifiers

Most C++ programmers who have drank the C++11 koolaid are already familiar with rvalue notation. Namely, specifying that a particular function overload acts on rvalues. For example:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
struct foo {
    void bar(int &) {
        std::cout << "l";
    }
    void bar(int &&) {
        std::cout << "r";
    }
};

int main() {
    int i = 1;
    foo f;
    f.bar(i);
    f.bar(3);
}

will output “lr”. There is a variation of this overload that applies to the object itself known as the ref-qualifier.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
struct foo {
    void bar() & {
        std::cout << "l";
    }
    void bar() && {
        std::cout << "r";
    }
};

int main() {
    foo f;
    f.bar();
    foo{}.bar();
}

This will also output “lr” because the compiler recognizes that f is an lvalue but foo{} is an rvalue (specifically, an expiring value). So now that we know what ref-qualifiers are, why should we use them?

Traversing a DOM like structure

The example I will discussing is a Selector object that can traverse the DOM. I chose the DOM because it is a fairly well-known construct in the programming world. For the uninitiated, the DOM is essentially a tree with named nodes. For the sake of this example, we’ll assume that nodes of a particular name are unique (only ids exist; no classes). Also, we’ll assume that everything is a div for simplicity although this can easily be extended to handle different types of elements. For example:

1
2
3
4
5
6
<div id="1">
  <div id="2a">
    <div id="3"></div>
  </div>
  <div id="2b"></div>
</div>

The structure of this document looks like

    1
   / \
  2a  2b
 /
3

One way of allowing a user to traverse this is by chaining names. For example:

1
2
3
Selector s(document);
s["1"]["2a"]["3"] = "hello"; // access div with id "3" in div "2a"
s["1"]["2b"] = "world";

Of course, we’d also want to allow the user to cache a selector for faster access.

1
2
auto root = s["1"];
root["2a"] = "foo";

This interface is friendly and adheres to the principle of least astonishment (especially because it resembles common javascript APIs). An important constraint for the problem here is that we are not allowed to store pointers within the tree. This may seem like an artificial constraint. However, this sort of thing could happen if the DOM (or whatever it is) is in some other black box module that doesn’t permit such references (due to mutating state and what have you).

A basic implementation might look like

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
class Selector {
private:
    DOM &_doc;
    std::vector<std::string> _traversal;

    void _traverse() {
        for (const auto &elem : _traversal) {
            _doc.Get(elem);
        }
    }

public:
    Selector(DOM &doc) : _doc(doc) {}
    Selector operator[](const std::string &id) const {
        Selector s(*this);
        s._traversal.push_back(id);
        return s;
    }
    void operator=(const std::string &value) {
        _traverse();
        _doc.Set(value);
    }
};

This implementation assumes methods DOM::Get(std::string &id) and DOM::Set(std::string &value). To traverse the DOM, we store a vector of ids that we can traverse with Selector::_traverse(). Note that the way we allow the operator to be chained is by returning new copies of Selector. If we modified it, the user wouldn’t be able to cache a selector chain and would have to rebuild the vector of ids each time.

The problem

Obviously, the implementation as is isn’t ideal. To see this, consider the following chain:

1
2
Selector one = s["1"];
one["2"]["3"]["4"] = 3;

Here, the user caches a selector to the “1”-id div. Afterwards, the user attempts to traverse from “1” to “4” and passes through two intermediate nodes. The intermediate selections will generate unecessary copies of Selector that get thrown away immediately! Of course, we could decide to have operator[] modify the Selector in place and forbid caching. But this could create weird side-effects that would be exposed to the user.

1
2
3
Selector s{document};
s["1"]["2a"] = "hi";
s["1"]["2b"] = "there";

Here, it is clear what the user is trying to do, but after the first traversal, s has now been mutated so the _traversal vector now contains “1” and “2a”, and the second invocation will fail. Sure, we could provide the user with a means to “reset” the selector but this is also error prone and doesn’t make for a good experience.

A better way… with ref-qualifiers!

It turns out, that the information we really need whether or not the selector we are applying operator[] to is something the user wants to save, or something that will be thrown away. But this is precisely what ref-qualifiers allow us to do!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
class Selector {
private:
    DOM &_doc;
    std::vector<std::string> _traversal;

    void _traverse() {
        for (const auto &elem : _traversal) {
            _doc.Get(elem);
        }
    }

public:
    Selector(DOM &doc) : _doc(doc) {}

    // This is new
    Selector operator[](const std::string &id) const & {
        Selector s(*this);
        s._traversal.push_back(id);
        return s;
    }
    Selector &&operator[](const std::string &id) && {
        _traversal.push_back(id);
        return std::move(*this);
    }
    // End new stuff

    void operator=(const std::string &value) {
        _traverse();
        _doc.Set(value);
    }
};

Here is the new implementation. We now have now overloaded operator[] to handle the cases when *this refers to an lvalue and an rvalue respectively and changed nothing else.

Now, when we revisit the snippet from before…

1
2
Selector one = s["1"];
one["2"]["3"]["4"] = 3;

the evaluation is completely different. The variable one is bound to a name and is an lvalue. So one["2"] will produce a new temporary selector using the operator[](const std::string &) const & overload. The next chain, however, applies to a temporary so the operator[](const std::string &) && overload is called instead and modifies the temporary Selector instead. This proceeds all the way to the end and we get the desired result.

Use of std::move

My first implementation of this code by hand actually had a bug in it! Namely, the code

1
2
3
4
Selector &operator[](const std::string &id) && {
    _traversal.push_back(id);
    return *this;
}

doesn’t return an rvalue but an lvalue. This is because even though *this referred to an rvalue, once it entered the scope of this function, it got a name, and thus became an rvalue. This happens with function arguments too:

1
2
3
4
5
6
7
void foo(int &) {
    std::cout << "l" << std::endl;
}
void foo(int &&i) {
    std::cout << "r" << std::endl;
    foo(i);
}

Calling foo(3) here will output “rl” because the second overload of foo is reached first, but the value 3 is then bound to i and so becomes an lvalue. Thanks to Serge Pavlovsky for pointing out my error!

Conclusion

So now we’ve had our cake and eaten it too. In general, having the ability to distinguish between an unnamed temporary (rvalue) or an lvalue permits the programmer to exploit optimizations such as this one (performing operations in place vs creating a pristine “workplace”). This is a common theme with value categories in general so be on the lookout for such optimizations where they make sense.

This technique came into play when implementing something similar to the DOM selector here in a project called Selene. You can see the actual code in the file include/Selector.h.

Comments