Home Articles 2016/01/12

C++ Function Dispatch Under The Hood

Introduction

When working with lambdas, functor classes, and plain 'ol functions, you may want to pass them to other functions. It wouldn't be C++ if there weren't at least two ways of doing this, and as it turns out, there are three: your venerable C-style function pointers, template type deduction, and std::function. We will investigate the latter two approaches in moderate detail; function pointers are relatively straightforward and cannot be used for lambdas or functors, so they are less interesting.

The Code

We have three callable things: a function, a functor class, and a lambda, and two functions which can take a callable thing: template_caller and function_caller.

 1 #include<functional>
 2 
 3 int function(int a) {
 4   return a + 3;
 5 }
 6 
 7 class Functor {
 8   public:
 9     Functor(const int x):m_x(x) {}
10 
11     int operator()(int a) {
12       return a + m_x;
13     }
14 
15   private:
16     int m_x;
17 };
18 
19 template<class Function>
20 int template_caller(Function f, int arg) {
21   return f(arg);
22 }
23 
24 int function_caller(std::function<int(int)> f, int arg) {
25   return f(arg);
26 }
27 
28 int main() {
29   int x = 3;
30   volatile int y;
31 
32   Functor functor(x);
33   auto lambda = [=] (int a) { return a + x; };
34 
35   y = template_caller(function, 5);
36   y = template_caller(functor, 5);
37   y = template_caller(lambda, 5);
38 
39   y = function_caller(function, 5);
40   y = function_caller(functor, 5);
41   y = function_caller(lambda, 5);
42 
43   return 0;
44 }

This code, along with a makefile to generate disassemblies, are available here. The disassembly shown here was generated with GCC 6.2.1 with no optimizations and the -g flag enabled.

Template Type Deduction

As we expect, C++ produces three instantiations of template_caller, one for each type of callable thing. Let's see how the types and assembly differ for each instance.

Function:

First, let's see the place template_caller is called in main() so we can see how the arguments are passed:

 197   y = template_caller(function, 5);
 198   4007f0: be 05 00 00 00        mov    esi,0x5
 199   4007f5: bf 76 07 40 00        mov    edi,0x400776
 200   4007fa: e8 9a 05 00 00        call   400d99 <int template_caller<int (*)(int)>(int (*)(int), int)>
 201   4007ff: 89 85 7c ff ff ff     mov    DWORD PTR [rbp-0x84],eax

If you guessed that 0x400776 is the address of int function(int a), you get a cookie. This function follows the standard calling convention: first argument in edi, second argument in esi. (It's interesting to note that, despite being compiled for a 64-bit system, the function pointer is stored in edi and not rdi as you would expect.)

Here is the template instantiation of template_caller that is called:

 835 0000000000400d99 <int template_caller<int (*)(int)>(int (*)(int), int)>:
 836 int template_caller(Function f, int arg) {
 837   400d99: 55                    push   rbp
 838   400d9a: 48 89 e5              mov    rbp,rsp
 839   400d9d: 48 83 ec 10           sub    rsp,0x10
 840   400da1: 48 89 7d f8           mov    QWORD PTR [rbp-0x8],rdi
 841   400da5: 89 75 f4              mov    DWORD PTR [rbp-0xc],esi
 842   return f(arg);
 843   400da8: 8b 55 f4              mov    edx,DWORD PTR [rbp-0xc]
 844   400dab: 48 8b 45 f8           mov    rax,QWORD PTR [rbp-0x8]
 845   400daf: 89 d7                 mov    edi,edx
 846   400db1: ff d0                 call   rax
 847 }
 848   400db3: c9                    leave
 849   400db4: c3                    ret

Note the type of the template parameter: int (*)(int)—a C-style function pointer. These are lightweight, but do not carry any sort of object along with them and thus cannot be used for functors or lambdas.

The code of template_caller first sets up the registers to call the function: arg is moved from esi to edi by lines 841,843, and 845. Then, it places f, the address of the function to be called, in rax via lines 840 and 844. Finally, line 846 calls the function; since we just return the value of the called function, there's nothing else to do. (Note that line 840 uses rdi, as opposed to main() using edi.)

Functor:

Again, let's first look at the call site in main():

 202   y = template_caller(functor, 5);
 203   400805: 8b 85 70 ff ff ff     mov    eax,DWORD PTR [rbp-0x90]
 204   40080b: be 05 00 00 00        mov    esi,0x5
 205   400810: 89 c7                 mov    edi,eax
 206   400812: e8 9e 05 00 00        call   400db5 <int template_caller<Functor>(Functor, int)>
 207   400817: 89 85 7c ff ff ff     mov    DWORD PTR [rbp-0x84],eax

Naturally rbp-0x90 is the address of the functor object declared earlier in main(). Note that line 203 copies the object itself into eax, not a pointer to the object. (This is possible because Functor consists of just one int, which easily fits in a register.) The call is very similar to the call for a plain function, except the first parameter is an object instead of a function pointer.

 851 0000000000400db5 <int template_caller<Functor>(Functor, int)>:
 852 int template_caller(Function f, int arg) {
 853   400db5: 55                    push   rbp
 854   400db6: 48 89 e5              mov    rbp,rsp
 855   400db9: 48 83 ec 10           sub    rsp,0x10
 856   400dbd: 89 7d f0              mov    DWORD PTR [rbp-0x10],edi
 857   400dc0: 89 75 fc              mov    DWORD PTR [rbp-0x4],esi
 858   return f(arg);
 859   400dc3: 8b 55 fc              mov    edx,DWORD PTR [rbp-0x4]
 860   400dc6: 48 8d 45 f0           lea    rax,[rbp-0x10]
 861   400dca: 89 d6                 mov    esi,edx
 862   400dcc: 48 89 c7              mov    rdi,rax
 863   400dcf: e8 38 ff ff ff        call   400d0c <Functor::operator()(int)>
 864 }
 865   400dd4: c9                    leave
 866   400dd5: c3                    ret

Here, we have an instance of template_caller for the type Functor. Notice that the two parameters to this function are a Functor object in edi and an int argument in esi. Our call to Functor::operator() expects a pointer to a Functor object, so lines 856, 860, and 862 copy the object to template_caller's stack frame and place that address in rdi. Lines 857, 859, and 861 move esi to esi (wow, so efficient!). Last, but not least, line 863 directly calls operator().

Lambda:

Finally, let's examine how template_caller works when passed a lambda. The call site is exactly the same as for the functor (rbp-0xa0 is the lambda object's address):

 208   y = template_caller(lambda, 5);
 209   40081d: 8b 85 60 ff ff ff     mov    eax,DWORD PTR [rbp-0xa0]
 210   400823: be 05 00 00 00        mov    esi,0x5
 211   400828: 89 c7                 mov    edi,eax
 212   40082a: e8 07 01 00 00        call   400936 <int template_caller<main::{lambda(int)#1}>(main::{lambda(int)#1}, int)>
 213   40082f: 89 85 7c ff ff ff     mov    DWORD PTR [rbp-0x84],eax

And the instance of template_caller itself:

 289 0000000000400936 <int template_caller<main::{lambda(int)#1}>(main::{lambda(int)#1}, int)>:
 290 int template_caller(Function f, int arg) {
 291   400936: 55                    push   rbp
 292   400937: 48 89 e5              mov    rbp,rsp
 293   40093a: 48 83 ec 10           sub    rsp,0x10
 294   40093e: 89 7d f0              mov    DWORD PTR [rbp-0x10],edi
 295   400941: 89 75 fc              mov    DWORD PTR [rbp-0x4],esi
 296   return f(arg);
 297   400944: 8b 55 fc              mov    edx,DWORD PTR [rbp-0x4]
 298   400947: 48 8d 45 f0           lea    rax,[rbp-0x10]
 299   40094b: 89 d6                 mov    esi,edx
 300   40094d: 48 89 c7              mov    rdi,rax
 301   400950: e8 53 fe ff ff        call   4007a8 <main::{lambda(int)#1}::operator()(int) const>
 302 }
 303   400955: c9                    leave
 304   400956: c3                    ret

The type parameter here is something unusual: main::{lambda(int)#1}, which clearly isn't valid C++ syntax—you couldn't explicitly write this type in a program (which is why we rely on auto to deduce the type for us). This is the type of the functor class GCC generates on-the-fly when you make a lambda, which is why lambdas and functors look so similar behind the scenes.

Again, this code is identical to that generated for the Functor instance of template_caller, except the function is hard-wired to call our lambda's operator().

Summary:

Template type deduction is a low-overhead approach to passing callable things to functions in C++. In the case of functors and lambdas, the associated template instance can directly call the appropriate operator(); plain functions require one memory read to read the address of the function to call.

Passing each callable thing to template_caller generates a different template instantiation, since all three callables have different types. Passing a plain function produces a template instance for a function pointer; this template instance will be used for every plain function of the same type (taking one int and returning an int). Passing a functor or lambda produces a template instance for that particular functor class or lambda.

Functors and lambdas can't reuse template instantiations for different lambdas or functor classes because the template instance must hard-code the operator() to be called, unlike with function pointers, where the function to be called is a parameter to template_caller. To achieve such reuse, we need to introduce another layer of indirection. std::function implements this indirection with a virtual operator() function.

Using std::function

Hold on to your hats; we're about to disassemble some of the C++ standard library!

Since std::function gives a uniform means to refer to callable things, function_caller is not templated, and thus the analysis of how function_caller works will not depend on whether it's calling a function, a functor, or a lambda. The only difference between the different calls is what is passed in main(), so let's start there. For each call to function_caller, we should expect three function calls:

  1. std::function's constructor, to construct the object passed to function_caller
  2. function_caller
  3. std::function's destructor
We'll examine the first two; the destructor is relatively uninteresting.

Plain Function:

Constructor:

 215   y = function_caller(function, 5);
 216   400839: 48 8d 45 80           lea    rax,[rbp-0x80]
 217   40083d: be 76 07 40 00        mov    esi,0x400776
 218   400842: 48 89 c7              mov    rdi,rax
 219   400845: e8 c8 05 00 00        call   400e12 <std::function<int (int)>::function<int (*)(int), void, void>(int (*)(int))>

Here the constructor is passed the address (rbp-0x80) of a new object to construct in rdi and the address of function, 0x400776 in esi. The constructor does some preliminary checks, but the fun starts on line 928, where we see the start of a call to initialize a _My_handler object, which is specialized to call the callable thing std::function is boxing (in this case, a function pointer).

 907 0000000000400e12 <std::function<int (int)>::function<int (*)(int), void, void>(int (*)(int))>:
 908       function<_Res(_ArgTypes...)>::
 909   400e12: 55                    push   rbp
 910   400e13: 48 89 e5              mov    rbp,rsp
 911   400e16: 53                    push   rbx
 912   400e17: 48 83 ec 18           sub    rsp,0x18
 913   400e1b: 48 89 7d e8           mov    QWORD PTR [rbp-0x18],rdi
 914   400e1f: 48 89 75 e0           mov    QWORD PTR [rbp-0x20],rsi
 ...
 928       _My_handler::_M_init_functor(_M_functor, std::move(__f));
 929   400e4b: 48 8d 45 e0           lea    rax,[rbp-0x20]
 930   400e4f: 48 89 c7              mov    rdi,rax
####                                        rdi -> rax
 931   400e52: e8 ac ff ff ff        call   400e03 <std::remove_reference<int (*&)(int)>::type&& std::move<int (*&)(int)>(int (*&)(int))>
 932   400e57: 48 89 c2              mov    rdx,rax
 933   400e5a: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
 934   400e5e: 48 89 d6              mov    rsi,rdx
 935   400e61: 48 89 c7              mov    rdi,rax
 936   400e64: e8 ef 00 00 00        call   400f58 <std::_Function_base::_Base_manager<int (*)(int)>::_M_init_functor(std::_Any_data&, int (*&&)(int))>

First, note that this and the address of function are placed on the stack at rbp-0x18 and rbp-0x20, respectively.

Lines 929 and 930 put a pointer to the pointer to function in rdi. std::remove_reference (called on line 931) has the effect of moving rdi to rax (it does some type-level manipulation too, but that doesn't appear in the assembly). That pointer (holding the value rbp-0x20) makes its way to rsi via lines 932 and 934.

Line 935 puts this into rdi; line 936 calls the handler's init function. That init function exists only to perform some template metaprogramming which results in a call to the real init function at 0x4011aa with the same parameters. Let's see what happens in there:

1290 00000000004011aa <std::_Function_base::_Base_manager<int (*)(int)>::_M_init_functor(std::_Any_data&, int (*&&)(int), std::integral_constant<bool, true>)>:
1291   _M_init_functor(_Any_data& __functor, _Functor&& __f, true_type)
1292   4011aa: 55                    push   rbp
1293   4011ab: 48 89 e5              mov    rbp,rsp
1294   4011ae: 53                    push   rbx
1295   4011af: 48 83 ec 18           sub    rsp,0x18
1296   4011b3: 48 89 7d e8           mov    QWORD PTR [rbp-0x18],rdi
1297   4011b7: 48 89 75 e0           mov    QWORD PTR [rbp-0x20],rsi
1298   { new (__functor._M_access()) _Functor(std::move(__f)); }
1299   4011bb: 48 8b 45 e0           mov    rax,QWORD PTR [rbp-0x20]
1300   4011bf: 48 89 c7              mov    rdi,rax
1301   4011c2: e8 3c fc ff ff        call   400e03 <std::remove_reference<int (*&)(int)>::type&& std::move<int (*&)(int)>(int (*&)(int))>
####                                        rdi -> rax
1302   4011c7: 48 8b 18              mov    rbx,QWORD PTR [rax]
1303   4011ca: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
1304   4011ce: 48 89 c7              mov    rdi,rax
####                                        rdi -> rax
1305   4011d1: e8 a0 fa ff ff        call   400c76 <std::_Any_data::_M_access()>
1306   4011d6: 48 89 c6              mov    rsi,rax
1307   4011d9: bf 08 00 00 00        mov    edi,0x8
####                                        rsi -> rax
1308   4011de: e8 81 fa ff ff        call   400c64 <operator new(unsigned long, void*)>
1309   4011e3: 48 85 c0              test   rax,rax
1310   4011e6: 74 03                 je     4011eb <std::_Function_base::_Base_manager<int (*)(int)>::_M_init_functor(std::_Any_data&, int (*&&)(int), std::integral_constant<bool, true>)+0x41>
1311   4011e8: 48 89 18              mov    QWORD PTR [rax],rbx
1312   4011eb: 90                    nop
1313   4011ec: 48 83 c4 18           add    rsp,0x18
1314   4011f0: 5b                    pop    rbx
1315   4011f1: 5d                    pop    rbp
1316   4011f2: c3                    ret

  1. The pointer to the pointer to function is dereferenced and placed in rbx via lines 1297 and 1299–1302.
  2. this is placed in rax via lines 1296 and 1303–1307.
  3. The pointer to function is stored in *this on line 1311.
So, the net result is that the 'first' thing in our std::function object is a pointer to the function we want to call. This finishes up the call to initialize _My_handler.

The constructor for std::function does one other important thing: it sets up a couple vtable entries.

 937       _M_invoker = &_My_handler::_M_invoke;
 938   400e69: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
 939   400e6d: 48 c7 40 18 97 0f 40  mov    QWORD PTR [rax+0x18],0x400f97
 940   400e74: 00
 941       _M_manager = &_My_handler::_M_manager;
 942   400e75: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
 943   400e79: 48 c7 40 10 d0 0f 40  mov    QWORD PTR [rax+0x10],0x400fd0
 944   400e80: 00
 945       }

The vtable entries are created by storing function addresses in the object we're constructing. What's important to note here is that this+0x18 holds the address of _M_invoke, 0x400f97. We'll need to know this when analyzing std::function::operator().

Calling function_caller:

Back in main(), we see that a pointer to the std::function object is passed in rdi, and arg's value, 5, is passed in esi:

 220   40084a: 48 8d 45 80           lea    rax,[rbp-0x80]
 221   40084e: be 05 00 00 00        mov    esi,0x5
 222   400853: 48 89 c7              mov    rdi,rax
 223   400856: e8 2a ff ff ff        call   400785 <function_caller(std::function<int (int)>, int)>

function_caller just calls std::function::operator() with the same arguments in the same registers:

 138 0000000000400785 <function_caller(std::function<int (int)>, int)>:
 144 int function_caller(std::function<int(int)> f, int arg) {
 145   400785: 55                    push   rbp
 146   400786: 48 89 e5              mov    rbp,rsp
 147   400789: 48 83 ec 10           sub    rsp,0x10
 148   40078d: 48 89 7d f8           mov    QWORD PTR [rbp-0x8],rdi
 149   400791: 89 75 f4              mov    DWORD PTR [rbp-0xc],esi
 150   return f(arg);
 151   400794: 8b 55 f4              mov    edx,DWORD PTR [rbp-0xc]
 152   400797: 48 8b 45 f8           mov    rax,QWORD PTR [rbp-0x8]
 153   40079b: 89 d6                 mov    esi,edx
 154   40079d: 48 89 c7              mov    rdi,rax
 155   4007a0: e8 a9 05 00 00        call   400d4e <std::function<int (int)>::operator()(int) const>
 156 }
 157   4007a5: c9                    leave
 158   4007a6: c3                    ret

The indirection fun starts in std::function::operator():

 796 0000000000400d4e <std::function<int (int)>::operator()(int) const>:
 801   400d4e: 55                    push   rbp
 802   400d4f: 48 89 e5              mov    rbp,rsp
 803   400d52: 53                    push   rbx
 804   400d53: 48 83 ec 18           sub    rsp,0x18
 805   400d57: 48 89 7d e8           mov    QWORD PTR [rbp-0x18],rdi
 806   400d5b: 89 75 e4              mov    DWORD PTR [rbp-0x1c],esi
 ...
 818   400d73: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
 819   400d77: 48 8b 58 18           mov    rbx,QWORD PTR [rax+0x18]
 820   400d7b: 48 8d 45 e4           lea    rax,[rbp-0x1c]
 821   400d7f: 48 89 c7              mov    rdi,rax
####                                        rdi -> rax
 822   400d82: e8 b8 ff ff ff        call   400d3f <int&& std::forward<int>(std::remove_reference<int>::type&)>
 823   400d87: 48 89 c2              mov    rdx,rax
 824   400d8a: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
 825   400d8e: 48 89 d6              mov    rsi,rdx
 826   400d91: 48 89 c7              mov    rdi,rax
 827   400d94: ff d3                 call   rbx
 828     }
 829   400d96: 48 83 c4 18           add    rsp,0x18
 830   400d9a: 5b                    pop    rbx
 831   400d9b: 5d                    pop    rbp
 832   400d9c: c3                    ret

  1. The vtable entry for _M_invoke (0x400f97) is loaded into rbx by lines 805, 818, and 819.
  2. A pointer to arg is placed in rsi via lines 806, 820–823, and 825.
  3. this is stored in rdi via lines 805, 824, and 826.
  4. _M_invoke's vtable entry is called on line 827.

It is the responsibility of _M_invoke to interpret what is stored in the std::function object and how to call it:

1064 0000000000400f97 <std::_Function_handler<int (int), int (*)(int)>::_M_invoke(std::_Any_data const&, int&&)>:
1065       _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
1066   400f97: 55                    push   rbp
1067   400f98: 48 89 e5              mov    rbp,rsp
1068   400f9b: 53                    push   rbx
1069   400f9c: 48 83 ec 18           sub    rsp,0x18
1070   400fa0: 48 89 7d e8           mov    QWORD PTR [rbp-0x18],rdi
1071   400fa4: 48 89 75 e0           mov    QWORD PTR [rbp-0x20],rsi
1072   return (*_Base::_M_get_pointer(__functor))(
1073   400fa8: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
1074   400fac: 48 89 c7              mov    rdi,rax
####                                        rdi -> rax (via 4 function calls!)
1075   400faf: e8 3f 02 00 00        call   4011f3 <std::_Function_base::_Base_manager<int (*)(int)>::_M_get_pointer(std::_Any_data const&)>
1076   400fb4: 48 8b 18              mov    rbx,QWORD PTR [rax]
1077       std::forward<_ArgTypes>(__args)...);
1078   400fb7: 48 8b 45 e0           mov    rax,QWORD PTR [rbp-0x20]
1079   400fbb: 48 89 c7              mov    rdi,rax
####                                        rdi -> rax
1080   400fbe: e8 7c fd ff ff        call   400d3f <int&& std::forward<int>(std::remove_reference<int>::type&)>
1081   return (*_Base::_M_get_pointer(__functor))(
1082   400fc3: 8b 00                 mov    eax,DWORD PTR [rax]
1083       std::forward<_ArgTypes>(__args)...);
1084   400fc5: 89 c7                 mov    edi,eax
1085   400fc7: ff d3                 call   rbx
1086       }
1087   400fc9: 48 83 c4 18           add    rsp,0x18
1088   400fcd: 5b                    pop    rbx
1089   400fce: 5d                    pop    rbp
1090   400fcf: c3                    ret

  1. The pointer to function stored in this is moved to rbx by lines 1070 and 1073–1076.
  2. The pointer to arg is moved to rax by lines 1071 and 1078–1080.
  3. arg is fetched from the pointer and placed in edi by lines 1082 and 1084.
  4. Finally, function is called on line 1085.

Summary:

When wrapping a function pointer, std::function is implemented as a virtual class whose operator() calls a virtual function on a handler class that then calls the wrapped function pointer. Constructing this object requires storing both the function pointer and a couple vtable entries.

Functor or Lambda:

We will study how the behavior of std::function changes when created from a functor or lambda, rather than a function pointer. Unsurprisingly, the code generated is functionally equivalent for functors and lambdas. The code shown is for a Functor instance.

Constructor:

 228   y = function_caller(functor, 5);
 229   40086d: 8b 95 70 ff ff ff     mov    edx,DWORD PTR [rbp-0x90]
 230   400873: 48 8d 45 a0           lea    rax,[rbp-0x60]
 231   400877: 89 d6                 mov    esi,edx
 232   400879: 48 89 c7              mov    rdi,rax
 233   40087c: e8 33 06 00 00        call   400eb4 <std::function<int (int)>::function<Functor, void, void>(Functor)>

As before, this is passed in rdi. Our functor object, being one int, is passed in esi.

 973 0000000000400eb4 <std::function<int (int)>::function<Functor, void, void>(Functor)>:
 974       function<_Res(_ArgTypes...)>::
 975   400eb4: 55                    push   rbp
 976   400eb5: 48 89 e5              mov    rbp,rsp
 977   400eb8: 53                    push   rbx
 978   400eb9: 48 83 ec 18           sub    rsp,0x18
 979   400ebd: 48 89 7d e8           mov    QWORD PTR [rbp-0x18],rdi
 980   400ec1: 89 75 e0              mov    DWORD PTR [rbp-0x20],esi
 ...
 994       _My_handler::_M_init_functor(_M_functor, std::move(__f));
 995   400eec: 48 8d 45 e0           lea    rax,[rbp-0x20]
 996   400ef0: 48 89 c7              mov    rdi,rax
####                                        rdi -> rax
 997   400ef3: e8 ad ff ff ff        call   400ea5 <std::remove_reference<Functor&>::type&& std::move<Functor&>(Functor&)>
 998   400ef8: 48 89 c2              mov    rdx,rax
 999   400efb: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
1000   400eff: 48 89 d6              mov    rsi,rdx
1001   400f02: 48 89 c7              mov    rdi,rax
1002   400f05: e8 7d 01 00 00        call   401087 <std::_Function_base::_Base_manager<Functor>::_M_init_functor(std::_Any_data&, Functor&&)>

  1. A pointer to functor is placed in rsi via lines 980, 995–998, and 1000.
  2. this is placed in rdi, as usual, by 999 and 1001.
  3. Then, we call the handler's init function on 1002, which again forwards the call to the true handler init:

1409 00000000004012b6 <std::_Function_base::_Base_manager<Functor>::_M_init_functor(std::_Any_data&, Functor&&, std::integral_constant<bool, true>)>:
1410   _M_init_functor(_Any_data& __functor, _Functor&& __f, true_type)
1411   4012b6: 55                    push   rbp
1412   4012b7: 48 89 e5              mov    rbp,rsp
1413   4012ba: 53                    push   rbx
1414   4012bb: 48 83 ec 18           sub    rsp,0x18
1415   4012bf: 48 89 7d e8           mov    QWORD PTR [rbp-0x18],rdi
1416   4012c3: 48 89 75 e0           mov    QWORD PTR [rbp-0x20],rsi
1417   { new (__functor._M_access()) _Functor(std::move(__f)); }
1418   4012c7: 48 8b 45 e0           mov    rax,QWORD PTR [rbp-0x20]
1419   4012cb: 48 89 c7              mov    rdi,rax
####                                        rdi -> rax
1420   4012ce: e8 d2 fb ff ff        call   400ea5 <std::remove_reference<Functor&>::type&& std::move<Functor&>(Functor&)>
1421   4012d3: 48 89 c3              mov    rbx,rax
1422   4012d6: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
1423   4012da: 48 89 c7              mov    rdi,rax
####                                        rdi -> rax
1424   4012dd: e8 94 f9 ff ff        call   400c76 <std::_Any_data::_M_access()>
1425   4012e2: 48 89 c6              mov    rsi,rax
1426   4012e5: bf 04 00 00 00        mov    edi,0x4
####                                        rsi -> rax
1427   4012ea: e8 75 f9 ff ff        call   400c64 <operator new(unsigned long, void*)>
1428   4012ef: 48 85 c0              test   rax,rax
1429   4012f2: 74 04                 je     4012f8 <std::_Function_base::_Base_manager<Functor>::_M_init_functor(std::_Any_data&, Functor&&, std::integral_constant<bool, true>)+0x42>
1430   4012f4: 8b 13                 mov    edx,DWORD PTR [rbx]
1431   4012f6: 89 10                 mov    DWORD PTR [rax],edx
1432   4012f8: 90                    nop
1433   4012f9: 48 83 c4 18           add    rsp,0x18
1434   4012fd: 5b                    pop    rbx
1435   4012fe: 5d                    pop    rbp
1436   4012ff: c3                    ret

  1. The pointer to functor is placed in rbx, via lines 1416 and 1418–1421.
  2. this is copied to rax by lines 1415 and 1422–1427.
  3. Finally, the pointer to functor is dereferenced and functor is copied into the std::function object by lines 1430 and 1431.

Back in the constructor for std::function, the vtable is set up:

1003       _M_invoker = &_My_handler::_M_invoke;
1004   400f0a: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
1005   400f0e: 48 c7 40 18 c6 10 40  mov    QWORD PTR [rax+0x18],0x4010c6
1006   400f15: 00
1007       _M_manager = &_My_handler::_M_manager;
1008   400f16: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
1009   400f1a: 48 c7 40 10 02 11 40  mov    QWORD PTR [rax+0x10],0x401102
1010   400f21: 00
1011       }

Note that this time we have a different invoke function, 0x4010c6.

Call:

The call to function_caller is the same as with the function pointer (as it should be—this is the entire point of std::function).

  1. The pointer to the std::function object is passed in rdi and arg is passed in esi.
  2. std::function::operator() places a copy of arg on its stack frame, then passes a pointer to that in rsi to _M_invoke (along with this in rdi).

1195 00000000004010c6 <std::_Function_handler<int (int), Functor>::_M_invoke(std::_Any_data const&, int&&)>:
1196       _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args)
1197   4010c6: 55                    push   rbp
1198   4010c7: 48 89 e5              mov    rbp,rsp
1199   4010ca: 53                    push   rbx
1200   4010cb: 48 83 ec 18           sub    rsp,0x18
1201   4010cf: 48 89 7d e8           mov    QWORD PTR [rbp-0x18],rdi
1202   4010d3: 48 89 75 e0           mov    QWORD PTR [rbp-0x20],rsi
1203       std::forward<_ArgTypes>(__args)...);
1204   4010d7: 48 8b 45 e0           mov    rax,QWORD PTR [rbp-0x20]
1205   4010db: 48 89 c7              mov    rdi,rax
####                                        rdi -> rax
1206   4010de: e8 5c fc ff ff        call   400d3f <int&& std::forward<int>(std::remove_reference<int>::type&)>
1207   return (*_Base::_M_get_pointer(__functor))(
1208   4010e3: 8b 18                 mov    ebx,DWORD PTR [rax]
1209   4010e5: 48 8b 45 e8           mov    rax,QWORD PTR [rbp-0x18]
1210   4010e9: 48 89 c7              mov    rdi,rax
####                                        rdi -> rax (via 4 function calls!)
1211   4010ec: e8 0f 02 00 00        call   401300 <std::_Function_base::_Base_manager<Functor>::_M_get_pointer(std::_Any_data const&)>
1212       std::forward<_ArgTypes>(__args)...);
1213   4010f1: 89 de                 mov    esi,ebx
1214   4010f3: 48 89 c7              mov    rdi,rax
1215   4010f6: e8 15 fc ff ff        call   400d10 <Functor::operator()(int)>
1216       }
1217   4010fb: 48 83 c4 18           add    rsp,0x18
1218   4010ff: 5b                    pop    rbx
1219   401100: 5d                    pop    rbp
1220   401101: c3                    ret

This invocation function differs from the invoker for a function pointer in two ways. First, we need to manage both the functor object as well as the argument being passed to Functor::operator(). Second, we haven't saved the address of Functor::operator() anywhere, so we will need to directly call it.

  1. The this pointer for the functor object is placed in rdi via lines 1201, 1209–1211, and 1214. Notice that since the functor object is the first thing in the std::function object, the same this pointer is used for both.
  2. The arg pointer is dereferenced and its value placed in esi via lines 1202, 1204–1208, and 1213.
  3. Functor::operator() is called directly on line 1215.

Summary:

The std::function object for a functor object or lambda consists of a copy of the functor/lambda object and a vtable, which contains a pointer to an invocation function. The invocation function, unlike that for a function pointer, has hard-coded the address of the functor or lambda's operator(). Thus, each functor class and lambda requires a corresponding handler class and invocation function when boxed with std::function.

Summary:

std::function provides a uniform 'box' for passing around and calling any sort of callable object. This allows developers to write code without using templates or restricting the sort of callable things that can be passed to functions. std::function achieves this uniformity by performing dynamic function dispatch using virtual functions. When a std::function object is created, it contains either a function pointer or a functor/lambda object, as well as a vtable with a pointer to an invocation function that knows how to call the callable thing inside the std::function object.

As with the template type deduction approach, the invocation function is specialized to either a function pointer of a given type or to a specific functor class or lambda. While every function pointer of the same type can share one invocation function, the invocation function for functor classes and lambdas must have hard-coded the address of the appropriate operator() to call and thus cannot be reused between functor classes or lambdas.

The process for dynamically dispatching the function call is as follows:

  1. Call std::function::operator().
  2. Fetch and call the virtual invocation function.
  3. Call the appropriate function, via either a pointer or a direct call to operator().

Template metaprogramming functions are scattered throughout the code for std::function. It would be reasonable to be concerned about the additional overhead from these (effectively) no-op functions, but enabling -O1 or -Og eliminate all this code.

Design Takeaways

Both approaches to passing callable things to functions require one template instance per functor class or lambda object. In the case of directly using template type deduction, the function taking a callable as a parameter is specialized; in the case of boxing the callable in std::function, the handler class is specialized. As such, neither approach clearly offers smaller code size.

However, std::function definitely incurs one vtable lookup and function dispatch that cannot be eliminated, even by an optimizer. As a result of this, std::function will always incur some function call overhead, and it will require memory overhead (at least two pointers, or 16 bytes on a 64-bit system) for storing the vtable. By comparison, directly using template type deduction and passing a lambda or functor class produces assembly that is much easier to optimize. It is common in this case for the lambda/functor's operator() to be inlined into the template instance. This is possible since the template instance directly calls it, as opposed to calling via a pointer as would be the case for a virtual function dispatch or a function pointer. Furthermore, in the case that the function being passed a callable is small, the template instantiation may itself be inlined into the call site, resulting in a net reduction of two function calls.

Therefore, we recommend that std::function be used only when dynamic dispatch is strictly necessary. Using template type deduction for functions that take a callable thing allows the caller of that function to choose whether they want static or dynamic dispatch, since they can always just pass a std::function to the templated function. For an example of how this would work as a member function of a templated class, consider the following simplified Vector implementation:

 1 #include<vector>
 2 #include<iostream>
 3 
 4 template<class Item>
 5 class Vector {
 6   public:
 7     Vector(const unsigned int size) : v(size) {}
 8 
 9     unsigned int size() { return v.size(); }
10 
11     Item& operator[](const unsigned int idx) {
12       return v[idx];
13     }
14 
15     template<class Func>
16     Vector map(Func f) {
17       Vector ret(size());
18 
19       for(unsigned int i = 0; i < size(); i++) {
20         ret[i] = f(v[i]);
21       }
22 
23       return ret;
24     }
25 
26   private:
27     std::vector<Item> v;
28 };
29 
30 int main() {
31   Vector<int> nums(5);
32   for(int i = 0; i < 5; i++) {
33     nums[i] = i;
34   }
35 
36   Vector<int> doubled = nums.map([] (int n) { return n * 2; });
37 
38   for(int i = 0; i < 5; i++) {
39     std::cout << nums[i] << " * 2 = " << doubled[i] << std::endl;
40   }
41 
42   return 0;
43 }

(Note: this map() function requires that the function passed to it map Item to Item. For a more flexible but less clear example, see this Vector implementation.)

Unfortunately, C++ cannot perform template type deduction on return values, so this approach cannot be used to return lambdas from functions. However, C++14 introduces the ability to deduce the return type of a function using the auto keyword. This can be used to deduce both functor classes and on-the-fly generated lambda classes. For example:

1 auto get_func() {
2   return [] (int x) { return x + 3; };
3 }

This is not such a large disadvantage, though, because quite often returning a callable thing from a function is a situation in which you need to perform dynamic dispatch anyway. Consider this slightly more complicated example:

1 std::function<int(int)> halve_or_triple(int n) {
2   if(n%2 == 0) {
3     return [] (int x) { return x/2; };
4   }
5   else {
6     return [] (int x) { return 3*x + 1; };
7   }
8 }

Here there is no way to specialize the return type of this function to one lambda or the other at compile time—the exact type of the object returned from this function can only be determined at the time it is called. (Even C++14's auto cannot deduce the return type for you here.) Therefore, we must dynamically dispatch this function call.

Conclusion

C++ offers two techniques for writing functions that take a callable thing as a parameter: template type deduction and std::function. The template type deduction approach performs static function dispatch; that is, the function to be called is determined at compile time. This approach allows for aggressive optimization and low overhead in terms of both memory consumption and CPU instructions for performing the call. However, in situations where the function to be called cannot be determined at compile time, an additional level of indirection is needed.

std::function offers dynamic function dispatch via a virtual function call. This comes at a cost, however: at least two virtual table entries must be stored for each std::function instance, and one vtable lookup is performed each time an instance's operator() is called. Dynamic dispatch is often necessary when returning callable things from functions. It is also essential should one wish to store callable things to be called at a later time, such as when registering callbacks. The trade-off between static and dynamic dispatch is analogous to the trade-off between static polymorphism (using CRTP) and dynamic polymorphism (using virtual functions).

As such, we conclude that the proper design for a function taking a callable thing should, if at all possible, allow the caller to determine whether they desire static or dynamic dispatch. Since the template type deduction approach can create a template instantiation for std::function, taking this approach permits the caller to pass either a functor or lambda directly or to pass a std::function instance should they require dynamic dispatch.

Homework

  1. How much of std::function's indirection is eliminated via optimization?
  2. How aggressively do different optimization levels inline function calls when passing callable objects? Does calling the same function, but passing different callable objects each time, affect the inlining?
  3. Does Clang generate significantly different code than GCC?
  4. It is possible to write a functor class with a recursive operator(). How would one write a recursive lambda where the recursion call is dynamically dispatched? Is it possible to write a recursive lambda with a statically dispatched recursion call?

Further Reading