When working with lambdas, functor classes, and plain 'ol functions, you may want to pass them to other functions.
It wouldn't be C++ if there weren't at least two ways of doing this, and as it turns out, there are three: your venerable
C-style function pointers, template type deduction, and
std::function
.
We will investigate the latter two approaches in moderate detail; function pointers are relatively straightforward and cannot be used
for lambdas or functors, so they are less interesting.
We have three callable things: a function, a functor class, and a lambda, and two functions which can take a callable thing:
template_caller
and function_caller
.
1 #include<functional> 2 3 int function(int a) { 4 return a + 3; 5 } 6 7 class Functor { 8 public: 9 Functor(const int x):m_x(x) {} 10 11 int operator()(int a) { 12 return a + m_x; 13 } 14 15 private: 16 int m_x; 17 }; 18 19 template<class Function> 20 int template_caller(Function f, int arg) { 21 return f(arg); 22 } 23 24 int function_caller(std::function<int(int)> f, int arg) { 25 return f(arg); 26 } 27 28 int main() { 29 int x = 3; 30 volatile int y; 31 32 Functor functor(x); 33 auto lambda = [=] (int a) { return a + x; }; 34 35 y = template_caller(function, 5); 36 y = template_caller(functor, 5); 37 y = template_caller(lambda, 5); 38 39 y = function_caller(function, 5); 40 y = function_caller(functor, 5); 41 y = function_caller(lambda, 5); 42 43 return 0; 44 }
This code, along with a makefile to generate disassemblies, are available here.
The disassembly shown here was generated with GCC 6.2.1 with no optimizations and the -g
flag enabled.
template_caller
, one for each type of callable thing.
Let's see how the types and assembly differ for each instance.
First, let's see the place template_caller
is called in main()
so we can see how the arguments are passed:
197 y = template_caller(function, 5); 198 4007f0: be 05 00 00 00 mov esi,0x5 199 4007f5: bf 76 07 40 00 mov edi,0x400776 200 4007fa: e8 9a 05 00 00 call 400d99 <int template_caller<int (*)(int)>(int (*)(int), int)> 201 4007ff: 89 85 7c ff ff ff mov DWORD PTR [rbp-0x84],eax
If you guessed that 0x400776
is the address of int function(int a)
, you get a cookie.
This function follows the standard calling convention: first argument in edi
, second argument in esi
.
(It's interesting to note that, despite being compiled for a 64-bit system, the function pointer is stored in edi
and
not rdi
as you would expect.)
Here is the template instantiation of template_caller
that is called:
835 0000000000400d99 <int template_caller<int (*)(int)>(int (*)(int), int)>: 836 int template_caller(Function f, int arg) { 837 400d99: 55 push rbp 838 400d9a: 48 89 e5 mov rbp,rsp 839 400d9d: 48 83 ec 10 sub rsp,0x10 840 400da1: 48 89 7d f8 mov QWORD PTR [rbp-0x8],rdi 841 400da5: 89 75 f4 mov DWORD PTR [rbp-0xc],esi 842 return f(arg); 843 400da8: 8b 55 f4 mov edx,DWORD PTR [rbp-0xc] 844 400dab: 48 8b 45 f8 mov rax,QWORD PTR [rbp-0x8] 845 400daf: 89 d7 mov edi,edx 846 400db1: ff d0 call rax 847 } 848 400db3: c9 leave 849 400db4: c3 ret
Note the type of the template parameter: int (*)(int)
—a C-style function pointer.
These are lightweight, but do not carry any sort of object along with them and thus cannot be used for functors or lambdas.
The code of template_caller
first sets up the registers to call the function: arg
is moved from esi
to edi
by lines 841,843, and 845. Then, it places f
, the address of the function to be called, in rax
via lines 840 and 844.
Finally, line 846 calls the function; since we just return the value of the called function, there's nothing else to do.
(Note that line 840 uses rdi
, as opposed to main()
using edi
.)
Again, let's first look at the call site in main()
:
202 y = template_caller(functor, 5); 203 400805: 8b 85 70 ff ff ff mov eax,DWORD PTR [rbp-0x90] 204 40080b: be 05 00 00 00 mov esi,0x5 205 400810: 89 c7 mov edi,eax 206 400812: e8 9e 05 00 00 call 400db5 <int template_caller<Functor>(Functor, int)> 207 400817: 89 85 7c ff ff ff mov DWORD PTR [rbp-0x84],eax
Naturally rbp-0x90
is the address of the functor
object declared earlier in main()
.
Note that line 203 copies the object itself into eax
, not a pointer to the object. (This is possible because
Functor
consists of just one int
, which easily fits in a register.)
The call is very similar to the call for a plain function, except the first parameter is an object instead of a function pointer.
851 0000000000400db5 <int template_caller<Functor>(Functor, int)>: 852 int template_caller(Function f, int arg) { 853 400db5: 55 push rbp 854 400db6: 48 89 e5 mov rbp,rsp 855 400db9: 48 83 ec 10 sub rsp,0x10 856 400dbd: 89 7d f0 mov DWORD PTR [rbp-0x10],edi 857 400dc0: 89 75 fc mov DWORD PTR [rbp-0x4],esi 858 return f(arg); 859 400dc3: 8b 55 fc mov edx,DWORD PTR [rbp-0x4] 860 400dc6: 48 8d 45 f0 lea rax,[rbp-0x10] 861 400dca: 89 d6 mov esi,edx 862 400dcc: 48 89 c7 mov rdi,rax 863 400dcf: e8 38 ff ff ff call 400d0c <Functor::operator()(int)> 864 } 865 400dd4: c9 leave 866 400dd5: c3 ret
Here, we have an instance of template_caller
for the type Functor
.
Notice that the two parameters to this function are a Functor
object in edi
and an int
argument in esi
.
Our call to Functor::operator()
expects a pointer to a Functor
object, so lines 856, 860, and 862 copy the object to
template_caller
's stack frame and place that address in rdi
.
Lines 857, 859, and 861 move esi
to esi
(wow, so efficient!).
Last, but not least, line 863 directly calls operator()
.
Finally, let's examine how template_caller
works when passed a lambda.
The call site is exactly the same as for the functor (rbp-0xa0
is the lambda object's address):
208 y = template_caller(lambda, 5); 209 40081d: 8b 85 60 ff ff ff mov eax,DWORD PTR [rbp-0xa0] 210 400823: be 05 00 00 00 mov esi,0x5 211 400828: 89 c7 mov edi,eax 212 40082a: e8 07 01 00 00 call 400936 <int template_caller<main::{lambda(int)#1}>(main::{lambda(int)#1}, int)> 213 40082f: 89 85 7c ff ff ff mov DWORD PTR [rbp-0x84],eax
And the instance of template_caller
itself:
289 0000000000400936 <int template_caller<main::{lambda(int)#1}>(main::{lambda(int)#1}, int)>: 290 int template_caller(Function f, int arg) { 291 400936: 55 push rbp 292 400937: 48 89 e5 mov rbp,rsp 293 40093a: 48 83 ec 10 sub rsp,0x10 294 40093e: 89 7d f0 mov DWORD PTR [rbp-0x10],edi 295 400941: 89 75 fc mov DWORD PTR [rbp-0x4],esi 296 return f(arg); 297 400944: 8b 55 fc mov edx,DWORD PTR [rbp-0x4] 298 400947: 48 8d 45 f0 lea rax,[rbp-0x10] 299 40094b: 89 d6 mov esi,edx 300 40094d: 48 89 c7 mov rdi,rax 301 400950: e8 53 fe ff ff call 4007a8 <main::{lambda(int)#1}::operator()(int) const> 302 } 303 400955: c9 leave 304 400956: c3 ret
The type parameter here is something unusual: main::{lambda(int)#1}
, which clearly isn't valid C++ syntax—you couldn't explicitly
write this type in a program (which is why we rely on auto
to deduce the type for us).
This is the type of the functor class GCC generates on-the-fly when you make a lambda, which is why lambdas and functors look so similar behind the scenes.
Again, this code is identical to that generated for the Functor
instance of template_caller
, except the function is hard-wired
to call our lambda's operator()
.
Template type deduction is a low-overhead approach to passing callable things to functions in C++.
In the case of functors and lambdas, the associated template instance can directly call the appropriate operator()
; plain
functions require one memory read to read the address of the function to call.
Passing each callable thing to template_caller
generates a different template instantiation, since all three callables
have different types. Passing a plain function produces a template instance for a function pointer; this template instance will be used
for every plain function of the same type (taking one int
and returning an int
). Passing a functor or lambda
produces a template instance for that particular functor class or lambda.
Functors and lambdas can't reuse template instantiations for
different lambdas or functor classes because the template instance must hard-code the operator()
to be called, unlike with
function pointers, where the function to be called is a parameter to template_caller
. To achieve such reuse, we need to
introduce another layer of indirection. std::function
implements this indirection with a virtual operator()
function.
std::function
Hold on to your hats; we're about to disassemble some of the C++ standard library!
Since std::function
gives a uniform means to refer to callable things, function_caller
is not templated, and thus
the analysis of how function_caller
works will not depend on whether it's calling a function, a functor, or a lambda.
The only difference between the different calls is what is passed in main()
, so let's start there.
For each call to function_caller
, we should expect three function calls:
std::function
's constructor, to construct the object passed to function_caller
function_caller
std::function
's destructor
215 y = function_caller(function, 5); 216 400839: 48 8d 45 80 lea rax,[rbp-0x80] 217 40083d: be 76 07 40 00 mov esi,0x400776 218 400842: 48 89 c7 mov rdi,rax 219 400845: e8 c8 05 00 00 call 400e12 <std::function<int (int)>::function<int (*)(int), void, void>(int (*)(int))>
Here the constructor is passed the address (rbp-0x80
) of a new object to construct in rdi
and the address of function
,
0x400776
in esi
. The constructor does some preliminary checks, but the fun starts on line 928, where we see the start of a call
to initialize a _My_handler
object, which is specialized to call the callable thing std::function
is boxing (in this case, a
function pointer).
907 0000000000400e12 <std::function<int (int)>::function<int (*)(int), void, void>(int (*)(int))>: 908 function<_Res(_ArgTypes...)>:: 909 400e12: 55 push rbp 910 400e13: 48 89 e5 mov rbp,rsp 911 400e16: 53 push rbx 912 400e17: 48 83 ec 18 sub rsp,0x18 913 400e1b: 48 89 7d e8 mov QWORD PTR [rbp-0x18],rdi 914 400e1f: 48 89 75 e0 mov QWORD PTR [rbp-0x20],rsi ... 928 _My_handler::_M_init_functor(_M_functor, std::move(__f)); 929 400e4b: 48 8d 45 e0 lea rax,[rbp-0x20] 930 400e4f: 48 89 c7 mov rdi,rax #### rdi -> rax 931 400e52: e8 ac ff ff ff call 400e03 <std::remove_reference<int (*&)(int)>::type&& std::move<int (*&)(int)>(int (*&)(int))> 932 400e57: 48 89 c2 mov rdx,rax 933 400e5a: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 934 400e5e: 48 89 d6 mov rsi,rdx 935 400e61: 48 89 c7 mov rdi,rax 936 400e64: e8 ef 00 00 00 call 400f58 <std::_Function_base::_Base_manager<int (*)(int)>::_M_init_functor(std::_Any_data&, int (*&&)(int))>
First, note that this
and the address of function
are placed on the stack at
rbp-0x18
and rbp-0x20
, respectively.
Lines 929 and 930 put a pointer to the pointer to function
in rdi
.
std::remove_reference
(called on line 931) has the effect of moving rdi
to rax
(it does some
type-level manipulation too, but
that doesn't appear in the assembly).
That pointer (holding the value rbp-0x20
) makes its way to rsi
via lines 932 and 934.
Line 935 puts this
into rdi
; line 936 calls the handler's init function.
That init function exists only to perform some
template metaprogramming which results
in a call to the real init function at 0x4011aa
with the same parameters.
Let's see what happens in there:
1290 00000000004011aa <std::_Function_base::_Base_manager<int (*)(int)>::_M_init_functor(std::_Any_data&, int (*&&)(int), std::integral_constant<bool, true>)>: 1291 _M_init_functor(_Any_data& __functor, _Functor&& __f, true_type) 1292 4011aa: 55 push rbp 1293 4011ab: 48 89 e5 mov rbp,rsp 1294 4011ae: 53 push rbx 1295 4011af: 48 83 ec 18 sub rsp,0x18 1296 4011b3: 48 89 7d e8 mov QWORD PTR [rbp-0x18],rdi 1297 4011b7: 48 89 75 e0 mov QWORD PTR [rbp-0x20],rsi 1298 { new (__functor._M_access()) _Functor(std::move(__f)); } 1299 4011bb: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20] 1300 4011bf: 48 89 c7 mov rdi,rax 1301 4011c2: e8 3c fc ff ff call 400e03 <std::remove_reference<int (*&)(int)>::type&& std::move<int (*&)(int)>(int (*&)(int))> #### rdi -> rax 1302 4011c7: 48 8b 18 mov rbx,QWORD PTR [rax] 1303 4011ca: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 1304 4011ce: 48 89 c7 mov rdi,rax #### rdi -> rax 1305 4011d1: e8 a0 fa ff ff call 400c76 <std::_Any_data::_M_access()> 1306 4011d6: 48 89 c6 mov rsi,rax 1307 4011d9: bf 08 00 00 00 mov edi,0x8 #### rsi -> rax 1308 4011de: e8 81 fa ff ff call 400c64 <operator new(unsigned long, void*)> 1309 4011e3: 48 85 c0 test rax,rax 1310 4011e6: 74 03 je 4011eb <std::_Function_base::_Base_manager<int (*)(int)>::_M_init_functor(std::_Any_data&, int (*&&)(int), std::integral_constant<bool, true>)+0x41> 1311 4011e8: 48 89 18 mov QWORD PTR [rax],rbx 1312 4011eb: 90 nop 1313 4011ec: 48 83 c4 18 add rsp,0x18 1314 4011f0: 5b pop rbx 1315 4011f1: 5d pop rbp 1316 4011f2: c3 ret
function
is dereferenced and placed in rbx
via lines 1297 and 1299–1302.this
is placed in rax
via lines 1296 and 1303–1307.function
is stored in *this
on line 1311.std::function
object is a pointer to the function we want to call.
This finishes up the call to initialize _My_handler
.
The constructor for std::function
does one other important thing: it sets up a couple vtable entries.
937 _M_invoker = &_My_handler::_M_invoke; 938 400e69: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 939 400e6d: 48 c7 40 18 97 0f 40 mov QWORD PTR [rax+0x18],0x400f97 940 400e74: 00 941 _M_manager = &_My_handler::_M_manager; 942 400e75: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 943 400e79: 48 c7 40 10 d0 0f 40 mov QWORD PTR [rax+0x10],0x400fd0 944 400e80: 00 945 }
The vtable entries are created by storing function addresses in the object we're constructing.
What's important to note here is that this+0x18
holds the address of _M_invoke
, 0x400f97
.
We'll need to know this when analyzing std::function::operator()
.
function_caller
:
Back in main()
, we see that a pointer to the std::function
object is passed in rdi
, and arg
's value,
5, is passed in esi
:
220 40084a: 48 8d 45 80 lea rax,[rbp-0x80] 221 40084e: be 05 00 00 00 mov esi,0x5 222 400853: 48 89 c7 mov rdi,rax 223 400856: e8 2a ff ff ff call 400785 <function_caller(std::function<int (int)>, int)>
function_caller
just calls std::function::operator()
with the same arguments in the same registers:
138 0000000000400785 <function_caller(std::function<int (int)>, int)>: 144 int function_caller(std::function<int(int)> f, int arg) { 145 400785: 55 push rbp 146 400786: 48 89 e5 mov rbp,rsp 147 400789: 48 83 ec 10 sub rsp,0x10 148 40078d: 48 89 7d f8 mov QWORD PTR [rbp-0x8],rdi 149 400791: 89 75 f4 mov DWORD PTR [rbp-0xc],esi 150 return f(arg); 151 400794: 8b 55 f4 mov edx,DWORD PTR [rbp-0xc] 152 400797: 48 8b 45 f8 mov rax,QWORD PTR [rbp-0x8] 153 40079b: 89 d6 mov esi,edx 154 40079d: 48 89 c7 mov rdi,rax 155 4007a0: e8 a9 05 00 00 call 400d4e <std::function<int (int)>::operator()(int) const> 156 } 157 4007a5: c9 leave 158 4007a6: c3 ret
The indirection fun starts in std::function::operator()
:
796 0000000000400d4e <std::function<int (int)>::operator()(int) const>: 801 400d4e: 55 push rbp 802 400d4f: 48 89 e5 mov rbp,rsp 803 400d52: 53 push rbx 804 400d53: 48 83 ec 18 sub rsp,0x18 805 400d57: 48 89 7d e8 mov QWORD PTR [rbp-0x18],rdi 806 400d5b: 89 75 e4 mov DWORD PTR [rbp-0x1c],esi ... 818 400d73: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 819 400d77: 48 8b 58 18 mov rbx,QWORD PTR [rax+0x18] 820 400d7b: 48 8d 45 e4 lea rax,[rbp-0x1c] 821 400d7f: 48 89 c7 mov rdi,rax #### rdi -> rax 822 400d82: e8 b8 ff ff ff call 400d3f <int&& std::forward<int>(std::remove_reference<int>::type&)> 823 400d87: 48 89 c2 mov rdx,rax 824 400d8a: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 825 400d8e: 48 89 d6 mov rsi,rdx 826 400d91: 48 89 c7 mov rdi,rax 827 400d94: ff d3 call rbx 828 } 829 400d96: 48 83 c4 18 add rsp,0x18 830 400d9a: 5b pop rbx 831 400d9b: 5d pop rbp 832 400d9c: c3 ret
_M_invoke
(0x400f97
) is loaded into rbx
by lines 805, 818, and 819.arg
is placed in rsi
via lines 806, 820–823, and 825.this
is stored in rdi
via lines 805, 824, and 826._M_invoke
's vtable entry is called on line 827.It is the responsibility of _M_invoke
to interpret what is stored in the std::function
object and how to call it:
1064 0000000000400f97 <std::_Function_handler<int (int), int (*)(int)>::_M_invoke(std::_Any_data const&, int&&)>: 1065 _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args) 1066 400f97: 55 push rbp 1067 400f98: 48 89 e5 mov rbp,rsp 1068 400f9b: 53 push rbx 1069 400f9c: 48 83 ec 18 sub rsp,0x18 1070 400fa0: 48 89 7d e8 mov QWORD PTR [rbp-0x18],rdi 1071 400fa4: 48 89 75 e0 mov QWORD PTR [rbp-0x20],rsi 1072 return (*_Base::_M_get_pointer(__functor))( 1073 400fa8: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 1074 400fac: 48 89 c7 mov rdi,rax #### rdi -> rax (via 4 function calls!) 1075 400faf: e8 3f 02 00 00 call 4011f3 <std::_Function_base::_Base_manager<int (*)(int)>::_M_get_pointer(std::_Any_data const&)> 1076 400fb4: 48 8b 18 mov rbx,QWORD PTR [rax] 1077 std::forward<_ArgTypes>(__args)...); 1078 400fb7: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20] 1079 400fbb: 48 89 c7 mov rdi,rax #### rdi -> rax 1080 400fbe: e8 7c fd ff ff call 400d3f <int&& std::forward<int>(std::remove_reference<int>::type&)> 1081 return (*_Base::_M_get_pointer(__functor))( 1082 400fc3: 8b 00 mov eax,DWORD PTR [rax] 1083 std::forward<_ArgTypes>(__args)...); 1084 400fc5: 89 c7 mov edi,eax 1085 400fc7: ff d3 call rbx 1086 } 1087 400fc9: 48 83 c4 18 add rsp,0x18 1088 400fcd: 5b pop rbx 1089 400fce: 5d pop rbp 1090 400fcf: c3 ret
function
stored in this
is moved to rbx
by lines 1070 and 1073–1076.arg
is moved to rax
by lines 1071 and 1078–1080.arg
is fetched from the pointer and placed in edi
by lines 1082 and 1084.function
is called on line 1085.When wrapping a function pointer, std::function
is implemented as a virtual class whose operator()
calls a virtual function on a handler class that then calls the wrapped function pointer. Constructing this object requires storing
both the function pointer and a couple vtable entries.
We will study how the behavior of std::function
changes when created from a functor or lambda, rather than a function pointer.
Unsurprisingly, the code generated is functionally equivalent for functors and lambdas.
The code shown is for a Functor
instance.
228 y = function_caller(functor, 5); 229 40086d: 8b 95 70 ff ff ff mov edx,DWORD PTR [rbp-0x90] 230 400873: 48 8d 45 a0 lea rax,[rbp-0x60] 231 400877: 89 d6 mov esi,edx 232 400879: 48 89 c7 mov rdi,rax 233 40087c: e8 33 06 00 00 call 400eb4 <std::function<int (int)>::function<Functor, void, void>(Functor)>
As before, this
is passed in rdi
. Our functor
object, being one int
, is passed in esi
.
973 0000000000400eb4 <std::function<int (int)>::function<Functor, void, void>(Functor)>: 974 function<_Res(_ArgTypes...)>:: 975 400eb4: 55 push rbp 976 400eb5: 48 89 e5 mov rbp,rsp 977 400eb8: 53 push rbx 978 400eb9: 48 83 ec 18 sub rsp,0x18 979 400ebd: 48 89 7d e8 mov QWORD PTR [rbp-0x18],rdi 980 400ec1: 89 75 e0 mov DWORD PTR [rbp-0x20],esi ... 994 _My_handler::_M_init_functor(_M_functor, std::move(__f)); 995 400eec: 48 8d 45 e0 lea rax,[rbp-0x20] 996 400ef0: 48 89 c7 mov rdi,rax #### rdi -> rax 997 400ef3: e8 ad ff ff ff call 400ea5 <std::remove_reference<Functor&>::type&& std::move<Functor&>(Functor&)> 998 400ef8: 48 89 c2 mov rdx,rax 999 400efb: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 1000 400eff: 48 89 d6 mov rsi,rdx 1001 400f02: 48 89 c7 mov rdi,rax 1002 400f05: e8 7d 01 00 00 call 401087 <std::_Function_base::_Base_manager<Functor>::_M_init_functor(std::_Any_data&, Functor&&)>
functor
is placed in rsi
via lines 980, 995–998, and 1000.this
is placed in rdi
, as usual, by 999 and 1001.1409 00000000004012b6 <std::_Function_base::_Base_manager<Functor>::_M_init_functor(std::_Any_data&, Functor&&, std::integral_constant<bool, true>)>: 1410 _M_init_functor(_Any_data& __functor, _Functor&& __f, true_type) 1411 4012b6: 55 push rbp 1412 4012b7: 48 89 e5 mov rbp,rsp 1413 4012ba: 53 push rbx 1414 4012bb: 48 83 ec 18 sub rsp,0x18 1415 4012bf: 48 89 7d e8 mov QWORD PTR [rbp-0x18],rdi 1416 4012c3: 48 89 75 e0 mov QWORD PTR [rbp-0x20],rsi 1417 { new (__functor._M_access()) _Functor(std::move(__f)); } 1418 4012c7: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20] 1419 4012cb: 48 89 c7 mov rdi,rax #### rdi -> rax 1420 4012ce: e8 d2 fb ff ff call 400ea5 <std::remove_reference<Functor&>::type&& std::move<Functor&>(Functor&)> 1421 4012d3: 48 89 c3 mov rbx,rax 1422 4012d6: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 1423 4012da: 48 89 c7 mov rdi,rax #### rdi -> rax 1424 4012dd: e8 94 f9 ff ff call 400c76 <std::_Any_data::_M_access()> 1425 4012e2: 48 89 c6 mov rsi,rax 1426 4012e5: bf 04 00 00 00 mov edi,0x4 #### rsi -> rax 1427 4012ea: e8 75 f9 ff ff call 400c64 <operator new(unsigned long, void*)> 1428 4012ef: 48 85 c0 test rax,rax 1429 4012f2: 74 04 je 4012f8 <std::_Function_base::_Base_manager<Functor>::_M_init_functor(std::_Any_data&, Functor&&, std::integral_constant<bool, true>)+0x42> 1430 4012f4: 8b 13 mov edx,DWORD PTR [rbx] 1431 4012f6: 89 10 mov DWORD PTR [rax],edx 1432 4012f8: 90 nop 1433 4012f9: 48 83 c4 18 add rsp,0x18 1434 4012fd: 5b pop rbx 1435 4012fe: 5d pop rbp 1436 4012ff: c3 ret
functor
is placed in rbx
, via lines 1416 and 1418–1421.this
is copied to rax
by lines 1415 and 1422–1427.functor
is dereferenced and functor
is copied into the std::function
object
by lines 1430 and 1431.Back in the constructor for std::function
, the vtable is set up:
1003 _M_invoker = &_My_handler::_M_invoke; 1004 400f0a: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 1005 400f0e: 48 c7 40 18 c6 10 40 mov QWORD PTR [rax+0x18],0x4010c6 1006 400f15: 00 1007 _M_manager = &_My_handler::_M_manager; 1008 400f16: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 1009 400f1a: 48 c7 40 10 02 11 40 mov QWORD PTR [rax+0x10],0x401102 1010 400f21: 00 1011 }
Note that this time we have a different invoke function, 0x4010c6
.
The call to function_caller
is the same as with the function pointer (as it should be—this is the entire point
of std::function
).
std::function
object is passed in rdi
and arg
is passed in esi
.std::function::operator()
places a copy of arg
on its stack frame, then passes
a pointer to that in rsi
to _M_invoke
(along with this
in rdi
).1195 00000000004010c6 <std::_Function_handler<int (int), Functor>::_M_invoke(std::_Any_data const&, int&&)>: 1196 _M_invoke(const _Any_data& __functor, _ArgTypes&&... __args) 1197 4010c6: 55 push rbp 1198 4010c7: 48 89 e5 mov rbp,rsp 1199 4010ca: 53 push rbx 1200 4010cb: 48 83 ec 18 sub rsp,0x18 1201 4010cf: 48 89 7d e8 mov QWORD PTR [rbp-0x18],rdi 1202 4010d3: 48 89 75 e0 mov QWORD PTR [rbp-0x20],rsi 1203 std::forward<_ArgTypes>(__args)...); 1204 4010d7: 48 8b 45 e0 mov rax,QWORD PTR [rbp-0x20] 1205 4010db: 48 89 c7 mov rdi,rax #### rdi -> rax 1206 4010de: e8 5c fc ff ff call 400d3f <int&& std::forward<int>(std::remove_reference<int>::type&)> 1207 return (*_Base::_M_get_pointer(__functor))( 1208 4010e3: 8b 18 mov ebx,DWORD PTR [rax] 1209 4010e5: 48 8b 45 e8 mov rax,QWORD PTR [rbp-0x18] 1210 4010e9: 48 89 c7 mov rdi,rax #### rdi -> rax (via 4 function calls!) 1211 4010ec: e8 0f 02 00 00 call 401300 <std::_Function_base::_Base_manager<Functor>::_M_get_pointer(std::_Any_data const&)> 1212 std::forward<_ArgTypes>(__args)...); 1213 4010f1: 89 de mov esi,ebx 1214 4010f3: 48 89 c7 mov rdi,rax 1215 4010f6: e8 15 fc ff ff call 400d10 <Functor::operator()(int)> 1216 } 1217 4010fb: 48 83 c4 18 add rsp,0x18 1218 4010ff: 5b pop rbx 1219 401100: 5d pop rbp 1220 401101: c3 ret
This invocation function differs from the invoker for a function pointer in two ways.
First, we need to manage both the functor
object as well as the argument being passed to Functor::operator()
.
Second, we haven't saved the address of Functor::operator()
anywhere, so we will need to directly call it.
this
pointer for the functor object is placed in rdi
via lines 1201, 1209–1211, and 1214.
Notice that since the functor object is the first thing in the std::function
object, the same this
pointer
is used for both.arg
pointer is dereferenced and its value placed in esi
via lines 1202, 1204–1208, and 1213.Functor::operator()
is called directly on line 1215.
The std::function
object for a functor object or lambda consists of a copy of the functor/lambda object and a vtable,
which contains a pointer to an invocation function. The invocation function, unlike that for a function pointer, has hard-coded the
address of the functor or lambda's operator()
.
Thus, each functor class and lambda requires a corresponding handler class and invocation function when boxed with std::function
.
std::function
provides a uniform 'box' for passing around and calling any sort of callable object.
This allows developers to write code without using templates or restricting the sort of callable things that can be passed to functions.
std::function
achieves this uniformity by performing dynamic function dispatch using virtual functions.
When a std::function
object is created, it contains either a function pointer or a functor/lambda object, as well as
a vtable with a pointer to an invocation function that knows how to call the callable thing inside the std::function
object.
As with the template type deduction approach, the invocation function is specialized to either a function pointer of a given type
or to a specific functor class or lambda. While every function pointer of the same type can share one invocation function, the invocation
function for functor classes and lambdas must have hard-coded the address of the appropriate operator()
to call and thus cannot
be reused between functor classes or lambdas.
The process for dynamically dispatching the function call is as follows:
std::function::operator()
.operator()
.Template metaprogramming functions are scattered throughout the code for std::function
.
It would be reasonable to be concerned about the additional overhead from these (effectively) no-op functions, but enabling -O1
or -Og
eliminate all this code.
Both approaches to passing callable things to functions require one template instance per functor class or lambda object.
In the case of directly using template type deduction, the function taking a callable as a parameter is specialized; in the case
of boxing the callable in std::function
, the handler class is specialized. As such, neither approach clearly offers
smaller code size.
However, std::function
definitely incurs one vtable lookup and function dispatch that cannot be eliminated, even
by an optimizer. As a result of this, std::function
will always incur some function call overhead, and it will require
memory overhead (at least two pointers, or 16 bytes on a 64-bit system) for storing the vtable.
By comparison, directly using template type deduction and passing a lambda or functor class produces assembly that is much easier
to optimize. It is common in this case for the lambda/functor's operator()
to be inlined into the template instance.
This is possible since the template instance directly calls it, as opposed to calling via a pointer as would be the case for a virtual function
dispatch or a function pointer. Furthermore, in the case that the function being passed a callable is small, the template instantiation
may itself be inlined into the call site, resulting in a net reduction of two function calls.
Therefore, we recommend that std::function
be used only when dynamic dispatch is strictly necessary.
Using template type deduction for functions that take a callable thing allows the caller of that function to choose whether they
want static or dynamic dispatch, since they can always just pass a std::function
to the templated function.
For an example of how this would work as a member function of a templated class, consider the following simplified Vector implementation:
1 #include<vector> 2 #include<iostream> 3 4 template<class Item> 5 class Vector { 6 public: 7 Vector(const unsigned int size) : v(size) {} 8 9 unsigned int size() { return v.size(); } 10 11 Item& operator[](const unsigned int idx) { 12 return v[idx]; 13 } 14 15 template<class Func> 16 Vector map(Func f) { 17 Vector ret(size()); 18 19 for(unsigned int i = 0; i < size(); i++) { 20 ret[i] = f(v[i]); 21 } 22 23 return ret; 24 } 25 26 private: 27 std::vector<Item> v; 28 }; 29 30 int main() { 31 Vector<int> nums(5); 32 for(int i = 0; i < 5; i++) { 33 nums[i] = i; 34 } 35 36 Vector<int> doubled = nums.map([] (int n) { return n * 2; }); 37 38 for(int i = 0; i < 5; i++) { 39 std::cout << nums[i] << " * 2 = " << doubled[i] << std::endl; 40 } 41 42 return 0; 43 }
(Note: this map()
function requires that the function passed to it map Item
to Item
.
For a more flexible but less clear example, see this Vector implementation.)
Unfortunately, C++ cannot perform template type deduction on return values, so this approach cannot be used to return lambdas
from functions. However, C++14 introduces the ability to deduce the return type of a function using the auto
keyword.
This can be used to deduce both functor classes and on-the-fly generated lambda classes. For example:
1 auto get_func() { 2 return [] (int x) { return x + 3; }; 3 }
This is not such a large disadvantage, though, because quite often returning a callable thing from a function is a situation in which you need to perform dynamic dispatch anyway. Consider this slightly more complicated example:
1 std::function<int(int)> halve_or_triple(int n) { 2 if(n%2 == 0) { 3 return [] (int x) { return x/2; }; 4 } 5 else { 6 return [] (int x) { return 3*x + 1; }; 7 } 8 }
Here there is no way to specialize the return type of this function to one lambda or the other at compile time—the exact type of the object
returned from this function can only be determined at the time it is called. (Even C++14's auto
cannot deduce the return type for you here.)
Therefore, we must dynamically dispatch this function call.
C++ offers two techniques for writing functions that take a callable thing as a parameter: template type deduction and std::function
.
The template type deduction approach performs static function dispatch; that is, the function to be called is determined at compile time.
This approach allows for aggressive optimization and low overhead in terms of both memory consumption and CPU instructions for performing the call.
However, in situations where the function to be called cannot be determined at compile time, an additional level of indirection is needed.
std::function
offers dynamic function dispatch via a virtual function call. This comes at a cost, however: at least two virtual table
entries must be stored for each std::function
instance, and one vtable lookup is performed each time an instance's operator()
is called. Dynamic dispatch is often necessary when returning callable things from functions. It is also essential should one wish to store
callable things to be called at a later time, such as when registering callbacks. The trade-off between static and dynamic dispatch is analogous
to the trade-off between static polymorphism (using CRTP) and
dynamic polymorphism (using virtual functions).
As such, we conclude that the proper design for a function taking a callable thing should, if at all possible, allow the caller to determine whether
they desire static or dynamic dispatch. Since the template type deduction approach can create a template instantiation for std::function
,
taking this approach permits the caller to pass either a functor or lambda directly or to pass a std::function
instance should they
require dynamic dispatch.
std::function
's indirection is eliminated via optimization?operator()
. How would one write a recursive lambda where the recursion call is
dynamically dispatched? Is it possible to write a recursive lambda with a statically dispatched recursion call?