The history of calling conventions, part 5: amd64

The last architecture I’m going to cover in this series is the
AMD64 architecture (also known as x86-64).


AMD64 takes the traditional x86 and expands the registers
to 64 bits, naming them rax, rbx, etc.
It also adds eight more general-purpose registers, named simply
r8 through r15.

The calling convention Windows uses on AMD64 goes like this:


  • The first four parameters to a function are passed in rcx, rdx, r8, and r9.
    Any further parameters are passed on the stack.
    Furthermore, space for the register parameters is reserved on the stack,
    in case the called function wants to spill them; this is important
    if the function is variadic.


  • Parameters that are smaller than 64 bits are not zero-extended;
    the upper bits are garbage, so remember to zero them explicitly if you
    need to.
    Parameters that are larger than 64 bits are passed by address.


  • The return value is placed in rax. If the return value is larger
    than 64 bits, then a secret first parameter is passed which contains
    the address where the return value should be stored.


  • All registers must be preserved across the call, except for
    rax, rcx, rdx, r8, r9, r10, and r11, which are scratch.

  • The callee does not clean the stack. It is the caller’s
    job to clean the stack.


  • The stack must be kept 16-byte aligned.
    Since the “call” instruction pushes an 8-byte return address,
    this means that every non-leaf function is going to adjust the
    stack by a value of the form 16n+8 in order to restore 16-byte
    alignment.
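The return-value bullet above can be pictured in C. This is a conceptual sketch, not actual compiler output: `Triple`, `make_triple`, `make_triple_lowered`, and `sum_via_lowered` are hypothetical names, and `Triple` simply stands in for any type too large to come back in rax.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 24-byte structure: too large to come back in rax. */
typedef struct {
    long long x, y, z;
} Triple;

/* What the programmer writes. */
static Triple make_triple(long long v)
{
    Triple t = { v, v * 2, v * 3 };
    return t;
}

/* Roughly what the compiler turns it into: the caller passes the address
   of a buffer it owns as a secret first parameter (so that address travels
   in rcx and v moves over to rdx). On Windows the callee also hands the
   same address back in rax, which this version mimics by returning it. */
static Triple *make_triple_lowered(Triple *result, long long v)
{
    assert(result != NULL);
    result->x = v;
    result->y = v * 2;
    result->z = v * 3;
    return result;
}

/* A call site such as "Triple t = make_triple(v);" becomes roughly this. */
static long long sum_via_lowered(long long v)
{
    Triple t;                       /* caller-owned space for the result */
    make_triple_lowered(&t, v);
    return t.x + t.y + t.z;
}
```

Both versions fill in the same values; the difference is only who owns the memory. The caller provides it, and the callee writes through the pointer it was handed.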
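The 16n+8 rule in the alignment bullet is easy to turn into arithmetic. A simplified sketch, assuming the frame holds only locals plus outgoing argument slots (always at least four, for the reserved register-parameter space) and ignoring saved registers or anything else a real compiler might add; `frame_adjust` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes a non-leaf function subtracts from RSP, given the bytes of locals
   it needs and the largest number of arguments it passes to any callee. */
static size_t frame_adjust(size_t local_bytes, size_t max_outgoing_args)
{
    /* Every call reserves at least four 8-byte slots for rcx, rdx, r8, r9. */
    size_t slots = max_outgoing_args < 4 ? 4 : max_outgoing_args;
    size_t bytes = local_bytes + slots * 8;

    /* On entry RSP is 8 mod 16 because of the return address. Round up to
       the smallest value of the form 16n+8 that is at least `bytes`, which
       puts RSP back on a 16-byte boundary. */
    size_t adjust = (bytes + 8 + 15) / 16 * 16 - 8;
    assert(adjust % 16 == 8);
    return adjust;
}
```

With no locals and a five-argument callee, `frame_adjust(0, 5)` gives 40 bytes, that is, 0x28. A function that needs 16 bytes of locals and calls nothing wider than four arguments gets 48 bytes rounded up to 56 (0x38).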


Here’s a sample:


void SomeFunction(int a, int b, int c, int d, int e);
void CallThatFunction()
{
SomeFunction(1, 2, 3, 4, 5);
SomeFunction(6, 7, 8, 9, 10);
}


On entry to CallThatFunction, the stack looks like this:

xxxxxxx0  .. rest of stack ..
xxxxxxx8  return address        <- RSP


Because of the return address, the stack is misaligned by 8 bytes.
CallThatFunction sets up its stack frame, which might go like this:


sub rsp, 0x28


Notice that the local stack frame size is 16n+8, so that the result
is a realigned stack.

xxxxxxx0  .. rest of stack ..
xxxxxxx8  return address
xxxxxxx0  (arg5)
xxxxxxx8  (arg4 spill)
xxxxxxx0  (arg3 spill)
xxxxxxx8  (arg2 spill)
xxxxxxx0  (arg1 spill)          <- RSP
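The diagram implies a simple rule for where each argument lives. The sketch below encodes it for integer and pointer arguments of at most 64 bits only (floating-point arguments are not modeled); `arg_register` and `arg_slot_offset` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Register carrying integer/pointer argument `index` (1-based), or NULL
   if the argument is passed on the stack. */
static const char *arg_register(int index)
{
    static const char *const regs[4] = { "rcx", "rdx", "r8", "r9" };
    assert(index >= 1);
    return index <= 4 ? regs[index - 1] : NULL;
}

/* Offset from RSP of argument `index`'s stack slot. For arguments 1 to 4
   this is the reserved spill slot; for 5 and up it holds the value itself.
   Seen from inside the callee, everything is 8 bytes higher, because the
   call instruction pushed the return address. */
static unsigned arg_slot_offset(int index, bool callee_entry)
{
    assert(index >= 1);
    unsigned offset = 8u * (unsigned)(index - 1);
    return callee_entry ? offset + 8u : offset;
}
```

For example, the fifth argument is at [rsp+0x20] just before the call, but at [rsp+0x28] once the callee starts running, since its [rsp] now holds the return address.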


Now we can set up for the first call:


mov dword ptr [rsp+0x20], 5 ; output parameter 5
mov r9d, 4 ; output parameter 4
mov r8d, 3 ; output parameter 3
mov edx, 2 ; output parameter 2
mov ecx, 1 ; output parameter 1
call SomeFunction ; Go Speed Racer!


When SomeFunction returns, the stack is not cleaned, so it
still looks like it did above. To issue the second call, then,
we just shove the new values into the space we already reserved:


mov dword ptr [rsp+0x20], 10 ; output parameter 5
mov r9d, 9 ; output parameter 4
mov r8d, 8 ; output parameter 3
mov edx, 7 ; output parameter 2
mov ecx, 6 ; output parameter 1
call SomeFunction ; Go Speed Racer!


CallThatFunction is now finished and can clean its stack and return.

add rsp, 0x28
ret

Notice that you see very few “push” instructions in amd64 code,
since the paradigm is for the caller to reserve parameter space
and keep re-using it.
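The reserve-once, reuse-many pattern can be traced end to end with a toy simulation of CallThatFunction. Everything here is a hypothetical model rather than real machine code: `Cpu`, `sim_stack`, `store_dword`, `some_function`, and `call_that_function` are made-up names, memory is a small array of 8-byte slots, and the stand-in for SomeFunction sums its arguments so there is something to check. The stack starts out full of garbage, so the dword stores also show why a callee must ignore the upper 32 bits of a 32-bit parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t rcx, rdx, r8, r9, rsp; } Cpu;  /* only what we use */

static uint64_t sim_stack[32];                  /* 256 bytes of stack */
static uint64_t *mem(uint64_t addr) { return &sim_stack[addr / 8]; }

/* Fill the stack with garbage and return an RSP that looks like function
   entry: it points at a return address, so RSP % 16 == 8. */
static uint64_t enter_with_garbage(void)
{
    for (size_t i = 0; i < sizeof sim_stack / sizeof sim_stack[0]; i++)
        sim_stack[i] = 0xCCCCCCCCCCCCCCCCULL;
    uint64_t rsp = sizeof sim_stack - 8;        /* 248 = 16*15 + 8 */
    *mem(rsp) = 0x5678;                         /* caller's return address */
    return rsp;
}

/* "mov dword ptr [addr], v": only the low 32 bits of the slot change. */
static void store_dword(uint64_t addr, uint32_t v)
{
    *mem(addr) = (*mem(addr) & 0xFFFFFFFF00000000ULL) | v;
}

/* Stand-in for SomeFunction. Reads only the low 32 bits of each argument. */
static uint32_t some_function(Cpu *cpu)
{
    cpu->rsp -= 8;                              /* call pushes return address */
    *mem(cpu->rsp) = 0x1234;
    uint32_t arg5 = (uint32_t)*mem(cpu->rsp + 0x28); /* past ret + 4 spill slots */
    uint32_t sum = (uint32_t)cpu->rcx + (uint32_t)cpu->rdx
                 + (uint32_t)cpu->r8 + (uint32_t)cpu->r9 + arg5;
    cpu->rsp += 8;                              /* ret; arguments left in place */
    return sum;
}

static void call_that_function(Cpu *cpu, uint32_t *first, uint32_t *second)
{
    cpu->rsp -= 0x28;                           /* sub rsp, 0x28 (once) */
    assert(cpu->rsp % 16 == 0);                 /* realigned */

    store_dword(cpu->rsp + 0x20, 5);
    cpu->r9 = 4; cpu->r8 = 3; cpu->rdx = 2; cpu->rcx = 1;
    *first = some_function(cpu);

    store_dword(cpu->rsp + 0x20, 10);           /* reuse the same slot */
    cpu->r9 = 9; cpu->r8 = 8; cpu->rdx = 7; cpu->rcx = 6;
    *second = some_function(cpu);

    cpu->rsp += 0x28;                           /* add rsp, 0x28 */
}
```

Calling `call_that_function` on a `Cpu` whose `rsp` came from `enter_with_garbage()` yields sums of 15 and 40 and leaves `rsp` exactly where it started, without a single simulated push.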
[The Old New Thing]
