C to ASM Example
Example:
// A C program is a collection of functions.
// Here's a minimal program with one function.
#include <stdio.h>

int
main(int argc, char* argv[])
{
    printf("Hello C program\n");
    return 0;
}
# Direct C => binary
$ gcc -o hello hello.c
$ ./hello
# C => asm
$ gcc -S -o hello.s hello.c
# take a look at hello.s
# asm => binary
$ gcc -o hello hello.s
$ ./hello
Interesting stuff in hello.s:
- The string is there, but no newline.
- The main function exists
  - Starts at label “main”
  - Ends at “ret”.
  - Declared “.globl”
- In the main function another function is called: not printf, but puts.
- The optimizer got to us.
Let’s tell it to be less clever:
# C => asm
$ gcc -fno-builtin -S -o hello.s hello.c
# take a look at hello.s
- Now the string has a newline.
- And the function called is “printf”.
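The swap is harmless because, for a constant string with no format specifiers that ends in a newline, printf and puts write the same bytes: puts adds the newline itself, which is why the string in the first hello.s had none. A minimal sketch of the two forms side by side (the function names “hello_via_printf” and “hello_via_puts” are made up for illustration):

```c
#include <stdio.h>

// What hello.c says: printf with a constant string ending in '\n'.
// Returns printf's result, the number of characters written.
int
hello_via_printf(void)
{
    return printf("Hello C program\n");
}

// What the first hello.s does: puts with the newline stripped,
// because puts appends a newline on its own.
// Returns puts's result, which is non-negative on success.
int
hello_via_puts(void)
{
    return puts("Hello C program");
}
```

Both functions print the same line. Only the return values differ, which is one reason the compiler makes this swap only when the result of printf is ignored.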
How about with two functions:
// add1.c
#include <stdio.h>

long
add1(long x)
{
    return x + 1;
}

// initial _ marks args as not used
int
main(int _ac, char* _av[])
{
    long x = add1(5);
    printf("%ld\n", x);
    return 0;
}
# C => asm
$ gcc -S -o add1.s add1.c
# take a look at add1.s
- Two functions: add1, main
  - each starts at label, ends at “ret”
- In main, the value 5 is moved to “%rdi”
  - That must be where the function’s first argument goes.
  - No, that’s “%edi”
  - I said “%rdi”, wait a second…
- Then add1 is called
- In add1, the value from %rdi goes to some places.
  - Eventually, “addq $1, …” happens to it.
- Back in main, %rax is moved to %rsi, and printf is called.
This almost makes sense, but it’s a bit of a mess. Let’s figure it out.
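One piece of the puzzle: %edi and %rdi are not separate registers. %edi names the low 32 bits of %rdi, and on AMD64 a write to a 32-bit register clears the upper 32 bits of the full register, so moving 5 into %edi leaves %rdi equal to 5. Writes to 8-bit names like %al behave differently and leave the other bits alone. A rough C model of those rules, treating a register as a plain uint64_t (the helper names are invented for this sketch):

```c
#include <stdint.h>

// Model of a 32-bit write such as "movl $5, %edi": the low half is
// replaced and the upper 32 bits of the 64-bit register become zero.
uint64_t
write_low32(uint64_t reg, uint32_t value)
{
    (void)reg;  // the old contents are discarded entirely
    return (uint64_t)value;
}

// Model of an 8-bit write such as "movb $0, %al": only the low byte
// is replaced; the other 56 bits keep their old value.
uint64_t
write_low8(uint64_t reg, uint8_t value)
{
    return (reg & ~(uint64_t)0xFF) | value;
}

// Reading the 32-bit name is just reading the low 32 bits.
uint32_t
read_low32(uint64_t reg)
{
    return (uint32_t)(reg & 0xFFFFFFFFu);
}
```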
AMD64: ISA and ASM
This is a 64-bit extension to a 32-bit extension to a 16-bit ISA that was itself modeled on Intel’s earlier 8-bit chips, the first of which (the 8008) was designed for a terminal controller.
AMD64 being 64 bit means:
- Registers hold 64 bits (how many bytes?).
- Memory addresses are 64 bits (kind of: pointers are 64 bits wide, but current CPUs use only the low 48 or 57 bits of a virtual address).
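To check the arithmetic: 64 bits at 8 bits per byte is 8 bytes, which is also the size of a long and of a pointer under the usual Linux AMD64 ABI. A small sketch (the function names are illustrative, and the long and pointer sizes are a property of the platform, not something C guarantees):

```c
#include <limits.h>
#include <stddef.h>

// Bytes needed to hold one 64-bit register value.
size_t
register_bytes(void)
{
    return 64 / CHAR_BIT;
}

// Size of long on this platform; 8 on Linux AMD64, but not everywhere.
size_t
long_bytes(void)
{
    return sizeof(long);
}

// Size of a pointer on this platform; 8 on AMD64.
size_t
pointer_bytes(void)
{
    return sizeof(void*);
}
```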
Translating add1.c to assembly
# add2.s
    .global main
    .text
# long add2(long x)
#   - the argument comes in via %rdi
#   - we return the result by putting it in %rax
add2:
    enter $0, $0
    # long y = x;
    mov %rdi, %rax
    # y = y + 2;
    add $2, %rax
    # return y;
    leave
    ret

main:
    enter $0, $0
    # long x = 5;
    mov $5, %rdi
    # y = add2(x)
    call add2
    # result in %rax
    # printf("%ld\n", y)
    #   - first arg goes in %rdi
    #   - second arg goes in %rsi
    #   - for a variable arg function, we need to zero %al
    #     - %al is the bottom 8 bits of %ax/%eax/%rax
    mov $long_fmt, %rdi
    mov %rax, %rsi
    mov $0, %al
    call printf
    # return 0;
    mov $0, %rax
    leave
    ret

    .data
long_fmt: .string "%ld\n"
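To compile this simple hand-written assembly, we use the command below. The -no-pie flag is needed because “mov $long_fmt, %rdi” loads an absolute address, which a position-independent executable does not allow.
$ gcc -no-pie -o add2 add2.s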
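For comparison, here is roughly the C that the comments in add2.s spell out. This is a reconstruction from those comments, not the output of any tool. The body of main is wrapped in a helper, “run_add2” (a name made up here), that also returns the value it prints so the result can be checked.

```c
#include <stdio.h>

// long add2(long x): the argument arrives in %rdi,
// and the result leaves in %rax.
long
add2(long x)
{
    long y = x;   // mov %rdi, %rax
    y = y + 2;    // add $2, %rax
    return y;     // leave; ret
}

// The body of main in add2.s: call add2(5), then print the result.
// Returns the printed value so a caller can check it.
long
run_add2(void)
{
    long x = 5;          // mov $5, %rdi
    long y = add2(x);    // call add2
    printf("%ld\n", y);  // %rdi = format, %rsi = y, %al = 0
    return y;
}
```

Calling run_add2() from a main function prints 7, the same output as the assembled add2 binary.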