What are the four stages of the compilation process?
Compiling a C program is normally a multi-stage process that uses several different tools.
In this post, I'll walk through each of the four stages of compilation using the following C program:
/**
* "Hello, World!": A classic.
*/
#include <stdio.h>
int main(void)
{
puts("Hello, \
World!"); // A comment
return 0;
}
Preprocessing
The first stage of compilation is called preprocessing. In this stage, lines starting with a # character are interpreted by the preprocessor as preprocessor commands. Before interpreting the commands, the preprocessor does some initial processing. This includes joining continued lines (lines ending with a \) and stripping comments.
To print the result of the preprocessing stage, pass the -E option to cc:
cc -E hello_world.c
Given the "Hello, World!" example above, the preprocessor will produce the contents of the stdio.h header file joined with the contents of the hello_world.c file, stripped of its leading comment.
Unlike the original author, I use the command
gcc -E hello_world.c > hello_world.i
which saves the result of the preprocessing stage to a file named hello_world.i.
Opening hello_world.i shows the following:
... // lines omitted for brevity
# 501 "d:\\mingw\\include\\stdio.h" 3
__attribute__((__cdecl__)) __attribute__((__nothrow__)) __attribute__((__format__(__mingw_printf__,3,4)))
int snprintf (char *, size_t, const char *, ...);
... // lines omitted for brevity
# 3 "hello_world.c"
int main(void)
{
puts("Hello, World!");
return 0;
}
Compared with the original file hello_world.c, the processed file hello_world.i shows the continued lines joined into a single line and the comments removed.
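The same line joining and comment stripping also applies inside a macro definition. Here is a minimal sketch of that; GREETING and greeting_length are hypothetical names I made up for illustration, not part of the original program.

```c
#include <stddef.h>
#include <string.h>

/* GREETING is a hypothetical macro. The backslash-newline pair is removed
   before the string literal is tokenized, so the two physical lines below
   form one literal. The second line starts in column 0 on purpose: any
   leading spaces there would end up inside the string. */
#define GREETING "Hello, \
World!" /* this comment is stripped by the preprocessor */

/* Hypothetical helper: returns the length of the joined literal. */
size_t greeting_length(void)
{
    return strlen(GREETING); // GREETING has been replaced by the literal here
}
```

Calling greeting_length() from main and running the file through cc -E should show the macro already expanded to a single "Hello, World!" literal, with both comments gone.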
Compilation
The second stage of compilation is, confusingly enough, called compilation. In this stage, the preprocessed code is translated to assembly instructions specific to the target processor architecture. These form an intermediate human-readable language.
The existence of this step allows for C code to contain inline assembly instructions and for different assemblers to be used.
Some compilers also support the use of an integrated assembler, in which the compilation stage generates machine code directly, avoiding the overhead of generating the intermediate assembly instructions and invoking the assembler.
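To make the remark about inline assembly concrete, here is a small sketch. It assumes GCC or Clang (the __asm__ keyword is a compiler extension, not standard C) and a target whose assembler accepts the nop mnemonic, such as x86 or ARM. The function name answer_after_nop is hypothetical.

```c
/* Assumes GCC or Clang: __asm__ is a compiler extension, not standard C. */
int answer_after_nop(void)
{
    int value = 42;

    /* The string is handed to the assembler as-is; the compiler does not
       translate it. "nop" is assumed to be a valid mnemonic on the target. */
    __asm__ volatile ("nop");

    return value;
}
```

If you call this from main and compile with the -S option, the nop should appear in the generated assembly alongside the instructions the compiler produced itself.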
To save the result of the compilation stage, pass the -S option to cc:
cc -S hello_world.c
This will create a file named hello_world.s, containing the generated assembly instructions. On Mac OS 10.10.4, where cc is an alias for clang, the following output is generated:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 10
.globl _main
.align 4, 0x90
_main: ## @main
.cfi_startproc
## BB#0:
pushq %rbp
Ltmp0:
.cfi_def_cfa_offset 16
Ltmp1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp2:
.cfi_def_cfa_register %rbp
subq $16, %rsp
leaq L_.str(%rip), %rdi
movl $0, -4(%rbp)
callq _puts
xorl %ecx, %ecx
movl %eax, -8(%rbp) ## 4-byte Spill
movl %ecx, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "Hello, World!"
.subsections_via_symbols
I use gcc instead of clang on my PC, so I generate the assembly code with the -S option like this:
gcc -S hello_world.c
After that, the assembly code is in the file hello_world.s. Let's take a look at its contents:
.file "hello_world.c"
.def ___main; .scl 2; .type 32; .endef
.section .rdata,"dr"
LC0:
.ascii "Hello, World!\0"
.text
.globl _main
.def _main; .scl 2; .type 32; .endef
_main:
LFB10:
.cfi_startproc
pushl %ebp
.cfi_def_cfa_offset 8
.cfi_offset 5, -8
movl %esp, %ebp
.cfi_def_cfa_register 5
andl $-16, %esp
subl $16, %esp
call ___main
movl $LC0, (%esp)
call _puts
movl $0, %eax
leave
.cfi_restore 5
.cfi_def_cfa 4, 4
ret
.cfi_endproc
LFE10:
.ident "GCC: (MinGW.org GCC-6.3.0-1) 6.3.0"
.def _puts; .scl 2; .type 32; .endef
Assembly
During the assembly stage, an assembler is used to translate the assembly instructions to machine code, also called object code. The output consists of actual instructions to be run by the target processor.
To save the result of the assembly stage, pass the -c option to cc:
cc -c hello_world.c
Running the above command will create a file named hello_world.o, containing the object code of the program. The contents of this file are in a binary format and can be inspected using hexdump or od by running either one of the following commands:
hexdump hello_world.o
od -c hello_world.o
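The first few bytes of such a dump usually reveal which object file format the toolchain produced. The sketch below checks a buffer against three well-known signatures: the ELF magic, the 64-bit Mach-O magic as stored on a little-endian machine, and the i386 machine type that opens a COFF object (the format MinGW emits). The function name object_format is hypothetical, and the list of formats is deliberately incomplete.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guesses the object file format from the leading
   bytes of a file. Only three signatures are checked; anything else is
   reported as "unknown". */
const char *object_format(const unsigned char *bytes, size_t len)
{
    static const unsigned char elf[]     = { 0x7f, 'E', 'L', 'F' };
    static const unsigned char macho64[] = { 0xcf, 0xfa, 0xed, 0xfe };

    if (len >= sizeof elf && memcmp(bytes, elf, sizeof elf) == 0)
        return "ELF";
    if (len >= sizeof macho64 && memcmp(bytes, macho64, sizeof macho64) == 0)
        return "Mach-O (64-bit)";
    /* COFF has no magic string; an i386 object starts with machine 0x014c. */
    if (len >= 2 && bytes[0] == 0x4c && bytes[1] == 0x01)
        return "COFF (i386)";
    return "unknown";
}
```

To try it, read the first four bytes of hello_world.o with fread and pass them to object_format; the result should match what you see at the start of the hexdump output.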
Linking
The object code generated in the assembly stage is composed of machine instructions that the processor understands, but some pieces of the program are out of order or missing. To produce an executable program, the existing pieces have to be rearranged and the missing ones filled in. This process is called linking.
Let's assume your project contains two source files, hello.c and world.c. When you compile the project, the assembler will give you the object files hello.o and world.o. In the end, we need only one binary file to load into the target processor or controller, so the linker combines hello.o and world.o into a single binary file.
That's why you get an error from the linker when you call a function that is not defined anywhere. The linker tries to find that function in the other object files and libraries and reports an error if it cannot find it.
The linker will arrange the pieces of object code so that functions in some pieces can successfully call functions in other pieces. It will also add pieces containing the instructions for library functions used by the program. In the case of the “Hello, World!” program, the linker will add the object code for the puts function.
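Here is a rough sketch of what the linker has to resolve in the two-file example. The post names hello.c and world.c but does not show their contents, so the functions world and build_greeting are my own assumptions. Both halves are shown in one listing for brevity; in a real project the declaration and build_greeting would live in hello.c and the definition of world in world.c.

```c
#include <stdio.h>

/* Declaration only (the hello.c side). It tells the compiler the function
   exists, so hello.c compiles without a definition. The call is left as an
   unresolved symbol in hello.o for the linker to fill in. */
const char *world(void);

/* Hypothetical function playing the role of hello.c. Writes the greeting
   into out and returns the number of characters snprintf wanted to write. */
int build_greeting(char *out, size_t size)
{
    return snprintf(out, size, "Hello, %s", world());
}

/* Definition (the world.c side). If no object file or library provided
   this, the linker would report an undefined reference to world. */
const char *world(void)
{
    return "World!";
}
```

Deleting the definition of world while keeping the declaration should still let cc -c succeed, with the failure only showing up at the linking stage.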
The result of this stage is the final executable program. When run without options, cc will name this file a.out. To name the file something else, pass the -o option to cc:
cc -o hello_world hello_world.c
From: https://www.cnblogs.com/archerqvq/p/18160008