The compilation process of a C source file

Compiling a C source file into an executable program involves multiple steps. They are as follows:

Preparation of the source files

The first step in compiling a C source file is to prepare the source file for preprocessing.

First, the characters of the physical source file are mapped, in an implementation-defined manner, to the source character set. This way, multibyte encodings and other encodings are converted into characters of the source character set.

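Next, trigraphs are replaced by the characters they represent. A trigraph is a sequence of two question marks followed by a third character, and it stands in for a character that may be missing from some national character sets. For example, ??( can be used in place of [. Trigraphs were removed in C23, and GCC ignores them unless they are enabled with -trigraphs or a strict ISO mode such as -std=c11.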

Finally, each backslash immediately followed by a new-line character is deleted, joining the two physical lines into one logical line. This makes it possible to write a preprocessor directive, such as #define, over multiple lines.

Preprocessing

The source file is then decomposed into preprocessing tokens and sequences of whitespace characters. Comments count as whitespace at this stage; everything else is grouped into preprocessing tokens, such as identifiers, numbers, punctuators, character constants, and string literals.

Each comment is then replaced by a single space character.

After that, the preprocessor executes the preprocessing directives and expands macro invocations. For example, conditional directives such as #ifdef are evaluated, and once #define x 1 has been seen, each later occurrence of x is replaced by 1. An #include directive causes the referenced header or source file to go through the preparation step and the preprocessing step, recursively, before its content takes the place of the directive.

Once preprocessing is done, all preprocessing directives are deleted.

The preprocessing step can be performed on its own by issuing one of the following commands:

  $ gcc -E source.c > name_of_preprocessed_file.i
  # If using the gcc compiler.

  $ cc -E source.c > name_of_preprocessed_file.i
  # If using the cc compiler.

  $ cpp -E source.c > name_of_preprocessed_file.i
  # If using the C preprocessor.

As an example, this is a C source file:

  /* This is a comment */
  #define x 0
  int y = 1,/* Comments are replaced by a single space*/y;
  int z = x

This is the output of preprocessing this file (GCC also emits linemarker lines beginning with # and some blank lines, which are omitted here):

  $ gcc -E source.c
  int y = 1, y;
  int z = 0

The command $ gcc -E source.c preprocesses the source.c file and writes the result to standard output. Comments are replaced by one space, and preprocessor directives are executed. No C syntax checking is performed: the missing semicolon after int z = 0 passes through unchanged.

Getting ready for the execution environment

The third step is to get ready for the execution environment. Character constants and string literals are translated from the source character set into the execution character set, including any escape sequences such as \n.

Adjacent string literals, such as "a" "b", are concatenated into one, "ab".

The file resulting from this step is called a translation unit.

Translating into assembly

The file resulting from the first three steps, the translation unit, is made up of tokens and whitespace.

The tokens are syntactically and semantically analyzed according to the rules of the C standard, and the high-level C code is translated into low-level assembly language.

Each CPU architecture can have its own assembly language, for example x86-64 assembly or ARM assembly.

For this reason, a target architecture can be specified when compiling.

Compiling for an architecture different from the one on which the compiler is running is called cross-compiling.

The translation into assembly can be performed on its own by issuing one of the following commands:

  $ gcc -S source.c -o name_of_assembly_file.s
  # If using the gcc compiler.

  $ cc -S source.c -o name_of_assembly_file.s
  # If using the cc compiler.

As an example, the following source file:

  int main(void){
          int x = 0;
  }

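is converted to assembly (this output was produced by cc, which is Clang, on macOS for x86-64; the output differs between compilers and platforms):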

  $ cc -S source.c
  # Translate source.c into source.s

  $ cat source.s
  # Output the content of source.s

          .section        __TEXT,__text,regular,pure_instructions
          .macosx_version_min 10, 12
          .globl  _main
          .p2align        4, 0x90
  _main:                                  ## @main
          .cfi_startproc
  ## BB#0:
          pushq   %rbp
  Lcfi0:
          .cfi_def_cfa_offset 16
  Lcfi1:
          .cfi_offset %rbp, -16
          movq    %rsp, %rbp
  Lcfi2:
          .cfi_def_cfa_register %rbp
          xorl    %eax, %eax
          movl    $0, -4(%rbp)
          popq    %rbp
          retq
          .cfi_endproc

  .subsections_via_symbols

Assembling

In this step, the generated assembly language is translated into machine code: the binary instructions, made only of 0s and 1s, that the CPU executes directly.

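The file resulting from this step is called an object file, and it contains object code. Object code is not yet executable: references to functions and variables defined in other files are still unresolved.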

The assembling step can be performed by issuing one of the following commands:

  $ as source.s -o source.o
  # If using as, assemble an assembly
  # file into an object file.

  $ gcc -c source.c -o source.o
  # If using gcc, translate a source.c
  # file into object code.

  $ cc -c source.c -o source.o
  # If using cc, translate a source.c
  # file into object code.

Linking

In this step, an executable file is created from object files. Multiple object files are combined, the needed parts of static libraries are copied in, and external references are resolved. Each operating system has its own executable file format, for example ELF on Linux, Mach-O on macOS, and PE on Windows.

Linking can be performed with the ld command, or through gcc or cc. For example, the following source file:

  /* source.c file */
  #include <math.h>
  int main(void){
    double number = sqrt(2.9);
  }

can be converted to object code using:

  $ gcc -c source.c

The object code can then be linked against the C math library (-lm) and made into an executable file by issuing the command:

  $ gcc source.o -lm -o executable_file_name

Final notes

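A compiler driver such as gcc or cc can perform all these steps at once. For example, issuing gcc source.c or cc source.c translates the source file into an executable file, named a.out by default. Multiple source files can also be passed to gcc or cc; each is compiled separately, and the resulting object files are linked together.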