Phases of a compiler

  


INTRODUCTION TO VARIOUS PHASES OF COMPILER

As a programmer, you should understand how a program written in a programming language gets compiled and produces the required output.

So, in this post let us see how a given source program is converted into machine-dependent code that finally gives us the required output.

We will look at the different phases of a compiler and how the conversion takes place.
So let's get started and understand the magic of compilers!

 Before we learn about the various phases of a compiler, we first need to know what a compiler is and what role it plays in producing the output.

 What is a compiler?

The main role of a compiler is to convert a high-level language into a low-level language.

  You might have a doubt:

    “Why don't we write programs directly in a low-level language?”


 The reason is that it is very difficult to write a program in a low-level language, which at the lowest level consists of 0's and 1's. This is known as machine language, the language the machine understands. A high-level language, on the other hand, is written in a form that humans can understand.

Several software modules work together to convert the high-level language into the low-level language. They are:


  1. Pre-Processor:

The input to the pre-processor is the high-level language program, which it converts into pure high-level language (i.e. code in which the lines starting with #, which are pre-processor directives and not comments, have been processed and removed).

For example, in the C language we write:
                       #include <stdio.h>
The pre-processor removes the #include line and replaces it with the contents of the named file. This process is called "File Inclusion".

 The other common directive is #define. A #define creates a macro, which is the association of an identifier or parameterized identifier with a token string. After the macro is defined, the pre-processor substitutes the token string for each occurrence of the identifier in the source file.

  For example, if we are dealing with bank interest and the interest rate changes, we would otherwise have to change that value everywhere it appears in the program. If we define the value once as a macro, a single change is reflected in the entire program.
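
The substitution idea can be sketched in a few lines of Python. This is only a toy model of object-like #define macros, not how a real C pre-processor is implemented; the function name expand_macros and the macro name INTEREST are made up for this illustration.

```python
import re

def expand_macros(source):
    """Toy model of object-like #define substitution (not a real C pre-processor)."""
    macros = {}
    output = []
    for line in source.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0] == "#define":
            # Remember the macro and drop the directive from the output.
            macros[parts[1]] = parts[2]
            continue
        for name, value in macros.items():
            # \b makes sure INTEREST does not match inside INTEREST_RATE.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
        output.append(line)
    return "\n".join(output)

program = """#define INTEREST 7.5
double yearly = balance * INTEREST / 100;
double monthly = yearly / 12;"""

print(expand_macros(program))
# double yearly = balance * 7.5 / 100;
# double monthly = yearly / 12;
```

Changing the single #define line is enough to change the interest value wherever INTEREST is used.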

  2. Compiler:

This is the main phase of compilation. It takes the pure high-level language as input and converts it into assembly language, which is mainly an intermediate language. We shall learn about assembly language later.

  3. Assembler:
The assembler takes the assembly language as input and converts it into machine code or relocatable machine code.
                 
Machine code:
This mainly contains 0's and 1's.

Relocatable machine code: This means we can load the machine code anywhere in memory and run it. The code does not assume that the program runs from some particular location. All the addresses are expressed in such a way that the program can be moved (relocated) to any starting point.

  4. Loader and Linker:
The linker combines the relocatable machine code into executable (absolute) machine code, and the loader then loads it into memory so that it can run.

So far, we have learnt how a high-level language is converted into a low-level language.

Now we will look at the various phases of a compiler.

PHASES OF COMPILER




The above diagram shows the various phases of a compiler. Let us go through them with one simple example, which makes them easy to learn and understand.

  Let us consider a very simple expression. We will see how this expression is converted and executed.

      w=x+y*z
                                               
   Lexical Analyser phase:

In this phase w=x+y*z is given to the lexical analyser. It converts the source code into a stream of tokens, i.e. id = id + id * id
Here id stands for “identifier”.
The tokens are:
·       id
·       =
·       id
·       +
·       id
·       *
·       id

This phase is also responsible for removing white space.
You might wonder how the lexical analyser identifies the tokens.
This is done using pattern matching. For example, an identifier matches the pattern l(l+d)* , where l is a letter, d is a digit and + means "or".
“It means a letter followed by any number of letters and digits.”
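
As a rough sketch of this phase, the Python snippet below turns the example expression into the token stream using regular expressions. The names TOKEN_SPEC, tokenize and token_stream are invented for this illustration, and only the three operators from the example are recognised.

```python
import re

# Each token class is a named regular expression.
# "id" is a letter followed by any number of letters and digits.
TOKEN_SPEC = [
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (kind, text) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

def token_stream(source):
    """Show identifiers as 'id' and operators as themselves."""
    return " ".join("id" if kind == "id" else text for kind, text in tokenize(source))

print(token_stream("w = x + y * z"))   # id = id + id * id
```

A real lexical analyser would also keep the actual names (w, x, y, z) in a symbol table; here tokenize keeps them next to each token kind.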

Syntax analyser phase:

The stream of tokens is handled by the syntax analyser phase, which checks it against the grammar of the language.
This grammar is a “context-free grammar”: a set of rules, called productions, that describe the valid structure of a program.
The grammar for the above expression is as follows:

                       S -> id = E

                       E -> E + T | T

                       T -> T * F | F

                       F -> id

It also generates a parse tree.
     
We should check that the yield of the parse tree (its leaves read from left to right) is equal to the token stream we started with, i.e. id=id+id*id
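
To make this concrete, here is a minimal recursive-descent sketch in Python for the productions above. It assumes the token stream is exactly id = id + id * id with no statement terminator, and it handles the left-recursive rules for E and T with loops, which is a common workaround and not the only way to parse this grammar. The names parse and tree_yield are made up for this example.

```python
def parse(tokens):
    """Parse a list like ['id', '=', 'id', '+', 'id', '*', 'id'] into a parse tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} but found {peek()!r} at token {pos}")
        pos += 1
        return symbol

    def factor():                        # F -> id
        return ("F", expect("id"))

    def term():                          # T -> T * F | F
        node = ("T", factor())
        while peek() == "*":
            expect("*")
            node = ("T", node, "*", factor())
        return node

    def expression():                    # E -> E + T | T
        node = ("E", term())
        while peek() == "+":
            expect("+")
            node = ("E", node, "+", term())
        return node

    tree = ("S", expect("id"), expect("="), expression())   # S -> id = E
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} at token {pos}")
    return tree

def tree_yield(node):
    """Collect the leaves of the parse tree from left to right."""
    if isinstance(node, str):
        return [node]
    leaves = []
    for child in node[1:]:
        leaves.extend(tree_yield(child))
    return leaves

tokens = "id = id + id * id".split()
tree = parse(tokens)
print(" ".join(tree_yield(tree)))   # id = id + id * id
```

In the resulting tree the + sits at the top of the expression and the * sits below it, which is how the grammar makes multiplication bind tighter than addition.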

  Semantic analyser phase:
The parse tree is the input to the semantic analyser phase. Its role is to check whether the given tree is meaningful or not.
For example, in the expression id=id+id*id the left-hand side (id) must be something that can be assigned to, such as a variable. It should not be a constant or an array name.
The entire type checking is done in this semantic analyser phase.
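
A very small Python sketch of such checks is shown below. The symbol table SYMBOLS, its entries and the function check_assignment are hypothetical, and the rule that every operand must have exactly the same type as the left-hand side is a simplifying assumption (real languages often allow conversions).

```python
# Hypothetical symbol table: name -> (kind, type)
SYMBOLS = {
    "w": ("variable", "int"),
    "x": ("variable", "int"),
    "y": ("variable", "int"),
    "z": ("variable", "int"),
    "PI": ("constant", "float"),
}

def check_assignment(target, operands, symbols=SYMBOLS):
    """Return a list of semantic errors for: target = <expression over operands>."""
    errors = [f"{name} is not declared" for name in [target, *operands] if name not in symbols]
    if errors:
        return errors
    kind, target_type = symbols[target]
    if kind != "variable":
        # The left-hand side must be assignable.
        errors.append(f"cannot assign to {kind} {target}")
    for name in operands:
        operand_type = symbols[name][1]
        if operand_type != target_type:
            errors.append(f"type mismatch: {name} is {operand_type}, {target} is {target_type}")
    return errors

print(check_assignment("w", ["x", "y", "z"]))   # []
print(check_assignment("PI", ["x"]))
# ['cannot assign to constant PI', 'type mismatch: x is int, PI is float']
```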

  Intermediate code generation: The input is the semantically verified parse tree. This phase mainly produces “three address code”.

                             t1=y*z

                             t2=x+t1

                             w=t2

The above is three address code: each statement contains at most three addresses (two operands and one result). A statement may also use only two addresses, as in w=t2.
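
The sketch below shows one way to produce these statements in Python. To stay short it borrows Python's own parser (the standard ast module) in place of the earlier phases, which is an assumption of this example and not how a C compiler works; the function name three_address_code is also made up, and only + and * are supported.

```python
import ast

def three_address_code(statement):
    """Translate a simple assignment such as 'w = x + y * z' into three address code."""
    assign = ast.parse(statement).body[0]
    if not (isinstance(assign, ast.Assign) and len(assign.targets) == 1
            and isinstance(assign.targets[0], ast.Name)):
        raise ValueError("expected a statement of the form name = expression")
    operators = {ast.Add: "+", ast.Mult: "*"}
    code = []

    def emit(node):
        # A name is already an address; an operation needs a fresh temporary.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.BinOp) and type(node.op) in operators:
            left = emit(node.left)
            right = emit(node.right)
            temp = f"t{len(code) + 1}"
            code.append(f"{temp} = {left} {operators[type(node.op)]} {right}")
            return temp
        raise ValueError("only names, + and * are supported in this sketch")

    result = emit(assign.value)
    code.append(f"{assign.targets[0].id} = {result}")
    return code

for line in three_address_code("w = x + y * z"):
    print(line)
# t1 = y * z
# t2 = x + t1
# w = t2
```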

Code optimizer:
The main function of this phase is to make the code more efficient, for example by reducing the number of instructions. Its input is the three address code.

                           
                             t1=y*z

                             w=x+t1

In the above representation, the number of instructions is reduced to 2 by removing the temporary t2 and assigning x+t1 directly to w.
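
This particular clean-up can be sketched in Python as follows. The function name fold_final_copy is invented here, and the sketch assumes that temporaries are named t1, t2, ... and that a temporary is not used again after it has been copied. Real optimizers perform many more transformations than this one.

```python
import re

def fold_final_copy(code):
    """Fold 'tN = a op b' followed by 'w = tN' into 'w = a op b'."""
    optimized = []
    for line in code:
        target, expression = [part.strip() for part in line.split("=", 1)]
        if optimized:
            prev_target, prev_expression = [part.strip() for part in optimized[-1].split("=", 1)]
            # A plain copy of the temporary defined on the previous line can be folded into it.
            if expression == prev_target and re.fullmatch(r"t\d+", prev_target):
                optimized[-1] = f"{target} = {prev_expression}"
                continue
        optimized.append(line)
    return optimized

print(fold_final_copy(["t1 = y * z", "t2 = x + t1", "w = t2"]))
# ['t1 = y * z', 'w = x + t1']
```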
The phases from the lexical analyser up to the intermediate code generator do not depend on the target machine, so they can stay the same when a compiler is moved to another platform. What changes is the code optimizer and the target code generator.
They mainly depend on the platform.

Target code generator:
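The main aim of the target code generator is to produce code which an assembler can understand.

                            mul R1,R2
                            add R0,R2
                            mov R2,w
          where

               R0->x
               R1->y
               R2->z

Reading the second operand of each instruction as the destination, R2 first becomes y*z, then x+y*z, and finally its value is stored in w.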
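
To check that these three instructions really compute w=x+y*z, here is a tiny Python simulator. It is only a sketch: the function run is made up, and it assumes a two-operand format in which the second operand is the destination, which the example suggests but does not state.

```python
def run(program, registers, memory):
    """Execute two-operand instructions where the second operand is the destination."""
    for instruction in program:
        opcode, operands = instruction.split(None, 1)
        source, destination = [operand.strip() for operand in operands.split(",")]
        if opcode == "mul":
            registers[destination] *= registers[source]
        elif opcode == "add":
            registers[destination] += registers[source]
        elif opcode == "mov":
            # Store a register value into a named memory location.
            memory[destination] = registers[source]
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return memory

program = ["mul R1,R2", "add R0,R2", "mov R2,w"]
registers = {"R0": 2, "R1": 3, "R2": 4}   # x = 2, y = 3, z = 4
memory = run(program, registers, {})
print(memory["w"])   # 14, which is 2 + 3 * 4
```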

The phases up to and including the intermediate code generator form the front end of the compiler. 'LANCE' is one example of a tool that provides such a front end.
Lex is a tool used to generate the lexical analyser.
YACC (Yet Another Compiler-Compiler) is a tool used to generate the syntax analyser.

So far we have seen a brief description of compilers. Hope you liked it.

If you have any questions please write a comment below!

Please subscribe so that you won’t miss any programming stuff from “EFFICIENT PROGRAMMER” .
                           

