INTRODUCTION TO THE VARIOUS PHASES OF A COMPILER
As a programmer, you should understand how a program written in a programming language gets compiled and produces the required output.
So, in this post let us see the procedure involved in converting a given source program into machine-dependent code, which finally gives us the required output.
We look at the different phases of a compiler and how the conversion takes place.
So let’s get started and understand the magic of compilers!
Before we learn about the various phases of a compiler, we first need to know what a compiler is and what role it plays in producing the output.
What is a compiler?
The main role of a compiler is to convert a program written in a high level language into a low level language.
You might have a doubt:
“Why don’t we directly write a program in low level language?”
The reason is that it is very difficult to write a program in low level language, which consists of 0's and 1's. This is known as machine understandable language. A high level language, on the other hand, is written in a human understandable form.
Several software modules work together to convert the high level language into low level language. They are:
1. Pre-Processor:
The input to the pre-processor phase is the high level language program, which it converts into pure high level language (i.e. code in which the lines starting with #, called pre-processor directives, have been processed and the comments have been removed).
For example, in C language we write:
#include <filename>
The pre-processor removes the #include line and replaces it with the contents of the named file. This process is called "File Inclusion".
The other one is #define. The #define directive creates a macro, which is the association of an identifier or parameterized identifier with a token string. After the macro is defined, the pre-processor substitutes the token string for each occurrence of the identifier in the source file.
For example, if we are dealing with banking interest and the interest rate changes, we would otherwise need to change the value everywhere it is used. Instead, simply define the value once as a macro, and a change to that one definition is reflected in the entire program.
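To make the banking example concrete, here is a toy sketch in Python of the substitution step. It is only an illustration, not how a real C pre-processor is implemented: the function name `expand_macros` and the macro `INTEREST_RATE` are made up for this post, and only simple `#define NAME value` lines are handled (no parameterized macros, and no care is taken to skip string literals).

```python
import re

def expand_macros(source: str) -> str:
    """Toy pre-processor: handles simple '#define NAME value' lines only."""
    macros = {}
    output = []
    for line in source.splitlines():
        match = re.match(r"\s*#define\s+(\w+)\s+(.+)", line)
        if match:
            # Remember the macro and drop the directive from the output.
            macros[match.group(1)] = match.group(2).strip()
            continue
        # Replace whole-word occurrences of every macro defined so far.
        for name, value in macros.items():
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
        output.append(line)
    return "\n".join(output)

program = """#define INTEREST_RATE 7.5
interest = principal * INTEREST_RATE / 100;
bonus = INTEREST_RATE + 1;"""

print(expand_macros(program))
```

Changing 7.5 in the single #define line changes both statements in the expanded program, which is exactly the benefit described above.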
2. Compiler:
This is the main phase of the compilation. It takes pure high level language as input and converts it into assembly level language, which is mainly an intermediate language. We shall learn about assembly language later.
3. Assembler:
This takes the assembly level language as input and converts it into machine code or relocatable machine code.
Machine code:
This mainly contains 0's and 1's.
Relocatable machine code:
It means we can load this machine code anywhere in memory and run it. In this way the program does not assume that it is running from some particular location. We can run the machine code from any starting address, because all the addresses are expressed in such a way that they help in program movement.
4. Loader and Linker:
The linker converts the relocatable machine code into executable (absolute) machine code, and the loader then loads it into memory.
So far, we have learnt how the high level language is converted into low level language.
Now we will look through the various phases of a compiler.
PHASES OF COMPILER
⦁ The above diagram shows you the various phases of a compiler. Let us look at these phases by taking one simple example, which lets you learn and understand them in very simple language.
Let us consider a very simple expression. We will see how this expression is converted and executed.
w=x+y*z
Lexical Analyser phase:
In this phase w=x+y*z is given to the lexical analyser. It converts the source code into a stream of tokens, i.e. id = id + id * id
Here id implies “identifier”.
The tokens are:
· id
· =
· id
· +
· id
· *
· id
⦁This phase is also responsible for removing the white spaces.
⦁You might have a question like how the lexical analyser identifies the tokens.
⦁This is done using pattern matching. The pattern for an identifier is of the form l(l+d)* where l is a letter and d is a digit.
“It means a letter followed by any number of letters and digits.”
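As a rough sketch of this idea, the Python snippet below turns the example expression into tokens. The pattern l(l+d)* is written as the regular expression `[A-Za-z][A-Za-z0-9]*`. The names `TOKEN_PATTERNS` and `tokenize` are made up for this post, and only identifiers and the three operators used in our example are recognised.

```python
import re

# Token patterns; the identifier rule is l(l+d)* written as a regular expression.
TOKEN_PATTERNS = [
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("op", r"[=+*]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS))

def tokenize(text):
    """Return a list of (kind, text) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup == "id":
            tokens.append(("id", match.group()))
        elif match.lastgroup == "op":
            # For operators the token kind is the operator itself.
            tokens.append((match.group(), match.group()))
        # White space matches the "skip" pattern and is simply ignored.
        pos = match.end()
    return tokens

tokens = tokenize("w = x + y * z")
print(" ".join(kind for kind, _ in tokens))  # id = id + id * id
```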
Syntax analyser phase:
The stream of tokens is handled by the syntax analyser phase. It checks the tokens against the grammar of the language.
⦁This grammar is nothing but a “context free grammar”. It represents the rules of the language as a set of productions.
⦁The grammar for the above expression is as follows (here / separates the alternatives of a production):
S->id=E;
E->E+T/T
T->T*F/F
F->id
⦁It also generates a parse tree.
⦁We should check that the input which we have received is equal to the yield of the parse tree, i.e. id=id+id*id
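The grammar above can be tried out with a small hand-written parser. The Python sketch below is one possible way to do it and rests on a few assumptions: the tokens are given as (kind, text) pairs, the left-recursive rules for E and T are written as loops, the trailing ; of the first production is left out to match our token stream, and the result is a simplified tree (operators and identifiers only) rather than the full parse tree. The names `parse` and `tree_yield` are made up for this post.

```python
def parse(tokens):
    """Parse a list of (kind, text) tokens into a nested-tuple tree."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind!r} at token {pos}")
        token = tokens[pos]
        pos += 1
        return token

    def factor():  # F -> id
        return ("id", expect("id")[1])

    def term():  # T -> T*F / F, written as a loop
        node = factor()
        while pos < len(tokens) and tokens[pos][0] == "*":
            expect("*")
            node = ("*", node, factor())
        return node

    def expr():  # E -> E+T / T, written as a loop
        node = term()
        while pos < len(tokens) and tokens[pos][0] == "+":
            expect("+")
            node = ("+", node, term())
        return node

    target = expect("id")  # S -> id=E
    expect("=")
    tree = ("=", ("id", target[1]), expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

def tree_yield(node):
    """Read the leaves from left to right to get back the token kinds."""
    if node[0] == "id":
        return ["id"]
    op, left, right = node
    return tree_yield(left) + [op] + tree_yield(right)

tokens = [("id", "w"), ("=", "="), ("id", "x"), ("+", "+"),
          ("id", "y"), ("*", "*"), ("id", "z")]
tree = parse(tokens)
print(tree)
print("".join(tree_yield(tree)))  # id=id+id*id
```

Because T is handled inside E, the multiplication y*z ends up deeper in the tree than the addition, which is how the grammar gives * a higher priority than +.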
Semantic analyser phase:
The parse tree is the input to the semantic analyser phase. Its role is to check whether the given tree is meaningful or not.
⦁For example, in the expression id=id+id*id the left hand side (id) should always be a variable. It should not be a constant or an array.
⦁Type checking is also done in this semantic analyser phase.
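A minimal sketch of these two checks in Python is given below. It assumes the tree is stored as nested tuples such as ("=", ("id", "w"), ...), and that a symbol table (here a plain dictionary, made up for this post) records the declared type of each variable. Real compilers apply far richer rules, for example implicit conversions between types.

```python
def check_assignment(tree, symbol_table):
    """Return the type of the assignment, or raise TypeError if it is not meaningful."""
    op, target, value = tree
    # Check 1: the left hand side must be a variable.
    if op != "=" or target[0] != "id":
        raise TypeError("left hand side of '=' must be a variable")

    def type_of(node):
        if node[0] == "id":
            name = node[1]
            if name not in symbol_table:
                raise TypeError(f"undeclared variable {name!r}")
            return symbol_table[name]
        operator, left, right = node
        left_type, right_type = type_of(left), type_of(right)
        # Check 2: both operands of an operator must have the same type.
        if left_type != right_type:
            raise TypeError(f"type mismatch for {operator!r}: {left_type} and {right_type}")
        return left_type

    target_type, value_type = type_of(target), type_of(value)
    if target_type != value_type:
        raise TypeError(f"cannot assign {value_type} to {target_type}")
    return target_type

symbols = {"w": "int", "x": "int", "y": "int", "z": "int"}
tree = ("=", ("id", "w"), ("+", ("id", "x"), ("*", ("id", "y"), ("id", "z"))))
print(check_assignment(tree, symbols))  # int
```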
Intermediate code generation:
The input is the semantically verified parse tree. This phase mainly produces “three address code”.
t1=y*z
t2=x+t1
w=t2
⦁The above is the three address code representation. Each statement has at most three addresses (operands), so a statement can have 3 or 2 addresses.
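The three statements above can be produced by walking the tree from the bottom up and creating a new temporary for every operator. The Python sketch below shows one way to do this, assuming the tree is stored as nested tuples; the function name `generate_tac` is made up for this post.

```python
def generate_tac(tree):
    """Turn a nested-tuple assignment tree into a list of three address statements."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        if node[0] == "id":
            return node[1]
        op, left, right = node
        # Generate code for both operands first, then for the operator itself.
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp}={left_name}{op}{right_name}")
        return temp

    _, target, value = tree
    code.append(f"{target[1]}={walk(value)}")
    return code

tree = ("=", ("id", "w"), ("+", ("id", "x"), ("*", ("id", "y"), ("id", "z"))))
for statement in generate_tac(tree):
    print(statement)
# t1=y*z
# t2=x+t1
# w=t2
```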
Code optimizer:
The main function of this phase is to improve the intermediate code, for example by reducing the number of statements in it. The input is the three address code.
t1=y*z
w=x+t1
⦁In the above representation, the number of lines is reduced to 2 by removing the t2 statement and assigning x+t1 directly to w.
⦁The phases from the lexical analyser till the intermediate code generator are the same whatever the target platform is. The only things which change are the code optimizer and the target code generator.
⦁These mainly depend on the platform.
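The removal of t2 shown in the code optimizer step can be sketched as a tiny "peephole" pass in Python. This is only an illustration of that one rewrite, not a general optimizer. It assumes that temporaries are named t1, t2, ... and that each temporary is used only once, so it is safe to fold a copy into the statement just before it. The function name `fold_final_copy` is made up for this post.

```python
import re

def fold_final_copy(code):
    """Merge 'tN=<expr>' followed by 'v=tN' into the single statement 'v=<expr>'."""
    result = []
    for statement in code:
        target, value = statement.split("=", 1)
        if result:
            prev_target, prev_value = result[-1].split("=", 1)
            # Fold only when this statement just copies the temporary computed before it.
            if value == prev_target and re.fullmatch(r"t\d+", prev_target):
                result[-1] = f"{target}={prev_value}"
                continue
        result.append(statement)
    return result

optimized = fold_final_copy(["t1=y*z", "t2=x+t1", "w=t2"])
for statement in optimized:
    print(statement)
# t1=y*z
# w=x+t1
```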
Target code generator:
The main aim of the target code generator is to write code which an assembler can understand.
mul R1,R2
add R0,R2
mov R2,w
where
R1->y
R2->z
R0->x
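To see how the three instructions above follow from the optimized three address code, here is a small Python sketch. It makes several assumptions: instructions have the form "op source,destination" with the result left in the destination register (as in the listing above), the variables are already placed in registers as given by the mapping, and it is fine to overwrite a register once its old value has been used. The names `OPCODES` and `generate_target` are made up for this post, and only + and * are supported.

```python
import re

OPCODES = {"*": "mul", "+": "add"}

def generate_target(code, registers):
    """Translate 'a=b op c' statements into two-operand 'op src,dst' instructions."""
    location = dict(registers)  # register that currently holds each name
    assembly = []
    for statement in code:
        target, value = statement.split("=", 1)
        for op, opcode in OPCODES.items():
            if op in value:
                left, right = value.split(op)
                src, dst = location[left], location[right]
                assembly.append(f"{opcode} {src},{dst}")
                result_register = dst
                break
        else:
            # No operator found: the statement is a plain copy.
            result_register = location[value]
        if re.fullmatch(r"t\d+", target):
            # Temporaries stay in a register.
            location[target] = result_register
        else:
            # Program variables are stored back to memory.
            assembly.append(f"mov {result_register},{target}")
    return assembly

assembly = generate_target(["t1=y*z", "w=x+t1"], {"x": "R0", "y": "R1", "z": "R2"})
for instruction in assembly:
    print(instruction)
# mul R1,R2
# add R0,R2
# mov R2,w
```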
⦁The front end of the compiler includes the phases up to and including the intermediate code generator. 'LANCE' is an example of a compiler front end.
⦁Lex is a tool used to generate lexical analysers.
⦁YACC (Yet Another Compiler Compiler) is a tool used to generate syntax analysers.
So far we have seen a brief description of compilers. Hope you liked it.
If you have any questions please write a comment below!
Please subscribe so that you won’t miss any programming stuff from “EFFICIENT PROGRAMMER”.
copied from Ravindra Babu Ravulla Lectures