The most obvious overall task of a compiler is to read a program in one language - the 'source' program in the 'source' language - and to translate it to produce an equivalent program in another language - the 'object' program in the 'object' language. The object language is usually the machine language of a computer, or something close to it, and the source program is usually in a 'high-level' language such as FORTRAN, PASCAL, ALGOL 68, SIMULA 67 or BCPL, because translation from high-level language to machine language is the practical problem which compilers exist to solve. Compilers can be written to translate from any kind of programming language into any other, however, with varying degrees of efficiency in the resulting code.
Another part of the compiler's overall task, just as important but too often neglected, is that the compiler must check that the source program makes some kind of sense and, when it doesn't seem to make sense, must produce a detailed description of the problem (an error report) so that the programmer can correct the program.
Before describing how the overall tasks are split up into sub-tasks, it's worth discussing the ways in which the sub-sections of a compiler can be combined to make a complete compiler. The overall activity of compilation can be divided into a sequence of phases, during each of which one of the sub-tasks is carried out. As emphasised in the introduction to this section, all compilers perform the same sub-tasks in the same sequence, and therefore all compilers consist of the same sequence of phases.

Conceptually each phase transforms the program fragment-by-fragment, taking as input a fragment of some representation of the source program and producing as output a fragment of some other, transformed, representation. Because the transformation is fragment-by-fragment, it is possible to choose how the phases are linked. If each separate fragment goes through all the phases before compilation of the next fragment starts, we have a single-pass compiler, because all of the source program is compiled in a single pass over the source text. Conversely, if the entire program goes through each one of the phases before any of it is presented to the next phase, we have a multi-pass compiler, because each phase performs a separate pass over the program. Clearly, if there are N phases, a compiler can be organised into anything from 1 to N passes.
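The distinction above can be sketched in code. This is a hypothetical illustration, not anything from the text: the three phase functions and their names are invented toy stand-ins, and the point is only that the *same* phases can be linked fragment-at-a-time (single pass) or whole-program-at-a-time (one pass per phase) with the same result.

```python
# Toy "phases": each transforms one fragment of the program.
# The names and representations are illustrative assumptions, not an API
# described in the text.
def lexical(frag):   return frag.split()          # characters -> tokens
def syntactic(toks): return ("stmt", toks)        # tokens -> a tree node
def translate(node): return f"CODE[{node[1]}]"    # tree node -> object code

PHASES = [lexical, syntactic, translate]

def single_pass(fragments):
    # Each fragment goes through every phase before the next fragment starts.
    return [translate(syntactic(lexical(f))) for f in fragments]

def multi_pass(fragments):
    # The entire program goes through each phase before any of it is
    # presented to the next phase: one pass per phase.
    rep = fragments
    for phase in PHASES:
        rep = [phase(part) for part in rep]
    return rep

prog = ["a := 1", "b := a + 2"]
assert single_pass(prog) == multi_pass(prog)  # same phases, same output
```

Either driver produces identical object code, since the phases themselves are unchanged; only the order in which fragments visit them differs.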
For example, the Borland Turbo Pascal compiler is a single-pass compiler, yet its compilation is split into many phases. Internally, the source code is parsed only once, but some phases are repeated to optimise the generated code.
Thus every compiler must be multi-phase, but it may or may not be multi-pass. Logically the phases must be joined together in the same order no matter how many passes are employed, so it might seem not to matter whether a description concentrates on multi-pass or single-pass organisation. In practice, however, the multi-pass organisation is simpler to describe and to understand because the interfaces between the phases are so much more straightforward. Indeed for some languages - ALGOL 60 is a good example - it is difficult to construct an effective single-pass compiler at all, and therefore the multi-pass organisation is also more generally useful.
It is often believed that multi-pass compilation is inherently less efficient than single-pass. This is simply untrue: since in any compiler each fragment must be processed by the same sequence of phases, a single-pass compiler must perform the same amount of work as a multi-pass compiler. Suppose, however, that each pass of a multi-pass compiler were to write its output into a disc file (or any backing store file) from which the next pass had to read it in again. Such a multi-pass compiler would indeed be less efficient than a single-pass compiler which didn't use backing store but the inefficiency would be caused by the processing overhead of input and output between passes.
A multi-pass compiler which stores the output of each pass in the main computer memory will certainly be no slower than a single-pass compiler. On a modern computer the only disadvantage of multi-pass compilation is that it may use quite a lot of space to store the output of the various passes. Nowadays computer memories (for everything but the smallest micro-processors!) are large enough for this minor drawback to be overwhelmed by the enormous advantages of clarity of design, and consequent ease of construction, of a multi-pass compiler. For most of the rest of this book I describe a compiler organised in two passes, which is an effective compromise between economy of space and clarity of organisation. In the discussion of compilation algorithms, however, it is usually simplest to imagine that each of the phases in a compiler always performs a separate pass over the program.
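The two-pass shape described above can be sketched as follows. This is a hedged sketch under invented names (none of these functions or representations come from the text): pass one builds an intermediate representation held entirely in main memory, and pass two consumes that in-memory structure directly, so no backing-store input/output occurs between the passes.

```python
# Hypothetical two-pass driver: the intermediate representation lives in an
# ordinary in-memory list, avoiding the disc I/O overhead the text warns about.
# All names and formats here are illustrative assumptions.

def pass_one(source_lines):
    # Analysis pass: build an in-memory intermediate representation.
    return [("line", n, text.strip()) for n, text in enumerate(source_lines, 1)]

def pass_two(intermediate):
    # Translation pass: read the in-memory structure, emit object code.
    return [f"EMIT {n}: {text}" for (_, n, text) in intermediate]

def compile_two_pass(source_lines):
    # Pass one's output is handed to pass two as a data structure,
    # never written to a backing-store file.
    return pass_two(pass_one(source_lines))

out = compile_two_pass(["x := 0", "print x"])
```

Each pass still traverses the whole program once, so the total processing work matches a single-pass organisation; only the storage for the intermediate list is extra.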