The compilation process can be divided into five stages: 1. Lexical analysis, which scans the character strings that make up the source program and breaks them into individual words. 2. Syntax analysis, which analyzes the grammatical structure of the program. 3. Semantic analysis and intermediate code generation. 4. Code optimization. 5. Target code generation.
The operating environment of this tutorial: Windows 10 system, Dell G3 computer.
A compiler translates a source program into a target program in five stages: lexical analysis, syntax analysis, semantic analysis and intermediate code generation, code optimization, and target code generation.
The five stages of the compilation process are explained in detail below.
The work of a compiler, from reading the source program to emitting the target program, is complex. In outline, however, it closely resembles the way people translate natural language. When we translate a text from one language into another, for example a paragraph of English into Chinese, we usually go through the following steps:
(1) Identify each word in the sentence;
(2) Analyze the grammatical structure of the sentence;
(3) Produce a preliminary translation according to the meaning of the sentence;
(4) Revise the translation;
(5) Write out the final translation.
Similarly, the work of a compiler can be divided into five stages: lexical analysis, syntax analysis, semantic analysis and intermediate code generation, optimization, and target code generation.
The first stage: lexical analysis
The task of lexical analysis is to read the source program, scan and decompose the character strings that make it up, and identify each word (also called a word symbol, or simply a symbol): keywords (begin, end, if, for, while), identifiers, constants, operators, and delimiters (punctuation marks, left and right brackets).
Word symbols are the basic components of a language and the basic elements people use to understand and write programs. Identifying them is the foundation of translation: just as in translating from English to Chinese, you cannot translate correctly if you do not understand the English words. The lexical analysis stage follows the lexical rules (word formation rules) of the language. The standard tools for describing lexical rules are regular expressions and finite automata.
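The idea of describing word symbols with regular expressions can be sketched in Python as follows. This is a minimal illustration, not a production lexer: the token class names (NUMBER, IDENT, and so on) and the exact patterns are assumptions made for the example, and the keyword set is simply the list of keywords mentioned above.

```python
import re

# Keywords taken from the examples in the text; the token class names
# and patterns below are illustrative assumptions.
KEYWORDS = {"begin", "end", "if", "for", "while"}

TOKEN_SPEC = [
    ("NUMBER",    r"\d+(?:\.\d+)?"),   # integer or decimal constant
    ("IDENT",     r"[A-Za-z_]\w*"),    # identifier or keyword
    ("OPERATOR",  r"[+\-*/=<>]"),      # single-character operators
    ("DELIMITER", r"[();,]"),          # punctuation and brackets
    ("SKIP",      r"\s+"),             # whitespace, discarded
    ("MISMATCH",  r"."),               # anything else is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Scan the source string and return a list of (kind, text) word symbols."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


print(tokenize("while x < 10"))
# [('KEYWORD', 'while'), ('IDENT', 'x'), ('OPERATOR', '<'), ('NUMBER', '10')]
```

Each named group plays the role of one lexical rule; the regular expression engine stands in for the finite automaton that a real scanner generator would build from those rules.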
The second stage: syntax analysis
The task of syntax analysis is, building on lexical analysis, to group the string of word symbols into grammatical units (grammatical categories) such as "phrases", "sentences", "program segments", and "programs", according to the grammar rules of the language. Syntax analysis determines whether the entire input string constitutes a grammatically correct "program". The grammar rules it follows are usually described using context-free grammars. Lexical analysis is a linear analysis, while syntax analysis is a hierarchical analysis. For example, Z = X + 0.618 * Y is an assignment statement, and the X + 0.618 * Y within it is an arithmetic expression. The task of syntax analysis is therefore to recognize X + 0.618 * Y as an arithmetic expression and, at the same time, to recognize that the entire symbol string belongs to the category of assignment statements.
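The hierarchical nature of syntax analysis can be illustrated with a small recursive-descent parser. This is only a sketch under assumptions: the grammar below (assignment, expr, term, factor) is a hypothetical context-free grammar chosen for the example, the input is a list of already separated token strings, and the syntax tree is represented as nested tuples.

```python
# Assumed grammar (illustrative, not taken from any particular language):
#   assignment -> IDENT "=" expr
#   expr       -> term (("+" | "-") term)*
#   term       -> factor (("*" | "/") factor)*
#   factor     -> IDENT | NUMBER | "(" expr ")"
SYMBOLS = set("+-*/=()")


def parse_assignment(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token is None or token in SYMBOLS:
            raise SyntaxError(f"unexpected token {token!r}")
        return token  # identifier or number leaf

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    target = advance()
    if target is None or not target.isidentifier():
        raise SyntaxError("assignment must start with an identifier")
    if advance() != "=":
        raise SyntaxError("expected '='")
    tree = ("assign", target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return tree


print(parse_assignment("Z = X + 0.618 * Y".split()))
# ('assign', 'Z', ('+', 'X', ('*', '0.618', 'Y')))
```

The nesting of the result shows the hierarchy: the multiplication is a sub-unit of the arithmetic expression, which in turn is a sub-unit of the assignment statement.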
The third stage: semantic analysis and intermediate code generation
The task of this stage is to analyze the meaning of the grammatical categories identified by syntax analysis and to perform a preliminary translation (generating intermediate code). This stage usually involves two kinds of work. The first is a semantic check of each grammatical category, for example whether a variable has been defined and whether its type is correct. If the semantics are correct, the second kind of work follows: translation into intermediate code.
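The two checks mentioned above (is the variable defined, is the type correct) can be sketched like this. The nested-tuple tree shape, the symbol table as a plain dictionary, and the typing rule that mixing int and float yields float are all assumptions made for the illustration.

```python
def check_expr(node, symbols):
    """Return the type of an expression tree, or raise on a semantic error."""
    if isinstance(node, tuple):
        _op, left, right = node
        left_type = check_expr(left, symbols)
        right_type = check_expr(right, symbols)
        # Assumed rule: an operation involving a float produces a float.
        return "float" if "float" in (left_type, right_type) else "int"
    if node[0].isdigit():
        return "float" if "." in node else "int"   # numeric constant
    if node not in symbols:
        raise NameError(f"variable {node!r} is not defined")
    return symbols[node]


def check_assignment(tree, symbols):
    """Check an ('assign', target, expr) tree against a symbol table."""
    _, target, expr = tree
    if target not in symbols:
        raise NameError(f"variable {target!r} is not defined")
    value_type = check_expr(expr, symbols)
    if symbols[target] == "int" and value_type == "float":
        raise TypeError(f"cannot assign a float value to int variable {target!r}")
    return value_type


# Hypothetical symbol table: variable name -> declared type.
symbols = {"Z": "float", "X": "float", "Y": "int"}
tree = ("assign", "Z", ("+", "X", ("*", "0.618", "Y")))
print(check_assignment(tree, symbols))  # float
```

Only when a check like this succeeds would a compiler go on to emit intermediate code for the statement.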
This stage follows the semantic rules of the language, which are usually described using attribute grammars. It is only here that "translation" proper begins. The so-called "intermediate code" is a notation system with a clear meaning that is easy to process and is usually independent of specific hardware. This notation is either fairly close to the instruction format of modern computers or can be transformed into machine instructions relatively easily.
For example, many compilers use "quadruples", which closely resemble "three-address instructions", as their intermediate code. A quadruple means: apply a certain operation (specified by the "operator") to the "left and right operands", and keep the value obtained as the "result". When quadruples are used as the intermediate code, the task of intermediate code generation is to translate the various grammatical categories into sequences of quadruples according to the semantic rules of the language.
Besides quadruples, commonly used intermediate codes include triples, indirect triples, reverse Polish notation, and tree representations.
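As a rough sketch of quadruple generation, the function below walks a syntax tree and emits one (operator, left operand, right operand, result) tuple per operation. The nested-tuple tree shape and the temporary names T1, T2, ... are assumptions for the illustration.

```python
def generate_quadruples(tree):
    """Translate an ('assign', target, expr) tree into a list of quadruples."""
    quads = []
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp_count += 1
        return f"T{temp_count}"

    def translate(node):
        # A leaf (identifier or constant) is its own "place".
        if not isinstance(node, tuple):
            return node
        op, left, right = node
        left_place = translate(left)
        right_place = translate(right)
        result = new_temp()
        quads.append((op, left_place, right_place, result))
        return result

    _, target, expr = tree
    value_place = translate(expr)
    quads.append(("=", value_place, None, target))
    return quads


tree = ("assign", "Z", ("+", "X", ("*", "0.618", "Y")))
for quad in generate_quadruples(tree):
    print(quad)
# ('*', '0.618', 'Y', 'T1')
# ('+', 'X', 'T1', 'T2')
# ('=', 'T2', None, 'Z')
```

Nothing in the output refers to registers or machine instructions, which is what makes this form independent of specific hardware.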
The fourth stage: code optimization
The task of optimization is to process and transform the intermediate code generated in the previous stage so that the final stage can produce more efficient object code (saving time and space). The main optimizations include extraction of common subexpressions, loop optimization, and deletion of useless code. Sometimes the code is also parallelized to make parallel execution possible. The principle that optimization follows is the equivalence-preserving transformation of the program.
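One of these optimizations, common subexpression elimination, can be sketched on a list of quadruples. This is a simplified local version under assumptions: the code is a single straight-line block, each temporary is assigned once, and quadruples are (operator, left operand, right operand, result) tuples with "=" used for assignment to a variable.

```python
def eliminate_common_subexpressions(quads):
    """Remove repeated computations from a straight-line list of quadruples."""
    seen = {}    # (op, arg1, arg2) -> temporary that already holds the value
    alias = {}   # removed temporary -> temporary that replaces it
    optimized = []
    for op, arg1, arg2, result in quads:
        # Rewrite operands that refer to a removed temporary.
        arg1 = alias.get(arg1, arg1)
        arg2 = alias.get(arg2, arg2)
        if op == "=":
            # Assigning to a variable invalidates expressions that read it.
            seen = {key: temp for key, temp in seen.items() if result not in key[1:]}
            optimized.append((op, arg1, arg2, result))
            continue
        key = (op, arg1, arg2)
        if key in seen:
            alias[result] = seen[key]   # reuse the earlier result
            continue
        seen[key] = result
        optimized.append((op, arg1, arg2, result))
    return optimized


# Quadruples for A = B * C + B * C, with B * C computed twice.
quads = [
    ("*", "B", "C", "T1"),
    ("*", "B", "C", "T2"),
    ("+", "T1", "T2", "T3"),
    ("=", "T3", None, "A"),
]
for quad in eliminate_common_subexpressions(quads):
    print(quad)
# ('*', 'B', 'C', 'T1')
# ('+', 'T1', 'T1', 'T3')
# ('=', 'T3', None, 'A')
```

The optimized sequence computes the same value for A with one multiplication fewer, which is the sense in which the transformation is equivalent.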
The fifth stage: target code generation
The task of this stage is to transform the intermediate code (optimized or not) into low-level language code for a specific machine. This stage carries out the final translation, and its work depends on the hardware architecture and the meaning of the machine instructions. The work is complex: it involves deciding how to use the functional components of the hardware, selecting machine instructions, allocating storage for variables of the various data types, and scheduling registers and backup registers.
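A very naive instruction selection can be sketched as follows. The target here is a hypothetical machine with a single register R0 and the made-up instructions LOAD, STORE, ADD, SUB, MUL, and DIV; real code generators handle many registers, addressing modes, and storage allocation, none of which is modeled.

```python
# Hypothetical instruction set: the mnemonics are assumptions for this sketch.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate_target_code(quads):
    """Translate (op, arg1, arg2, result) quadruples into assembly-like lines."""
    code = []
    in_register = None  # name of the value currently held in R0
    for op, arg1, arg2, result in quads:
        if in_register != arg1:
            code.append(f"LOAD  R0, {arg1}")   # skip the load if R0 already holds arg1
        if op != "=":
            code.append(f"{OPCODES[op]}   R0, {arg2}")
        code.append(f"STORE R0, {result}")
        in_register = result
    return code


quads = [
    ("*", "0.618", "Y", "T1"),
    ("+", "X", "T1", "T2"),
    ("=", "T2", None, "Z"),
]
print("\n".join(generate_target_code(quads)))
# LOAD  R0, 0.618
# MUL   R0, Y
# STORE R0, T1
# LOAD  R0, X
# ADD   R0, T1
# STORE R0, T2
# STORE R0, Z
```

Even this toy version shows why register scheduling matters: tracking what R0 holds saves one LOAD before the final STORE.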
The object code can take the form of absolute instruction code, relocatable instruction code, or assembly instruction code. Absolute instruction code can be executed immediately. Assembly instruction code must first be assembled by an assembler before it can run. Most practical compilers today generate relocatable instruction code. Before such code can run, a linking loader must link the object modules together (including the library functions provided by the system), determine the locations of the program's variables (or constants) in main memory, and load the program at a specified starting address in memory, turning it into a runnable absolute instruction code program.
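The step from relocatable to absolute code can be sketched in a few lines. The module format here, a list of (opcode, operand, needs_relocation) tuples with addresses counted from 0, is a hypothetical simplification; real object file formats and linkers are far more involved.

```python
def relocate(module, base_address):
    """Turn relocatable instructions into absolute ones for a given load address."""
    absolute = []
    for opcode, operand, needs_relocation in module:
        if needs_relocation:
            operand += base_address   # address operand: shift by the load address
        absolute.append((opcode, operand))
    return absolute


# Hypothetical module: operands flagged True are addresses relative to 0,
# operands flagged False are plain constants and stay unchanged.
module = [
    ("LOAD", 4, True),
    ("ADDI", 1, False),
    ("STORE", 5, True),
]
print(relocate(module, 1000))
# [('LOAD', 1004), ('ADDI', 1), ('STORE', 1005)]
```

The same module can be loaded at any starting address by calling relocate with a different base, which is what makes relocatable code convenient for linking separately compiled modules.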