A program written in a programming language is called a source program. Source code is an uncompiled text file written according to the specification of a programming language: a series of human-readable computer language instructions, usually in a high-level language. A source program is ultimately translated from human-readable text into binary instructions that a computer can execute. This process is called compilation and is carried out by a compiler.
The operating environment of this tutorial: Windows 7 system, Dell G3 computer.
What is a source program
A source program, also known as source code, is an uncompiled text file written according to the specification of a particular programming language. It consists of a series of human-readable computer language instructions, usually written in a high-level language.
In modern programming languages, source programs can be distributed in books, on tapes, or on other media, but the most common format is the text file. The purpose of this format is to be compiled into a computer program.
A source program is ultimately translated from human-readable text into binary instructions that a computer can execute. This process is called compilation and is carried out by a compiler.
The source program file type refers to the particular encoding used when storing the source program, chosen so that it is easy to read and identify. Text files are the most common file type, but many high-level languages and assembly languages have their own file types. Source programs are customarily saved in the file type of their high-level or assembly language, mainly so that the compiler can process them later.
The emergence of high-level programming languages (also called high-level languages) freed programs from excessive dependence on a specific machine or environment. This is because a high-level language is compiled into different machine languages on different platforms rather than being executed directly by the machine. Platform independence was one of the main goals of FORTRAN, one of the earliest programming languages.
Function
Source code has two main functions:
Generating target code, that is, code the computer can recognize.
Describing the software, that is, documenting how it is written. Many beginners, and even some experienced programmers, neglect software descriptions because this part does not appear directly in the generated program and takes no part in compilation. Yet such descriptions bring great benefits for learning, sharing, maintaining, and reusing software. Writing them is therefore regarded in the industry as a good habit for producing excellent programs, and some companies make it mandatory.
It should be noted that for compiled languages such as C, C++, and Java, modifying the source code does not change the target code that has already been generated. For the change to take effect, the program must be recompiled. Many popular scripting languages, such as Perl and Python, require no separate recompilation step: after modifying the code, you can run it directly and see the result.
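The recompile requirement can be imitated in a few lines. This is only a sketch: Python's built-in `compile` and `exec` stand in for a separate compile step (Python normally recompiles automatically when a script is run), and the variable names are illustrative.

```python
# Stand-in for "source -> target code": compile() turns source text
# into a code object that can be executed later.
source = "result = 2 + 3"
target = compile(source, "<example>", "exec")

# Editing the source text does not touch the target already generated.
source = "result = 2 * 3"

namespace = {}
exec(target, namespace)
old_result = namespace["result"]   # still computed from the old source

# Only after compiling again does the change take effect.
target = compile(source, "<example>", "exec")
exec(target, namespace)
new_result = namespace["result"]   # computed from the edited source
```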
Code combination
Source code, as a distinct part of software, may be contained in one or more files, and a program need not be written in a single language. For example, one part of a program that relies on a C library can be written in C, while another part can be written in assembly language for higher run-time efficiency. At present, very little software needs to be written directly in assembly language, because the optimized code generated by compilers is often already very efficient. More commonly, the speed-critical core is written in C/C++ and extended with dynamic languages such as Perl, Python, or Lua for interfaces, configuration management, and so on. This preserves efficiency while increasing the flexibility of the program.
More complex software generally involves dozens or even hundreds of source files. To reduce this complexity, a system is needed that can describe the relationships between the source files and how to compile them correctly. Against this background, version control systems (VCS) were born and became one of the essential tools developers use to revise code.
There is another kind of combination: porting software written for one platform to another, such as porting Windows software to Linux or macOS. The technical term for this is software porting. Software that can run on multiple platforms is generally called cross-platform software.
Processing high-level language source programs into target programs
The system software that processes high-level language source programs into target programs is the "compiler".
A compiler, also called a compiling program, is a translation program that translates a source program written in a high-level programming language into an equivalent target program in machine language format. A compiler is a translator implemented with the generative approach: it takes a source program written in a high-level programming language as input and produces a target program expressed in assembly language or machine language as output. The compiled target program usually then goes through a run stage, in which it executes with the support of a run-time system, processes the input data, and computes the required results.
The compiler must analyze the source program and then synthesize it into the target program. First, check the correctness of the source program and decompose it into several basic components; secondly, establish corresponding equivalent target program parts based on these basic components. In order to complete these tasks, the compiler must create some tables during the analysis phase and transform the source program into an intermediate language form so that it can be easily referenced and processed during analysis and synthesis.
Characteristics of the compiler:
The main data structures used during analysis and synthesis are the symbol table, the constant table, and the intermediate language program. The symbol table consists of the identifiers used in the source program together with their attributes, which include kind (such as variable, array, structure, function, or procedure), type (such as integer, real, string, complex, or label), and other information required by the target program. The constant table consists of the constants used in the source program, including their machine representation and the target program addresses assigned to them. The intermediate language program is an intermediate form of the program introduced before the source program is translated into the target program; the choice of representation depends on how the compiler will use and process it later. Commonly used intermediate forms include Polish notation, triples, quadruples, and indirect triples.
The analysis part of the compiler works on the source program in three steps: lexical analysis, syntax analysis, and semantic analysis. Lexical analysis is performed by a lexical analyzer (also called a scanner), whose task is to identify words (identifiers, constants, reserved words, operators, punctuation marks, and so on), build the symbol table and constant table, and convert the source program into an internal form that the compiler can easily analyze and process. The syntax analyzer is the core of the compiler. Its main task is to check whether the source program conforms to the grammar of the language. If it does not, a syntax error message is output; if it does, the grammatical structure of the source program is decomposed and an internal program in intermediate language form is constructed. The purpose of syntax analysis is to determine how words form statements and how statements form programs. The semantic analyzer further checks the semantic correctness of syntactically legal program structures. Its purpose is to ensure that identifiers and constants are used correctly, to collect the necessary information and save it in the symbol table or the intermediate language program, and to perform the corresponding semantic processing.
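As a rough illustration of the symbol table and constant table described above, the following minimal sketch models them with Python dataclasses. The class names, field names, and the slot-numbering scheme for constant addresses are assumptions made for this example, not a layout prescribed by any particular compiler.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Symbol:
    name: str
    kind: str                       # e.g. "variable", "array", "function"
    type: str                       # e.g. "integer", "real", "string"
    address: Optional[int] = None   # filled in later for the target program


@dataclass
class SymbolTable:
    symbols: Dict[str, Symbol] = field(default_factory=dict)

    def declare(self, name, kind, type_):
        # Reject a second declaration of the same identifier.
        if name in self.symbols:
            raise KeyError(f"duplicate declaration of {name!r}")
        self.symbols[name] = Symbol(name, kind, type_)
        return self.symbols[name]

    def lookup(self, name):
        return self.symbols[name]


@dataclass
class ConstantTable:
    addresses: Dict[object, int] = field(default_factory=dict)

    def intern(self, value):
        # Assign the next free slot the first time a constant is seen;
        # later occurrences reuse the same slot.
        return self.addresses.setdefault(value, len(self.addresses))


table = SymbolTable()
table.declare("count", "variable", "integer")
constants = ConstantTable()
first_slot = constants.intern(10)
second_slot = constants.intern("hello")
repeat_slot = constants.intern(10)
```

Keeping identifiers in a dictionary keyed by name makes lookup during semantic analysis a single step, which is the main reason a table is built during the analysis phase.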
The working process of a compiler
A compiler is also called a compilation system. It is a language processor that translates procedure-oriented source programs written in a high-level language into target programs. The translation from source program to target program is divided into five stages: lexical analysis, syntax analysis, intermediate code generation, code optimization, and target code generation. Lexical analysis and syntax analysis together are also known as source program analysis; during analysis, syntax errors are detected and reported.
(1) Lexical analysis
The task of lexical analysis is to process words made up of characters: the source program is scanned character by character from left to right and word symbols (tokens) are produced one by one, transforming the source program from a character string into an intermediate program that is a string of word symbols. A program that performs lexical analysis is called a lexical analyzer (lexer) or scanner.
For each word symbol in the source program, the scanner generally produces a pair: the word category and the value of the word itself. Word categories are usually encoded as integers. If a category contains only one word symbol, the category code fully represents that symbol's value. If a category contains many word symbols, each of them must be given its own value in addition to the category code.
Lexical analyzers are generally constructed in one of two ways: manually or by automatic generation. Manual construction can work from state diagrams; automatic generation can be based on deterministic finite automata.
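The (category, value) pairs described above can be sketched with a small hand-written scanner. This is a minimal example under stated assumptions: the integer category codes, the keyword set, and the supported operators are all chosen for illustration, and Python's `re` module does the character matching.

```python
import re

# Integer codes for word categories (illustrative values).
KEYWORD, IDENT, NUMBER, OPERATOR = 1, 2, 3, 4
KEYWORDS = {"if", "else", "while"}

# One alternative per category: number, identifier/keyword, operator.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(==|[-+*/=()<>]))")


def tokenize(text):
    """Scan text left to right and return a list of (category, value) pairs."""
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, word, op = match.groups()
        if number is not None:
            tokens.append((NUMBER, int(number)))
        elif word is not None:
            category = KEYWORD if word in KEYWORDS else IDENT
            tokens.append((category, word))
        else:
            tokens.append((OPERATOR, op))
        pos = match.end()
    return tokens


pairs = tokenize("while x1 = 42")
```

Here `pairs` holds one pair per word symbol, for example `(IDENT, "x1")` for the identifier and `(NUMBER, 42)` for the constant.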
(2) Syntax analysis
The syntax analyzer takes word symbols as input and determines whether the string of word symbols forms grammatical units that conform to the grammar rules, such as expressions, assignments, and loops, and finally whether the whole forms a valid program. Following the grammar rules of the language, it checks that each statement has the correct logical structure; the program is the final grammatical unit. The grammar rules of a programming language can be described by a context-free grammar.
There are two methods of syntax analysis: top-down and bottom-up. Top-down analysis starts from the start symbol of the grammar and derives the sentence step by step. Bottom-up analysis uses the shift-reduce method. The basic idea is to use a first-in, last-out stack and shift the input symbols onto it one by one; when the top of the stack forms a candidate (right-hand side) of some production, that part of the stack is reduced to the left-hand-side symbol of the production.
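The shift-reduce idea can be sketched on a deliberately tiny grammar. The grammar `E -> E + n | n`, the token spelling `"n"`, and the function name are assumptions made for this example; a practical parser would drive its decisions from a parsing table rather than the hard-coded checks used here.

```python
def shift_reduce_accepts(tokens):
    """Recognize the toy grammar  E -> E + n | n  with a stack."""
    stack = []
    for token in tokens:
        stack.append(token)                  # shift the next input symbol
        if stack[-3:] == ["E", "+", "n"]:
            stack[-3:] = ["E"]               # reduce by E -> E + n
        elif stack[-1] == "n":
            stack[-1] = "E"                  # reduce by E -> n
    # The input is a sentence only if everything reduced to the start symbol.
    return stack == ["E"]


ok = shift_reduce_accepts(["n", "+", "n", "+", "n"])
missing_operand = shift_reduce_accepts(["n", "+"])
missing_operator = shift_reduce_accepts(["n", "n"])
```

The longer right-hand side is checked first so that an `n` following `E +` is folded into the existing expression instead of starting a new one.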
(3) Intermediate code generation
Intermediate code is an internal representation of the source program, also called the intermediate language. Its role is to make the structure of the compiler logically simpler and clearer, and in particular to make optimization of the target code easier to implement. The complexity of the intermediate language lies between that of the source language and machine language. Intermediate languages take many forms; common ones are reverse Polish notation, quadruples, triples, and trees.
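To make the quadruple form concrete, here is a minimal sketch that turns an arithmetic expression into (operator, operand1, operand2, result) quadruples. It borrows Python's standard `ast` module as a ready-made front end, which is a convenience assumed for this example; the temporary names `t1`, `t2`, ... and the function name are likewise illustrative.

```python
import ast

# Map Python AST operator nodes to the operator symbol stored in a quadruple.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_quadruples(expression):
    """Return a list of (op, arg1, arg2, result) tuples for an expression."""
    quads = []

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = visit(node.left)
            right = visit(node.right)
            temp = f"t{len(quads) + 1}"      # fresh temporary for the result
            quads.append((OPS[type(node.op)], left, right, temp))
            return temp
        raise ValueError("unsupported expression")

    visit(ast.parse(expression, mode="eval").body)
    return quads


quads = to_quadruples("a + b * 2")
```

For `a + b * 2` the multiplication is emitted first and its temporary then feeds the addition, which reflects the operator precedence already resolved by the syntax analysis.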
(4) Code optimization
Code optimization means applying a series of equivalent transformations to the program so that more efficient target code can be generated from the transformed program. Equivalent means that the results of running the program are unchanged. Efficient mainly means that the target code runs in less time and occupies less storage. Such a transformation is called an optimization.
There are two types of optimization. One is applied to the intermediate code after syntax analysis and does not depend on the specific computer; the other is performed when generating the target code and depends to a large extent on the specific computer. The former can be divided by the scope of the program involved into three levels: local optimization, loop optimization, and global optimization.
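One simple machine-independent, local optimization is constant folding: operations whose operands are already known are evaluated at compile time. The sketch below applies it to a list of (op, arg1, arg2, result) quadruples; the quadruple layout and function name are assumptions for this example, and only integer `+`, `-`, and `*` are folded.

```python
import operator

# Division is left out so that the sketch never divides by zero.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(quads):
    """Return equivalent quadruples with constant sub-expressions evaluated."""
    known = {}       # temporaries whose value is known at compile time
    optimized = []
    for op, left, right, result in quads:
        # Substitute previously folded temporaries by their values.
        left = known.get(left, left)
        right = known.get(right, right)
        if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
            known[result] = FOLDABLE[op](left, right)
        else:
            optimized.append((op, left, right, result))
    # Limitation of this sketch: if the final result itself folds to a
    # constant, it is recorded only in `known` and not returned.
    return optimized


before = [("*", 2, 3, "t1"), ("+", "a", "t1", "t2")]
after = fold_constants(before)
```

The transformed program computes the same value for `t2` with one instruction fewer, which is what equivalent and more efficient mean in this context.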
(5) Target code generation
Target code generation is the last stage of compilation. The target code generator converts the syntactically analyzed or optimized intermediate code into target code. There are three forms of target code:
① Machine language code that can be executed immediately, with all addresses already relocated;
② Machine language modules awaiting linking, which, when execution is required, are linked by a linking loader with certain run-time routines and converted into executable machine language code;
③ Assembly language code, which must be assembled by an assembler to become executable machine language code.
The target code generation stage must address three issues that directly affect the speed of the target code: first, how to generate shorter target code; second, how to make full use of the computer's registers and reduce the number of times the target code accesses memory; third, how to make full use of the characteristics of the computer's instruction set to improve the quality of the target code.
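The first two issues can be illustrated with a very small generator that emits assembly-like text from quadruples. The single-register machine and the `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`, and `DIV` mnemonics are hypothetical and invented for this sketch; they do not correspond to a real instruction set.

```python
# Hypothetical mnemonics for a one-register machine.
MNEMONICS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate_assembly(quads):
    """Emit assembly-like lines for (op, arg1, arg2, result) quadruples."""
    lines = []
    in_register = None   # name of the value currently held in R0
    for op, left, right, result in quads:
        # Skip the load when the left operand is already in the register,
        # which saves one memory access and one instruction.
        if left != in_register:
            lines.append(f"LOAD R0, {left}")
        lines.append(f"{MNEMONICS[op]} R0, {right}")
        lines.append(f"STORE {result}, R0")
        in_register = result
    return lines


assembly = generate_assembly([("*", "b", 2, "t1"), ("+", "t1", "a", "t2")])
```

Because `t1` is still in `R0` when the second quadruple is processed, the generator emits five lines instead of six.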