The lexical structure of a programming language is the set of elementary rules that specify how you write programs in that language. It is the lowest-level syntax of a language: it specifies what variable names look like, which characters delimit comments, and how one statement is separated from the next. This section briefly introduces the lexical structure of JavaScript.
1. Character set
JavaScript programs are written using the Unicode character set. Unicode is a superset of ASCII and Latin-1 and supports virtually every written language in use in the world. ECMAScript 3 requires JavaScript implementations to support Unicode version 2.1 or later, and ECMAScript 5 requires support for Unicode 3 or later.
i. Case sensitivity
JavaScript is a case-sensitive language. This means that keywords, variables, function names, and other identifiers must always be typed with consistent capitalization. For example, the keyword while must be written as while, not While or WHILE.
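A small sketch of this rule: the variable names below are invented for illustration and differ only in capitalization, so JavaScript treats them as three unrelated variables. By the same logic, While is just an ordinary identifier, not the while keyword.

```javascript
// Identifiers that differ only in case are completely separate names.
var online = "lowercase";
var Online = "capitalized";
var ONLINE = "all caps";

console.log(online, Online, ONLINE); // three distinct values

// Keywords are case sensitive too: "While" is not the keyword while,
// so here it is just an ordinary (legal) variable name.
var While = 42;
console.log(While); // 42
```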
Note, however, that HTML is not case sensitive (although XHTML is). Because HTML is so closely tied to client-side JavaScript, this difference is easy to confuse. For example, an event handler attribute may be written as onclick or onClick in HTML, but in JavaScript code it must be written in lowercase as onclick.
ii. Whitespace, line breaks, and format control characters
JavaScript ignores spaces that appear between tokens in a program. For the most part, JavaScript also ignores line breaks (the exceptions are covered under optional semicolons). Because spaces and line breaks can be used freely, you can format and indent your code in a neat, consistent style that makes it easier to read.
In addition to the ordinary space character (\u0020), JavaScript also recognizes the following characters as whitespace: horizontal tab (\u0009), vertical tab (\u000B), form feed (\u000C), nonbreaking space (\u00A0), byte order mark (\uFEFF), and any character in Unicode category Zs. JavaScript recognizes the following characters as line terminators: line feed (\u000A), carriage return (\u000D), line separator (\u2028), and paragraph separator (\u2029). A carriage return followed by a line feed is parsed as a single line terminator.
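One way to check these rules is to build a tiny program from a string that separates its tokens with some of the less common whitespace and line terminator characters, then let the parser compile it with the standard Function constructor. This is a sketch; the names source and compute are arbitrary.

```javascript
// Tokens of "return 1 + 2 * 3", separated by unusual whitespace characters.
const source = [
  "return",
  "\u00A0", // no-break space
  "1",
  "\u0009", // horizontal tab
  "+",
  "\uFEFF", // byte order mark, treated as whitespace
  "2",
  "\u3000", // ideographic space, a member of Unicode category Zs
  "*",
  "\r\n",   // carriage return + line feed: a single line terminator
  "3"
].join("");

// The parser skips all of the separators and evaluates 1 + 2 * 3.
const compute = new Function(source);
console.log(compute()); // 7
```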
Unicode format control characters (category Cf), such as the right-to-left mark (\u200F) and the left-to-right mark (\u200E), control the visual presentation of text. They are important for the correct display of some non-English languages. These characters may appear in JavaScript comments, string literals, and regular expression literals, but not in identifiers such as variable names. The exceptions are the zero-width joiner (\u200D) and the zero-width non-joiner (\u200C), which may appear in identifiers but not as the first character. As noted above, the byte order mark format control character (\uFEFF) is treated as whitespace.
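The sketch below assumes an ES5-or-later engine. It compiles small snippets with the Function constructor to show that a zero-width non-joiner is accepted inside an identifier, rejected at the start of one, and treated as an ordinary character inside a string literal. All variable names here are made up for the example.

```javascript
// ZWNJ (\u200C) is allowed inside an identifier...
const useZwnj = new Function("var a\u200Cb = 1; return a\u200Cb;");
console.log(useZwnj()); // 1

// ...but not as its first character.
let firstCharError = null;
try {
  new Function("var \u200Cb = 1;");
} catch (e) {
  firstCharError = e;
}
console.log(firstCharError instanceof SyntaxError); // true

// Inside a string literal, a format control character is just another character.
const rtlMark = "\u200F";
console.log(rtlMark.length); // 1
```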
iii. Unicode escape sequences
Some computer hardware and software cannot display, input, or correctly process the full set of Unicode characters. To support programmers and systems using older technology, JavaScript defines escape sequences that let any 16-bit Unicode code point be written using only six ASCII characters. A Unicode escape begins with \u and is followed by exactly four hexadecimal digits (the digits 0-9 and the letters A-F in either uppercase or lowercase). Unicode escapes may appear in JavaScript string literals, regular expression literals, and identifiers (but not in keywords). For example, the Unicode escape for the character é is \u00E9, so the following two JavaScript strings are identical:
"café" === "caf\u00e9"   // => true
Unicode escapes may also appear in comments, but because comments are ignored, escapes there are simply treated as ASCII characters and are never interpreted as the corresponding Unicode characters.
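To illustrate, the snippet below (with an invented variable name) compares two spellings of the same string, then declares a variable using an escape and reads it back using the literal é.

```javascript
// In a string literal, the escape \u00e9 and the character é are interchangeable.
console.log("café" === "caf\u00e9"); // true

// In an identifier, too: both spellings below name the same variable.
var caf\u00e9 = "espresso";
console.log(café); // espresso

// In a comment, \u00e9 is never decoded; it is just six ASCII characters.
```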
iv. Normalization
Unicode allows more than one way to encode the same character. For example, the character é can be encoded as the single Unicode character \u00E9 or as the ordinary ASCII character e followed by the combining acute accent \u0301. The two encodings look exactly the same in a text editor, but their binary representations differ, so the computer considers them different. The Unicode standard defines the preferred encoding for all characters and specifies a normalization procedure that converts text to a canonical form suitable for comparison. JavaScript assumes that the source code it interprets has already been normalized and makes no attempt to normalize identifiers, strings, or regular expressions itself.
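The difference can be observed directly. This sketch relies on String.prototype.normalize, which was added in ECMAScript 2015 and is therefore newer than the ES3/ES5 requirements mentioned earlier; the variable names are illustrative.

```javascript
// é written two ways: one precomposed code point vs. e + combining accent.
const precomposed = "caf\u00e9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = "cafe\u0301"; // "e" followed by U+0301 COMBINING ACUTE ACCENT

// They display identically but are different sequences of UTF-16 code units.
console.log(precomposed === decomposed);            // false
console.log(precomposed.length, decomposed.length); // 4 5

// normalize() defaults to NFC, which composes e + U+0301 into U+00E9.
console.log(precomposed === decomposed.normalize());      // true
console.log(precomposed.normalize("NFD") === decomposed); // true

// Identifiers are not normalized either: these are two different variables.
var ol\u00e9 = 1;
var ole\u0301 = 2;
console.log(ol\u00e9, ole\u0301); // 1 2
```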
2. Comments
JavaScript supports two styles of comments. Any text between // and the end of a line is treated as a comment and ignored by JavaScript.
In addition, any text between the characters /* and */ is also treated as a comment. These comments may span multiple lines, but they cannot be nested.
// Single-line comment
/*
 * Multi-line
 * comment
 */
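Because a block comment ends at the first */ the parser encounters, nesting one comment inside another leaves stray text behind. The sketch below compiles such text with the Function constructor to show the resulting SyntaxError; the names nested, nestingError, and fine are arbitrary.

```javascript
// The comment ends at the first */, leaving "still outer? */" as code.
const nested = "/* outer /* inner */ still outer? */";

let nestingError = null;
try {
  new Function(nested);
} catch (e) {
  nestingError = e;
}
console.log(nestingError instanceof SyntaxError); // true

// Putting // inside a block comment is harmless: it is just comment text.
const fine = new Function("/* a // b */ return 1;");
console.log(fine()); // 1
```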
3. Literals
A literal is a data value that appears directly in a program. The following are examples of literals:
{x:1,y:2} //Object
[1,2,3,4,5] //Array
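Object and array literals are only two of JavaScript's literal forms. The following sketch collects the other common kinds in one array and checks their types with typeof; note that typeof reports "object" for null and for regular expressions.

```javascript
// Every element below is written directly in the source as a literal.
const literals = [
  12,              // the number twelve
  1.2,             // the number one point two
  "hello world",   // a string
  'Hi',            // another string
  true,            // a Boolean value
  false,           // the other Boolean value
  /javascript/gi,  // a regular expression literal
  null,            // absence of an object
  { x: 1, y: 2 },  // an object literal
  [1, 2, 3, 4, 5]  // an array literal
];

console.log(literals.map(v => typeof v).join(", "));
```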
4. Identifiers and reserved words
An identifier is simply a name. In JavaScript, identifiers are used to name variables and functions and to provide labels for certain loops. A JavaScript identifier must begin with a letter, an underscore (_), or a dollar sign ($). Subsequent characters can be letters, digits, underscores, or dollar signs. (Digits are not allowed as the first character so that JavaScript can easily distinguish identifiers from numbers.)
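For example, i, my_variable_name, v13, _dummy, and $str are all legal identifiers, and identifiers may also contain Unicode letters such as π. The hypothetical helper isLegalIdentifier below checks candidate names by asking the parser to compile a var declaration; it is a quick sketch for experimentation, not a validator for untrusted input.

```javascript
// Hypothetical helper: reports whether `name` compiles as a variable name
// in an ordinary (non-strict) function body. Only meant for simple names.
function isLegalIdentifier(name) {
  try {
    new Function("var " + name + ";");
    return true;
  } catch (e) {
    return false; // the Function constructor throws SyntaxError on bad source
  }
}

const legal = ["i", "my_variable_name", "v13", "_dummy", "$str", "π"];
const illegal = ["13v", "my-name", "my name"];

console.log(legal.map(isLegalIdentifier));   // all true
console.log(illegal.map(isLegalIdentifier)); // all false
```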
The following words are reserved in JavaScript and cannot be used as identifiers:
class const enum export
extends import super
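A quick way to confirm that these words are off-limits is to try declaring a variable with each one. This sketch uses the Function constructor so that the resulting SyntaxError can be caught; it also shows that, since ECMAScript 5, reserved words may still be used as property names. The variable names are arbitrary.

```javascript
const reservedWords = ["class", "const", "enum", "export", "extends", "import", "super"];

// Attempt to use each word as a variable name and record the outcome.
const outcomes = reservedWords.map(function (word) {
  try {
    new Function("var " + word + " = 1;");
    return word + ": accepted";
  } catch (e) {
    return word + ": " + e.name;
  }
});
console.log(outcomes.join("\n")); // each line ends in "SyntaxError"

// Reserved words are still allowed as property names.
const obj = { class: "ok" };
console.log(obj.class); // ok
```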
In addition, the following words are legal identifiers in ordinary JavaScript code but are reserved words in strict mode:
implements let private public yield interface package
protected static
Strict mode also imposes restrictions on the following identifiers. They are not fully reserved, but they cannot be used as variable names, function names, or parameter names:
arguments eval
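The effect of strict mode on these names can be checked with a small hypothetical helper, parses, which reports whether a function body compiles. A "use strict" directive at the start of a function body switches that body to strict mode.

```javascript
// Hypothetical helper: true if `body` compiles as a function body.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    return false; // the Function constructor throws SyntaxError on invalid code
  }
}

// Accepted in ordinary (non-strict) code...
console.log(parses("var implements = 1;")); // true
console.log(parses("var eval = 1;"));       // true

// ...but rejected once the "use strict" directive is present.
console.log(parses("'use strict'; var implements = 1;"));     // false
console.log(parses("'use strict'; var eval = 1;"));           // false
console.log(parses("'use strict'; function f(arguments) {}")); // false
```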
A particular JavaScript implementation may also define its own global variables and functions, and each JavaScript host environment (client-side, server-side, and so on) has its own list of global properties. Keep these in mind when choosing names. (See the Window object for the global variables and functions defined in client-side JavaScript.)
5. Optional semicolon
Like many programming languages, JavaScript uses the semicolon (;) to separate statements from one another. This is important for making the meaning of your code clear: without a separator, the end of one statement might appear to be the beginning of the next, or vice versa.
In JavaScript, you can usually omit the semicolon between two statements if they are written on separate lines. (You can also omit a semicolon at the end of a program or before a closing curly brace }.) Many JavaScript programmers (including the code examples in this book) use semicolons to mark the end of every statement explicitly, even where they are not strictly required. Another style is to omit semicolons wherever possible and use them only in the few situations that require them. Whichever style you choose, there are a few details about how JavaScript handles semicolons that you should understand.
In the following code, the first semicolon can be omitted:
a=3;
b=4;
But if the code is written on a single line, as follows, the first semicolon cannot be omitted:
a=3;b=4;
Note that JavaScript does not treat every line break as a semicolon: it usually inserts a semicolon at a line break only if it cannot parse the code without one. More precisely (and with the two exceptions described below), if the current statement cannot be parsed together with the next nonspace character as a single statement, JavaScript inserts a semicolon at the end of the current statement. Consider the following code:
var a
a
=
3
console.log(a)
JavaScript parses it as:
var a;a=3;console.log(a);
JavaScript inserts a semicolon after the first line because it cannot parse var a a without one. The second a could stand alone as the statement a;, but JavaScript does not insert a semicolon after the second line, because it can continue parsing it together with the following lines as the single statement a = 3;.
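The same lines can be fed to the parser as a function body to confirm this reading. In this sketch the final console.log(a) is replaced by return a so that the result can be checked; program is an arbitrary name.

```javascript
// The example's lines joined with \n; console.log(a) became return a.
const program = new Function("var a\na\n=\n3\nreturn a");
console.log(program()); // 3
```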
These statement-separation rules can lead to some surprising cases. The following code is split across two lines and looks like two separate statements:
var y = x + f
(a+b).toString()
However, the parentheses on the second line combine with the f on the first line to form a function call, so JavaScript interprets the code as:
var y = x + f(a+b).toString();
This is almost certainly not what the author intended. For the code to be parsed as two separate statements, an explicit semicolon must be added at the end of the first line.
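The following sketch runs both versions of this code. The values of x, f, a, and b are assumptions chosen only so the snippet can execute; because f is a number rather than a function, the version without a semicolon fails with a TypeError.

```javascript
// Hypothetical values chosen only so the code can run.
const setup = "var x = 1, f = 2, a = 3, b = 4;\n";

// Without a semicolon, line 2 continues line 1 as f(a+b), and f is not callable.
const withoutSemicolon = new Function(
  setup + "var y = x + f\n(a+b).toString()\nreturn y;"
);
let callError = null;
try {
  withoutSemicolon();
} catch (e) {
  callError = e;
}
console.log(callError instanceof TypeError); // true

// With an explicit semicolon the two lines are separate statements.
const withSemicolon = new Function(
  setup + "var y = x + f;\n(a+b).toString()\nreturn y;"
);
console.log(withSemicolon()); // 3
```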
In general, if a statement begins with (, [, /, +, or -, there is a good chance it will be parsed as a continuation of the previous statement. Statements beginning with /, +, or - are rare in practice, but statements beginning with ( or [ are not uncommon, at least in some JavaScript coding styles. Some programmers like to defensively put a semicolon at the beginning of any such statement, so that it is still parsed correctly even if the statement before it is later modified and its terminating semicolon is removed by mistake.
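A sketch of the defensive style, with invented variable names and a hypothetical helper run: the statement beginning with [ follows a line that has lost its semicolon. Without the leading semicolon, the array literal is read as an index into the previous value; with it, the code behaves as intended.

```javascript
// Hypothetical helper: runs a tiny program whose second line lacks a semicolon.
function run(secondStatement) {
  const body =
    "var total = 0\n" +
    "var values = [1, 2, 3]\n" + // note: no semicolon at the end
    secondStatement + "\n" +
    "return total";
  try {
    return new Function(body)();
  } catch (e) {
    return e.name;
  }
}

// Parsed as [1, 2, 3][10, 20].forEach(...), i.e. undefined.forEach(...)
console.log(run("[10, 20].forEach(function (n) { total += n })"));  // TypeError
// The leading semicolon ends the previous statement explicitly.
console.log(run(";[10, 20].forEach(function (n) { total += n })")); // 30
```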
If the current line cannot be combined with the next line and parsed as a single statement, JavaScript inserts a semicolon after the current line. That is the general rule, but there are two exceptions. The first involves the return, break, and continue statements: if a line break appears immediately after any of these keywords, JavaScript always inserts a semicolon at that line break. For example:
return
true;
JavaScript parses it as:
return; true;
whereas the intended meaning was:
return true;
In other words, you must not put a line break between return, break, or continue and the expression or label that follows it. If you do, the program will report an error only in rare circumstances; usually it silently does something other than what you intended, which makes the bug very hard to track down.
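Here is the return example as runnable code. The function names are made up; engines accept this code without complaint, which is exactly why the bug is easy to miss.

```javascript
function brokenReturn() {
  return   // a semicolon is inserted here...
  true;    // ...so this line is never reached
}

function correctReturn() {
  return true;
}

console.log(brokenReturn());  // undefined
console.log(correctReturn()); // true
```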
The second exception involves the ++ and -- operators. These operators can be used either as prefix operators, before an expression, or as postfix operators, after an expression. If you use either of them as a postfix operator, it must appear on the same line as the expression it applies to. Otherwise, a semicolon is inserted at the end of the line, and the ++ or -- is parsed as a prefix operator applied to the code that follows.
The above code is parsed into