To use regular expressions more efficiently, you must first understand how they work. The basic steps of regular expression processing are as follows.
Step 1: compilation. When you create a regular expression object (using a regex literal or the RegExp constructor), the browser validates your pattern and then converts it into a native code routine that performs the matching work. If you assign the regex object to a variable, you can avoid repeating this step.
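As a minimal sketch of that last point, the snippet below compares rebuilding a pattern inside a loop with creating it once and reusing it. The names `lines`, `datePattern`, `countRebuilt`, and `countReused` are hypothetical, and the actual saving depends on the engine, since some engines cache compiled patterns.

```javascript
// Hypothetical input: a few lines of text to scan.
var lines = ["2024-01-05 start", "no date here", "2024-02-11 stop"];

// Builds a new regex object on every iteration via the constructor.
function countRebuilt(list) {
  var count = 0;
  for (var i = 0; i < list.length; i++) {
    if (new RegExp("^\\d{4}-\\d{2}-\\d{2}").test(list[i])) count++;
  }
  return count;
}

// Creates the regex once, stores it in a variable, and reuses it.
var datePattern = /^\d{4}-\d{2}-\d{2}/;
function countReused(list) {
  var count = 0;
  for (var i = 0; i < list.length; i++) {
    if (datePattern.test(list[i])) count++;
  }
  return count;
}

// Both functions count the same lines; only the setup work differs.
console.log(countRebuilt(lines), countReused(lines));
```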
Step 2: setting the starting position. When the regex is put to use, the first task is to determine the position in the target string where the search should start. This is either the start of the string or the position given by the regex's lastIndex property. When the engine returns here from step 4 (because a match attempt failed), the position is one character after the position where the last attempt started.
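A small sketch of how lastIndex sets the starting position, assuming the g flag is set (without the g or y flag, exec() ignores lastIndex and starts at index 0). The variable names are illustrative only.

```javascript
var text = "banana";

// With the g flag, exec() starts searching at lastIndex.
var globalRe = /a/g;
globalRe.lastIndex = 2;
var fromTwo = globalRe.exec(text);
console.log(fromTwo.index);      // 3: the first "a" at or after index 2
console.log(globalRe.lastIndex); // 4: updated to just past the match

// Without the g (or y) flag, lastIndex is ignored.
var plainRe = /a/;
plainRe.lastIndex = 2;
var fromStart = plainRe.exec(text);
console.log(fromStart.index);    // 1: the search still began at index 0
```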
Browser makers optimize their regex engines by deciding in advance that certain steps can be skipped, which avoids a lot of pointless work. For example, if a regex starts with ^, IE and Chrome can usually test whether the start of the string matches and, if it does not, avoid foolishly searching the subsequent positions. Another example is a pattern that requires the third character to be x: a smart implementation can search for x first and then move the starting position back by two characters.
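The second shortcut can be imitated in plain code. This is only an illustration of the idea, not how any particular engine implements it; `findThirdX` is a hypothetical helper, and it assumes the text contains no line breaks (which `.` would refuse to match).

```javascript
// Hypothetical helper: imitates "find x, then back up two" for the pattern /..x/.
function findThirdX(s) {
  // An x preceded by two characters can sit no earlier than index 2.
  var xAt = s.indexOf("x", 2);
  return xAt === -1 ? -1 : xAt - 2;
}

var sample = "a box of wax";
// The shortcut and the regex search agree on where the match starts.
console.log(findThirdX(sample) === sample.search(/..x/));
```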
Step 3: matching each regex token. Once the regex knows where to start, it steps through the text and the regex pattern one token at a time. When a particular token fails to match, the regex tries to backtrack to an earlier point in the match attempt and follow other possible paths through the pattern.
Step 4: success or failure. If a complete match is found at the current position in the string, the regex declares success. If every possible path through the regex has been tried without a match, the engine goes back to step 2 and tries again at the next character in the string. Only after this cycle has been completed for every character in the string (as well as the position after the last character) without a match does the regex declare overall failure.
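The loop between steps 2 and 4 can be sketched with the sticky flag (y, ES2015), which forces a match attempt at exactly lastIndex. `searchFrom` is a hypothetical helper written for illustration; real engines run this loop internally and often skip positions.

```javascript
// Hypothetical helper: tries an anchored match at each start position in turn.
function searchFrom(source, text) {
  var sticky = new RegExp(source, "y"); // y = match only at lastIndex
  var attempts = [];
  // Note "<=": the position after the last character is tried as well.
  for (var pos = 0; pos <= text.length; pos++) {
    sticky.lastIndex = pos;
    attempts.push(pos);
    var m = sticky.exec(text);
    if (m) return { index: pos, match: m[0], attempts: attempts };
  }
  return { index: -1, match: null, attempts: attempts };
}

var found = searchFrom("cat", "a cat");
console.log(found.index, found.attempts.length); // index 2 after 3 attempts

// A pattern that can match the empty string succeeds at the final position.
var atEnd = searchFrom("$", "ab");
console.log(atEnd.index); // 2

// Every position fails, so the overall search fails.
var missing = searchFrom("dog", "a cat");
console.log(missing.index, missing.attempts.length); // -1 after 6 attempts
```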
When a regex scans a target string, it tests the components of the expression one by one, from left to right, to see whether a match can be found. When it encounters quantifiers and alternation, it has to decide how to proceed. With a quantifier (such as *, +?, or {2,}), the regex must decide when to try matching additional characters; with alternation (via the | operator), it must choose one of the available options to try.
Each time the regex makes such a decision, it records the other options, if necessary, so that it can return to them later. If the chosen option matches, the regex continues through the pattern, and if the rest of the pattern also matches, the match is complete. But if the chosen option cannot find a match, or something later in the pattern fails, the regex backtracks to the last decision point where untried options remain and chooses one of them. This continues until a match is found or until all permutations of the quantifiers and alternation options have been tried without success, at which point it gives up, moves on to the next character in the string, and starts the process over.
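To make the decision points concrete, here is a toy backtracking matcher. It is a sketch under heavy assumptions, not a model of any real engine: it supports only literal characters, `.`, `*`, and `*?`; its `.` matches any character; and the names `parse`, `matchHere`, `run`, and the `backtracks` counter are all invented for illustration.

```javascript
// Turns a pattern string into tokens: { ch, star: null | "greedy" | "lazy" }.
function parse(pattern) {
  var tokens = [];
  for (var i = 0; i < pattern.length; i++) {
    var tok = { ch: pattern[i], star: null };
    if (pattern[i + 1] === "*") {
      tok.star = pattern[i + 2] === "?" ? "lazy" : "greedy";
      i += tok.star === "lazy" ? 2 : 1; // skip the quantifier characters
    }
    tokens.push(tok);
  }
  return tokens;
}

// Returns the end index of a match starting at i, or -1 if every path fails.
function matchHere(tokens, t, text, i, stats) {
  if (t === tokens.length) return i; // all tokens matched
  var tok = tokens[t];
  function fits(k) {
    return k < text.length && (tok.ch === "." || tok.ch === text[k]);
  }
  if (!tok.star) {
    return fits(i) ? matchHere(tokens, t + 1, text, i + 1, stats) : -1;
  }
  var max = i;
  while (fits(max)) max++; // farthest position the quantifier can reach
  // Decision point: list the possible end positions in order of preference.
  var ends = [];
  for (var e = i; e <= max; e++) ends.push(e);
  if (tok.star === "greedy") ends.reverse(); // greedy prefers the longest
  for (var n = 0; n < ends.length; n++) {
    var result = matchHere(tokens, t + 1, text, ends[n], stats);
    if (result !== -1) return result;
    stats.backtracks++; // this choice failed; fall back to the next option
  }
  return -1;
}

// Tries each starting position until one path succeeds.
function run(pattern, text) {
  var stats = { backtracks: 0 };
  var tokens = parse(pattern);
  for (var start = 0; start <= text.length; start++) {
    var end = matchHere(tokens, 0, text, start, stats);
    if (end !== -1) {
      return { match: text.slice(start, end), backtracks: stats.backtracks };
    }
  }
  return { match: null, backtracks: stats.backtracks };
}

var sampleText = "<p>a</p><p>b</p>";
var greedyRun = run("<p>.*</p>", sampleText);
var lazyRun = run("<p>.*?</p>", sampleText);
console.log(greedyRun.match, greedyRun.backtracks);
console.log(lazyRun.match, lazyRun.backtracks);
```

The greedy run takes the whole string and gives characters back until the final closing tag fits, while the lazy run starts from nothing and grows until the first closing tag fits.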
The following example, from the "Repetition and Backtracking" section of "High Performance JavaScript", is a good way to understand backtracking.
var str = "<p>Para 1.</p>" +
  "<img src='1.jpg'>" +
  "<p>para 2.</p>" +
  "<p>p.</p>";
/<p>.*<\/p>/i.test(str);  // method 1: greedy quantifier
/<p>.*?<\/p>/i.test(str); // method 2: lazy quantifier
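Both methods return true from test(), so the difference only shows in what each one matches. The sketch below uses match() on the same input to expose it; the variable names `html`, `greedyMatch`, and `lazyMatch` are illustrative.

```javascript
var html = "<p>Para 1.</p>" +
  "<img src='1.jpg'>" +
  "<p>para 2.</p>" +
  "<p>p.</p>";

// Greedy: .* runs to the end of the string, then backtracks to the last </p>.
var greedyMatch = html.match(/<p>.*<\/p>/i)[0];
// Lazy: .*? grows one character at a time and stops at the first </p>.
var lazyMatch = html.match(/<p>.*?<\/p>/i)[0];

console.log(greedyMatch === html); // true: the whole string is matched
console.log(lazyMatch);            // <p>Para 1.</p>
```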
See the picture below