A brief discussion of Sizzle's 'compilation principle”

Sizzle is a DOM selector engine written by jQuery author John Resig. Its speed is known as the first in the industry. As an independent new selector engine, it appeared after jQuery version 1.3 and was adopted as an open source project by John Resig. Sizzle is an independent part and does not depend on any library. If you don't want to use jQuery, you can just use Sizzle, or it can be used in other frameworks such as Mool, Dojo, YUI, etc.

A few days ago I was preparing a sharing PPT about jQuery, and I asked my colleagues if there was anything else they wanted to know about jQuery besides how to use it. Someone mentioned that they wanted to know how its selector is implemented, and some People mentioned how the query speed of jQuery compares with other frameworks. Regarding speed, you can download test examples from the official website of sizzle. The speed is indeed very advantageous. But why it has such efficient running speed has to do with the implementation principle I want to discuss here.

Before you understand Sizzle, you must first understand what its selector is. Here is a simple example. Students who are familiar with jQuery must also be familiar with this selector format:

Copy code The code is as follows:

tag #id .class , a:first

It basically filters in depth from left to right to find matching DOM elements. This statement is not too complicated. It is not difficult to implement this query statement ourselves. However, query statements only have basic rules and no fixed number and order of selectors. How can we adapt to this arbitrary arrangement and combination when we write our own code? Sizzle can achieve normal analysis and execution of various situations.

The source code of Sizzle is indeed complicated and difficult to understand its ideas. Let’s put aside the outer layers of packaging and look directly at the three methods that I personally think are the core of the entire implementation:

The first core method. There is a tokenize function in line 1052 of the source code:

Copy code The code is as follows:

function tokenize(selector, parseOnly) { }

The second parameter parseOnly is false, which means that only the token serialization operation is performed and no result is returned. In this case, the serialization result will be cached for later use. Selector is the query statement.

After being processed by this function, for example, if selector="#idtag.class, a:first" is passed in, you can get a result similar to the following format:

[
[
{matches:" id ",type:"ID"},
{matches:" tag ",type:"TAG"},
{matches:" class ",type:"CLASS"},
...
],
[
    {matches:" a",type:"TAG"},
    ...
],
[…],
…
]

Copy after login

Seeing the name of the tokenize function and its function, I can easily think of the word "compilation principle". It's a bit like lexical analysis here, but this lexical analysis is simpler than the lexical analysis done when the program is compiled.

The tokenize method will do "word segmentation" based on the regular expression of commas, spaces and relational selectors in the selector, and obtain a two-dimensional array (please allow me to use this inaccurate name), in which the first-dimensional array They are separated by commas and are called groups in the source code.

Let’s look at the source code again. There is a definition of Expr = Sizzle.selectors = {} starting from line 405. There is a definition of filter at line 567. Here we can find the basic filtering types: "ID", "TAG", "CLASS", "ATTR", "CHILD", "PSEUDO", these are the types finally classified by tokenize.

After the "word segmentation" is completed, still look at the Expr= Sizzle.selectors = {} defined in line 405. All the selectors we are familiar with can be found here, and each selector corresponds to a method definition. At this point, you should be thinking about whether Sizzle actually performs "word segmentation" on the selector, breaks it up, and then finds the corresponding methods from Expr to perform specific query or filtering operations?

The answer is basically yes. But Sizzle has a more specific and clever approach. Let’s look at the second method that I think is very core:

There is a matcherFromTokens function in line 1293 of the source code:

Copy code The code is as follows:

function matcherFromTokens(tokens) { }

The parameters passed in are obtained from the tokenize method. Matcher can be understood as "matching program". Literally speaking, the function of this function is to generate a matching program through tokens. In fact it is. Due to space limitations, this article will only share some of the implementation principles of Sizzle that I understand, and will not post the source code. When I have time later, I may compile a more detailed source code analysis article.

The matcherFromTokens method confirms the previous assumption. It acts as a connection and link between the selector "word segmentation" and the matching method defined in Expr. It can be said that various permutations and combinations of selectors are adaptable. The clever thing about Sizzle is that it does not directly match the obtained "word segmentation" results with the methods in Expr one by one and execute them one by one. Instead, it first combines a large matching method according to the rules and executes it in the last step. But how to execute it after the combination depends on the third key method:

There is a superMatcher method at line 1350 of the source code:

Copy code The code is as follows:

superMatcher = function( seed, context, xml, results, expandContext ) { }

This method is not a directly defined method, but is returned through the matcherFromGroupMatchers(elementMatchers, setMatchers) method on line 1345, but it plays an important role in the final execution.

The superMatcher method will determine a starting query range based on the parameters seed, expandContext and context. It may be query and filter directly from the seed, or it may be within the context or the parent node range of the context. If it does not start from seed, then it will first execute the code Expr.find["TAG"]( "*", expandContext && context.parentNode || context ) and wait for an elems collection (array). Then do a traversal of elems, and use the pre-generated matcher method to match the elements one by one. If the result is true, the elements are directly piled into the returned result set.

Okay, seeing that the original results of the matcher method here are all bool values, let’s go back to line 405 and take a look at the methods included in the filter in Expr. They all return bool values. Including more pseudo-class methods corresponding to PSEUDO (pseudo-class) are the same. It seems to be a bit subversive of my original idea. It turns out that it is not checking down layer by layer, but it is a bit like going back and upward to do matching and filtering. In Expr, only find and preFilter return sets.

Although I still have some doubts about why it uses one-by-one matching and filtering methods to obtain the result set, I think the most basic "compilation principle" of Sizzle should have been clearly explained.

But we can’t leave any doubts, let’s continue. In fact, this article itself seems to be written backwards. Students who are interested in looking at the source code will not see these three key methods at the beginning. In fact, Sizzle also does a series of other tasks before entering these three methods.

The real entrance to Sizzle can be said to be at line 220 of the source code:

Copy code The code is as follows:

function Sizzle( selector, context, results, seed ){ }

The first paragraph of this method is relatively easy to understand. If the selector can be matched to a single ID selector (#id), then the element can be found directly using the context.getElementById(m) method based on the id. If the selector can be matched to a single TAG selector, first use the context.getElementsByTagName(selector) method to find the relevant elements. If the current browser only uses native getElementsByClassName, and the selector matched is a single CLASS selector, the context.getElementsByClassName(m) method will also be used to find the relevant elements. These three methods are all native methods supported by browsers, and their execution efficiency is definitely the highest.

If the most basic methods are not available, you will enter the select method. Line 1480 of the source code has its definition:

Copy code The code is as follows:

function select( selector, context, results, seed, xml ) { }

In the select method, the "word segmentation" operation we mentioned earlier will first be performed on the selector. However, after this operation, we did not directly start assembling the matching method, but first performed some find operations. The find operation here can correspond to the find in Expr. It performs a query operation and returns a result set.

It can be understood that select uses the selector obtained by "word segmentation" to first find out the result set that can be searched using the find method according to its type. When doing the find operation, the result set is narrowed down from left to right in the order of the selectors. If after a traversal, all selectors in the selector can perform the find operation, the result will be returned directly. Otherwise, it will enter the "compilation" and filtering process described earlier.

At this point, you can also come along and basically understand the workflow of Sizzle. The questions left above are actually no longer questions at this point, because when performing reverse matching filtering, its search range is already the smallest set that has been filtered layer by layer. The reverse matching filtering method is actually an efficient choice for the selectors it corresponds to, such as pseudo-classes.

Let’s briefly summarize why Sizzle is so efficient.

First of all, In terms of processing flow, it always uses the most efficient native method for processing first. What has been introduced before is only Sizzle's own selector implementation method. When Sizzle is actually executed, it will first determine whether the current browser supports the querySelectorAll native method (source code line 1545). If supported, this method will be used first. The method natively supported by the browser is definitely more efficient than the method written by Sizzle's own js. Using it first can also ensure Sizzle's higher work efficiency. (You can check more information online about querySelectorAll). When the querySelectorAll method is not supported, Sizzle also gives priority to determining whether it can directly use getElementById, getElementsByTag, getElementsByClassName and other methods to solve the problem.

Secondly, in relatively complex situations, Sizzle always chooses to first use native methods to query and select as much as possible to narrow the selection range, and then use the "compilation principle" introduced earlier to narrow the selection range. The elements are matched one by one and filtered. The workflow into the "compilation" step is a bit complicated, and the efficiency will definitely be slightly lower than the previous methods, but Sizzle is trying to use these methods as little as possible, and at the same time, it is also trying to make the result sets processed by these methods as small and simple as possible. , in order to obtain higher efficiency.

Again , even after entering this "compilation" process, Sizzle also implemented a caching mechanism that we temporarily ignored and did not introduce in order to explain the process first. Line 1535 of the source code is what we call the "compilation" entry, which means it calls the third core method superMatcher. Trace in and see line 1466. The compile method caches the matching function generated based on the selector. Not only that, but look at the tokenize method on line 1052. It actually caches the word segmentation results based on the selector. In other words, after we execute the Sizzle (selector) method once, and call the Sizzle (selector) method directly next time, the most performance-consuming "compilation" process inside it will not consume too much performance, and the previous cache will be directly fetched. method will do. I'm thinking that one of the biggest benefits of the so-called "compilation" may be that it facilitates caching. The so-called "compile" here may also be understood as generating preprocessed functions and storing them for later use.

At this point, I hope that I can basically answer the questions of students who were previously concerned about the implementation principles and execution efficiency of selectors. In addition, the analysis conclusion of this article is derived from the source code of the latest version of Sizzle. The code line numbers mentioned in the article are subject to the source code of this version, which can be downloaded from http://sizzlejs.com/. Time is short, so please be merciful if there are any imperfections in the analysis. Students who still have questions are welcome to continue communicating offline.

The above is the entire content of this article, I hope you all like it.

A brief discussion of Sizzle's 'compilation principle”_Others