In web development, JavaScript is often used to implement page behavior. In an HTML page, JavaScript is usually embedded in <script> tags, but sometimes it appears outside them, in the attributes of other HTML elements, such as onclick, onload, etc.
If we want to find all the JavaScript code snippets in an HTML page, we can match them with regular expressions in PHP.
A regular expression is a syntax for describing string patterns. In PHP's preg_* functions, a pattern is wrapped in delimiters, most commonly the / symbol, as in /pattern/, where pattern represents the pattern to be matched.
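To make the delimiter convention concrete, here is a minimal sketch. The sample string and variable names are hypothetical. It writes the same pattern once with / delimiters, where a literal slash has to be escaped, and once with # delimiters, where it does not.

```php
<?php
// Hypothetical sample input, used only for illustration.
$text = 'before </script> after';

// With "/" as the delimiter, a literal slash inside the pattern must be escaped.
$withSlashes = '/<\/script>/i';

// Any non-alphanumeric, non-backslash, non-whitespace character can serve as
// the delimiter; "#" avoids the escaping.
$withHashes = '#</script>#i';

// preg_match returns 1 when the pattern matches and 0 when it does not.
$a = preg_match($withSlashes, $text);
$b = preg_match($withHashes, $text);

var_dump($a === 1, $b === 1);
```

Both forms are equivalent; the rest of this article keeps the / delimiter and escapes the slash.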
Commonly used regular expression metacharacters include:
. : Matches any single character (except a newline, unless the s modifier is used)
* : Matches zero or more instances of the previous character
+ : Matches one or more instances of the previous character
? : Matches zero or one instance of the previous character
| : Matches one of several alternatives
\d : Matches a digit
\w : Matches letters, digits, and underscores
\s : Matches whitespace characters such as spaces, tabs, newlines, etc.
First, we can use the preg_match_all function to match all <script> tags in the HTML page:
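$html = file_get_contents('example.html'); // Read the HTML file contents
// The "/" in the closing tag is escaped because "/" is also the pattern delimiter
$pattern = '/<script(.*?)>(.*?)<\/script>/is'; // Regular expression that matches script tags
preg_match_all($pattern, $html, $matches); // Run the match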
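In the above code, we use the file_get_contents function to read the contents of an HTML file, and then use the regular expression /<script(.*?)>(.*?)<\/script>/is to match every <script> tag in the HTML page. The i modifier makes the match case-insensitive, and the s modifier lets . match newlines so that multi-line scripts are captured. The results are stored in the $matches array: $matches[1] holds each tag's attribute text and $matches[2] holds the code between the tags.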
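As a quick illustration of what ends up in $matches, the following sketch runs the same kind of pattern against a small inline page instead of a file. The sample markup and the $bodies and $count variables are assumptions made for this example.

```php
<?php
// Hypothetical sample page, inlined so the example is self-contained.
$html = <<<'EOT'
<html>
<head><script type="text/javascript">var a = 1;</script></head>
<body>
<SCRIPT>
var b = 2;
</SCRIPT>
</body>
</html>
EOT;

// i: case-insensitive tag names; s: "." also matches newlines.
$pattern = '/<script(.*?)>(.*?)<\/script>/is';

// preg_match_all returns the number of full matches.
$count = preg_match_all($pattern, $html, $matches);

// $matches[0]: whole tags, $matches[1]: attribute text, $matches[2]: script bodies.
$bodies = array_map('trim', $matches[2]);

print_r($bodies);
```

Trimming is optional; it only removes the line breaks that surround a script body written on its own lines.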
However, this only gets the JavaScript code contained in <script> tags, not the code in the attributes of other elements.
First, we need to know the names of the attributes that contain JavaScript code. For example, the JavaScript code for a click event is in the onclick attribute, and the code for other events is in onload, onsubmit, onchange and other attributes. These event-handler attribute names all start with on.
Note that PHP's built-in get_meta_tags function does not help here: it only returns the name and content attributes of <meta> tags, so event-handler attributes on other elements never reach the pattern. Instead, we apply a regular expression for on* attributes directly to the HTML:
$html = file_get_contents('example.html'); // Read the HTML file contents
// Match an on* attribute and capture its value; \1 requires the closing quote to be the same character as the opening quote
$pattern = '/\son[a-z]+\s*=\s*(["\'])(.*?)\1/is';
preg_match_all($pattern, $html, $submatches); // Match the JavaScript code in attributes
$matches = $submatches[2]; // Group 2 holds the attribute values
In the above code, the regular expression /\son[a-z]+\s*=\s*(["'])(.*?)\1/is matches every attribute whose name starts with on and captures its quoted value. Because the backreference \1 requires the closing quote to be the same character as the opening one, a value such as alert('hi') inside double quotes is captured whole. Finally, the captured values are stored in the $matches array.
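The attribute matching can be sketched on a small inline sample as follows. This is only an illustration under stated assumptions: the markup, the $handlers variable, and the exact pattern (which uses a backreference so that the closing quote must equal the opening quote) are choices made for this example.

```php
<?php
// Hypothetical sample markup with both quoting styles.
$html = <<<'EOT'
<body onload="init()">
<a href="#" onclick="alert('hi'); return false;">Click</a>
<form onsubmit='return check("name")'></form>
</body>
EOT;

// (["\']) captures the opening quote and \1 requires the same closing quote,
// so a different quote character inside the value does not end the match.
// The leading \s keeps names that merely end in "on" (e.g. "action") from matching.
$pattern = '/\son[a-z]+\s*=\s*(["\'])(.*?)\1/is';

preg_match_all($pattern, $html, $matches);

// Group 2 holds the attribute values, i.e. the handler code.
$handlers = $matches[2];

print_r($handlers);
```

A regular expression cannot tell markup from text, so the same pattern would also match something like onfoo="x" appearing inside a script body or a comment; for untrusted or complex pages an HTML parser is the safer tool.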
With these two steps, we have found the JavaScript code in both <script> tags and event-handler attributes. Now, we need to combine these code snippets into a single string that can be easily processed.
$html = file_get_contents('example.html'); // Read the HTML file contents
$script_pattern = '/<script(.*?)>(.*?)<\/script>/is';
$attr_pattern = '/\son[a-z]+\s*=\s*(["\'])(.*?)\1/is';
preg_match_all($script_pattern, $html, $script_matches); // Match the code in script tags
preg_match_all($attr_pattern, $html, $attr_submatches); // Match the code in on* attributes
$attr_matches = $attr_submatches[2]; // Attribute values, i.e. the handler code
$all_script = implode("\n", array_merge($script_matches[2], $attr_matches)); // Merge all code into one string, one fragment per line
In the above code, we use the implode function to merge all the JavaScript code snippets in $script_matches[2] and $attr_matches into a single string, using newline characters to separate the code fragments for further processing.
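For reuse, the two matching steps and the merge can be wrapped in one function. The sketch below is one possible arrangement, not a definitive implementation: extract_js, its parameter, and the sample input are hypothetical names introduced here, and dropping empty fragments is an added design choice.

```php
<?php
/**
 * Hypothetical helper (not from the original article): collects script-tag
 * bodies and event-handler attribute values, one fragment per line.
 */
function extract_js(string $html): string
{
    $scriptPattern = '/<script(.*?)>(.*?)<\/script>/is';
    $attrPattern   = '/\son[a-z]+\s*=\s*(["\'])(.*?)\1/is';

    preg_match_all($scriptPattern, $html, $scriptMatches);
    preg_match_all($attrPattern, $html, $attrMatches);

    // Trim each fragment and drop empty ones, such as the body of
    // <script src="..."></script>.
    $fragments = array_filter(
        array_map('trim', array_merge($scriptMatches[2], $attrMatches[2])),
        static function (string $fragment): bool {
            return $fragment !== '';
        }
    );

    return implode("\n", $fragments);
}

// Hypothetical sample input.
$sample = '<script>var a = 1;</script><script src="x.js"></script>'
        . '<button onclick="go()">Go</button>';

echo extract_js($sample), "\n";
```

Script bodies come first and attribute values second, which matches the order of the array_merge call rather than the order in which the fragments appear in the page.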