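Generating an AST (Abstract Syntax Tree) with PHP-Parser
0. Preface
The workflow of my current project is gradually becoming clear, but I have not yet mastered many of the key techniques, so I can only feel my way forward one step at a time.
The project does static code analysis based on data-flow analysis, so front-end work such as lexical analysis and parsing is unavoidable. I ruled out Yacc and Lex. After a day of research I found two suitable candidates: ANTLR, which is Java-based, and PHP-Parser, which is dedicated to generating ASTs for PHP.
ANTLR is a well-known tool in compiler construction and is more practical than Yacc and Lex. However, I could find only one grammar file for PHP, and after a long struggle to get it generated and running, it turned out to be a poor fit: for "$a=1" it produced the tokens [$, a, =, 1] and could not recognize the assignment. The grammar is too coarse, which was very disappointing.
By comparison, PHP-Parser is more specialized, since it focuses solely on lexing and parsing PHP.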
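1. Introduction
The PHP-Parser project page is https://github.com/nikic/PHP-Parser. It can parse several versions of PHP and produces an abstract syntax tree.
For lexical analysis, PHP has a built-in function, token_get_all(), which returns the tokens that serve as input to the parser. PHP-Parser itself uses token_get_all() to produce its token stream.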
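To make the token stream concrete, here is a minimal sketch that uses only PHP's built-in tokenizer (no PHP-Parser required). The input string is an illustrative assumption chosen to mirror the "$a=1" case from the preface.

```php
<?php
// Tokenize a one-line assignment with PHP's built-in tokenizer.
$tokens = token_get_all('<?php $a = 1;');

foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array tokens have the form [token id, source text, line number].
        printf("%-14s %s\n", token_name($token[0]), var_export($token[1], true));
    } else {
        // Single-character tokens such as "=" and ";" are plain strings.
        printf("%-14s %s\n", '(char)', var_export($token, true));
    }
}
```

Note that the variable is reported as a single T_VARIABLE token with the text '$a', rather than being split into "$" and "a".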
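2. Installation
Installation is simple. I added the library with Composer, PHP's package manager, by running the following command in the project directory:
php composer.phar require nikic/php-parser
If Composer has not been downloaded yet, run this command first:
curl -sS https://getcomposer.org/installer | php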
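3. Generating the AST
Once php-parser has been added with Composer, it is ready to use.
First, an overview of the node types that PHP-Parser defines:
(1) PhpParser\Node\Stmt nodes are statement nodes: language constructs that do not return a value and cannot appear inside an expression, such as an echo statement or a class definition. Note that an assignment such as "$a = $b" is an expression (PhpParser\Node\Expr\Assign), not a statement.
(2) PhpParser\Node\Expr nodes are expression nodes: language constructs that return a value, such as $var and func().
(3) PhpParser\Node\Scalar nodes represent scalar values, such as 'string', 0, and magic constants like __FILE__.
(4) Some nodes fall outside these three groups, such as argument nodes (PhpParser\Node\Arg).
Some node class names end with an underscore (for example Echo_) to avoid clashing with PHP keywords.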
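The node groups above can be checked with instanceof. This is a minimal sketch, assuming nikic/php-parser 4.18 or later (where ParserFactory::createForNewestSupportedVersion() is available) and a Composer autoloader at vendor/autoload.php. The input snippet is an illustrative assumption.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts  = $parser->parse('<?php echo strlen("abc");');

$echo = $stmts[0];        // the echo statement
$call = $echo->exprs[0];  // the strlen(...) call
$arg  = $call->args[0];   // the argument wrapper
$str  = $arg->value;      // the "abc" literal

var_dump($echo instanceof Node\Stmt);    // statement node
var_dump($call instanceof Node\Expr);    // expression node
var_dump($arg  instanceof Node\Arg);     // neither Stmt nor Expr
var_dump($str  instanceof Node\Scalar);  // scalar node
```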
The hello-world program for PHP-Parser parses a short snippet of code and prints the resulting AST:
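A minimal sketch of such a program follows. The input string is an assumption reconstructed from the two string values in the printed tree. The sketch also assumes nikic/php-parser 4.18 or later and a Composer autoloader at vendor/autoload.php. The printed tree in this article comes from an older release, so on current versions the output differs in detail (for example, the string class is named Scalar\String_).

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Error;
use PhpParser\ParserFactory;

// Assumed input: an echo statement with two string arguments.
$code = "<?php echo '1+2', 'chongrui';";

$parser = (new ParserFactory())->createForNewestSupportedVersion();

try {
    // parse() returns an array of top-level statement nodes.
    $stmts = $parser->parse($code);
    print_r($stmts);
} catch (Error $e) {
    // Syntax errors in the input are reported as PhpParser\Error.
    echo 'Parse error: ', $e->getMessage(), "\n";
}
```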
The output is:
Array
(
    [0] => PhpParser\Node\Stmt\Echo_ Object
        (
            [subNodes:protected] => Array
                (
                    [exprs] => Array
                        (
                            [0] => PhpParser\Node\Scalar\String Object
                                (
                                    [subNodes:protected] => Array
                                        (
                                            [value] => 1+2
                                        )
                                    [attributes:protected] => Array
                                        (
                                            [startLine] => 1
                                            [endLine] => 1
                                        )
                                )
                            [1] => PhpParser\Node\Scalar\String Object
                                (
                                    [subNodes:protected] => Array
                                        (
                                            [value] => chongrui
                                        )
                                    [attributes:protected] => Array
                                        (
                                            [startLine] => 1
                                            [endLine] => 1
                                        )
                                )
                        )
                )
            [attributes:protected] => Array
                (
                    [startLine] => 1
                    [endLine] => 1
                )
        )
)
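As the output shows, this AST has a single top-level node, Echo_. That node has one subnode, exprs, which can be accessed as $stmts[0]->exprs.
A node's attributes store startLine, endLine, and comments. They can be accessed with getAttributes(), getAttribute('startLine'), setAttribute(), and hasAttribute().
The starting line number, startLine, can be accessed with getLine()/setLine() (or with getAttribute('startLine')). The doc comment can be retrieved with getDocComment().
To read a value stored on a node, for example "chongrui", use $stmts[0]->exprs[1]->value.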
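The accessors described above can be exercised as follows. This is a sketch under the same assumptions as before (nikic/php-parser 4.18 or later, autoloader at vendor/autoload.php). The doc comment in the input is an illustrative addition so that getDocComment() has something to return.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpParser\ParserFactory;

// Assumed input: the echo statement sits on line 3, below a doc comment.
$code = "<?php\n/** Greets. */\necho '1+2', 'chongrui';";

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts  = $parser->parse($code);
$echo   = $stmts[0];

// Subnodes are exposed as properties.
echo $echo->exprs[1]->value, "\n";

// Position information lives in the node's attributes.
echo $echo->getLine(), "\n";
echo $echo->getAttribute('startLine'), "\n";
var_dump($echo->hasAttribute('endLine'));

// getDocComment() returns a comment object, or null if there is none.
$doc = $echo->getDocComment();
echo $doc === null ? '(no doc comment)' : $doc->getText(), "\n";
```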
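4. Traversing nodes
Traversing the abstract syntax tree is straightforward with the PhpParser\NodeTraverser class, which also supports custom visitor objects. In practice, when analyzing PHP source code, the exact structure of the AST is usually not known in advance, so the type of each node has to be checked dynamically.
These checks are collected in a class, MyNodeVisitor, which extends the parent class NodeVisitorAbstract. The parent class provides the following methods:
(1) beforeTraverse() is called once before traversal begins. It is typically used to reset values before a traversal.
(2) afterTraverse() is the same as (1), except that it is called once after traversal has finished.
(3) enterNode() and leaveNode() are called for every node that is visited.
enterNode() is called when a node is entered, that is, before its children are visited. It can return NodeTraverser::DONT_TRAVERSE_CHILDREN to skip the node's children.
leaveNode() is called after the node and all of its children have been traversed. It can return NodeTraverser::REMOVE_NODE, in which case the current node is removed. If it returns an array of nodes, those nodes are merged into the parent array: for example, in array(A, B, C), replacing B with array(X, Y, Z) yields array(A, X, Y, Z, C).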
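The two special return values can be illustrated with a small visitor. This is a sketch, not part of the original article: the class name StripEchoVisitor and the input are hypothetical, and it assumes nikic/php-parser 4.18 or later with an autoloader at vendor/autoload.php.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

// Hypothetical visitor: removes echo statements, but leaves function bodies untouched.
class StripEchoVisitor extends NodeVisitorAbstract
{
    public function enterNode(Node $node)
    {
        if ($node instanceof Node\Stmt\Function_) {
            // Do not descend into the function, so its echo is never visited.
            return NodeTraverser::DONT_TRAVERSE_CHILDREN;
        }
        return null;
    }

    public function leaveNode(Node $node)
    {
        if ($node instanceof Node\Stmt\Echo_) {
            // Drop this statement from its parent array.
            return NodeTraverser::REMOVE_NODE;
        }
        return null;
    }
}

$code = <<<'PHP'
<?php
echo 'top';
function f() { echo 'inner'; }
$x = 1;
PHP;

$parser    = (new ParserFactory())->createForNewestSupportedVersion();
$traverser = new NodeTraverser();
$traverser->addVisitor(new StripEchoVisitor());

// traverse() returns the (possibly modified) array of statements.
$stmts = $traverser->traverse($parser->parse($code));

foreach ($stmts as $stmt) {
    echo get_class($stmt), "\n";
}
```

Only the top-level echo is removed. The echo inside f() survives because the traverser never enters the function body.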
The example parses $code into an AST and, while traversing it, prints the value of every node of type String that it encounters.
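A minimal sketch of such a traversal is given here. The input in $code is an assumption chosen to contain the two string literals '1' and '2'. The $values property is an illustrative addition for collecting results. The sketch assumes nikic/php-parser 4.18 or later, where the string node class is named Scalar\String_.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpParser\Node;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitorAbstract;
use PhpParser\ParserFactory;

class MyNodeVisitor extends NodeVisitorAbstract
{
    /** @var string[] Values of the string nodes seen so far. */
    public $values = [];

    public function beforeTraverse(array $nodes)
    {
        // Reset state so the visitor can be reused across traversals.
        $this->values = [];
        return null;
    }

    public function leaveNode(Node $node)
    {
        // Check the node type dynamically and print string literals.
        if ($node instanceof Node\Scalar\String_) {
            $this->values[] = $node->value;
            echo $node->value, "\n";
        }
        return null;
    }
}

// Assumed input: two string literals.
$code = "<?php echo '1', '2';";

$parser    = (new ParserFactory())->createForNewestSupportedVersion();
$visitor   = new MyNodeVisitor();
$traverser = new NodeTraverser();
$traverser->addVisitor($visitor);
$traverser->traverse($parser->parse($code));
```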
This prints 1 and 2.
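5. Other AST representations
Sometimes the AST needs to be persisted in a textual form. PHP-Parser supports this as well.
(1) Simple serialization
The AST can be persisted with serialize() and restored with unserialize().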
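A round trip through serialize() and unserialize() might look like the sketch below. The input snippet is an illustrative assumption, and the same version and autoloader assumptions apply (nikic/php-parser 4.18 or later, vendor/autoload.php).

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpParser\ParserFactory;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts  = $parser->parse('<?php echo 1 + 2;');

// Turn the node array into a string; this could be written to a file or cache.
$blob = serialize($stmts);

// Rebuild the node objects from the string.
$restored = unserialize($blob);

// Loose comparison checks that classes and property values match recursively.
var_dump($restored == $stmts);
```

Because unserialize() instantiates objects, it should only be applied to data from a trusted source.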
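(2) Human-readable forms
These are a human-readable dump and XML serialization. They are not covered in detail here. See the project documentation when needed:
https://github.com/nikic/PHP-Parser/blob/master/doc/3_Other_node_tree_representations.markdown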
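For the human-readable dump, a minimal sketch using the PhpParser\NodeDumper class is shown here, under the same assumptions as the other examples (nikic/php-parser 4.18 or later, autoloader at vendor/autoload.php, illustrative input). XML serialization belongs to the older releases this article describes. As far as I know it was dropped in later major versions, so it is not shown.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpParser\NodeDumper;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->createForNewestSupportedVersion();
$stmts  = $parser->parse("<?php echo 'chongrui';");

// dump() renders the tree as indented text, one node type per level.
$dumper = new NodeDumper();
$text   = $dumper->dump($stmts);

echo $text, "\n";
```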
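6. Summary
At least for static analysis of PHP, PHP-Parser is functionally far ahead of ANTLR. PHP-Parser will certainly play a significant role in building an automated PHP code-auditing system :)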
