How should the Python AST (abstract syntax tree) be used?


WBOY

May 09, 2023 12:49 PM


Introduction

AST stands for Abstract Syntax Tree. The AST is an intermediate product on the way from Python source code to bytecode. With the ast module, you can analyze the structure of source code from the perspective of its syntax tree.

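Beyond inspecting the syntax tree, we can modify and execute it, and we can also unparse a tree generated from source back into Python source code. This makes ast useful for source code checking, syntax analysis, code modification, and debugging. The original code examples in this article are written for Python 2: they use the print statement, the ast.Print and ast.Str nodes, and the third-party astunparse package.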

1. Introduction to AST

The official CPython interpreter processes Python source code as follows:

Parse source code into a parse tree (Parser/pgen.c)

Transform parse tree into an Abstract Syntax Tree (Python/ast.c)

Transform AST into a Control Flow Graph (Python/compile.c)

Emit bytecode based on the Control Flow Graph (Python/compile.c)

The actual python code processing process is as follows:

Source code --> Parse tree (syntax tree) --> Abstract syntax tree (AST) --> Control flow graph --> Bytecode

This pipeline has been used since Python 2.5. Source code is first parsed into a parse tree, which is then converted into an abstract syntax tree. The AST shows the syntactic structure of the Python code in a source file. Since Python 3.9, CPython's new PEG parser (PEP 617) builds the AST directly, without a separate parse tree stage.
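In Python 3 you can step through the same stages by hand. The following is a minimal sketch that assumes Python 3.8 or later; the source string `x = 1 + 2` is only an illustration. It parses source into an AST, compiles the AST into a code object, and disassembles the resulting bytecode with the standard `dis` module:

```python
import ast
import dis

source = "x = 1 + 2"

tree = ast.parse(source)                   # source code -> AST
code = compile(tree, "<example>", "exec")  # AST -> code object (CFG and bytecode are built internally)
dis.dis(code)                              # print the emitted bytecode

namespace = {}
exec(code, namespace)                      # run the bytecode
print(namespace["x"])
```

compile() accepts either a source string or an AST object, so the parse step and the compile step can be separated like this.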

Most everyday programming does not require working with abstract syntax trees. For certain tasks and requirements, however, the AST is especially convenient.

Here is a simple example: the AST of the Python 2 statement print 1 + 2.

Module(body=[
    Print(
          dest=None,
          values=[BinOp( left=Num(n=1),op=Add(),right=Num(n=2))],
          nl=True,
 )])
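The dump above comes from Python 2, where print is a statement. In Python 3, print is an ordinary function. The same line therefore parses into an expression statement (ast.Expr) that wraps a Call node, and number literals appear as Constant nodes (Python 3.8+). Here is a small sketch for checking this yourself. The exact text printed by ast.dump differs slightly between Python 3 versions, so the sketch inspects the nodes directly:

```python
import ast

tree = ast.parse("print(1 + 2)")
print(ast.dump(tree))  # exact formatting varies between Python 3 versions

stmt = tree.body[0]    # ast.Expr: an expression used as a statement
call = stmt.value      # ast.Call for print(...)
binop = call.args[0]   # ast.BinOp for 1 + 2
print(type(binop.op).__name__, binop.left.value, binop.right.value)
```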


2. Create AST

2.1 Compile function

First, let’s briefly understand the compile function.

compile(source, filename, mode[, flags[, dont_inherit]])

source -- A string or an AST object. Typically you pass the entire contents of a .py file, e.g. the result of file.read().
filename -- The name of the file the code was read from, or some recognizable value (such as '<string>') if the code was not read from a file.
mode -- The kind of code to compile: 'exec' (a sequence of statements), 'eval' (a single expression), or 'single' (a single interactive statement).
flags, dont_inherit -- Control which compiler flags and future statements apply when compiling the source.

func_def = \
"""
def add(x, y):
    return x + y
print add(3, 5)
"""


Compile it with compile() and execute the result:

>>> cm = compile(func_def, '<string>', 'exec')
>>> exec cm
8


compile() turns func_def into bytecode. cm is a code object, so isinstance(cm, types.CodeType) is True.

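Passing the ast.PyCF_ONLY_AST flag makes compile() return an AST instead of a code object. compile(source, filename, mode, ast.PyCF_ONLY_AST) is therefore equivalent to ast.parse(source, filename, mode); ast.parse defaults to filename='<unknown>' and mode='exec'.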
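Here is a Python 3 sketch of both behaviours of compile(). In Python 3, exec is a function rather than a statement. As an illustrative change to func_def, this sketch stores the result in a variable called result instead of printing it:

```python
import ast
import types

func_def = """
def add(x, y):
    return x + y
result = add(3, 5)
"""

# Default flags: compile() returns a code object containing bytecode.
cm = compile(func_def, "<string>", "exec")
print(isinstance(cm, types.CodeType))  # True

namespace = {}
exec(cm, namespace)          # Python 3: exec is a function
print(namespace["result"])   # 8

# With ast.PyCF_ONLY_AST, compile() stops after parsing and returns an AST...
tree_from_compile = compile(func_def, "<string>", "exec", ast.PyCF_ONLY_AST)
# ...which is the same tree that ast.parse() builds.
tree_from_parse = ast.parse(func_def, "<string>", "exec")
print(ast.dump(tree_from_compile) == ast.dump(tree_from_parse))  # True
```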

2.2 Generate ast

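Use func_def from above to generate an AST:

import ast
import astunparse    # third-party package: pip install astunparse

r_node = ast.parse(func_def)
print astunparse.dump(r_node)    # or, with the standard library only: print ast.dump(r_node)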


The following is the ast structure corresponding to func_def:

Module(body=[
    FunctionDef(
        name='add',
        args=arguments(
            args=[Name(id='x',ctx=Param()),Name(id='y',ctx=Param())],
            vararg=None,
            kwarg=None,
            defaults=[]),
        body=[Return(value=BinOp(
            left=Name(id='x',ctx=Load()),
            op=Add(),
            right=Name(id='y',ctx=Load())))],
        decorator_list=[]),
    Print(
        dest=None,
        values=[Call(
                func=Name(id='add',ctx=Load()),
                args=[Num(n=3),Num(n=5)],
                keywords=[],
                starargs=None,
                kwargs=None)],
        nl=True)
  ])


Besides ast.dump, several third-party libraries can dump an AST, such as astunparse, codegen, and unparse. They display the AST structure more readably and can also convert an AST back into Python source code. Since Python 3.9, the standard library covers both needs: ast.dump accepts an indent argument, and ast.unparse converts an AST back into source code.
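On Python 3.9 or later, the following sketch shows the standard-library route without astunparse. It prints an indented dump and regenerates source from the tree. Note that ast.unparse normalizes formatting (comments and original spacing are lost), so it round-trips the structure rather than the exact text:

```python
import ast

source = "def add(x, y):\n    return x + y\n"

tree = ast.parse(source)
print(ast.dump(tree, indent=4))  # multi-line, indented dump (Python 3.9+)

regenerated = ast.unparse(tree)  # AST -> source code (Python 3.9+)
print(regenerated)
```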

module Python version "$Revision$"
{
  mod = Module(stmt* body)
      | Expression(expr body)

  stmt = FunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list)
       | ClassDef(identifier name, expr* bases, stmt* body, expr* decorator_list)
       | Return(expr? value)
       | Print(expr? dest, expr* values, bool nl)
       | For(expr target, expr iter, stmt* body, stmt* orelse)

  expr = BoolOp(boolop op, expr* values)
       | BinOp(expr left, operator op, expr right)
       | Lambda(arguments args, expr body)
       | Dict(expr* keys, expr* values)
       | Num(object n) -- a number as a PyObject.
       | Str(string s) -- need to specify raw, unicode, etc?
       | Name(identifier id, expr_context ctx)
       | List(expr* elts, expr_context ctx)
       -- col_offset is the byte offset in the utf8 string the parser uses
       attributes (int lineno, int col_offset)

  expr_context = Load | Store | Del | AugLoad | AugStore | Param
  boolop = And | Or
  operator = Add | Sub | Mult | Div | Mod | Pow | LShift | RShift | BitOr | BitXor | BitAnd | FloorDiv
  arguments = (expr* args, identifier? vararg, identifier? kwarg, expr* defaults)
}


The above is an excerpt of the Python 2 abstract grammar (written in ASDL) from the official documentation. When traversing an AST, you access each node's fields according to its node type. A BinOp node, for example, has the fields left, op, and right.
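The grammar's field names are available at runtime through each node class's _fields attribute, and ast.walk visits every node in a tree in breadth-first order. Below is a Python 3.8+ sketch; node_types is a hypothetical helper written only for this illustration:

```python
import ast

def node_types(source, mode="exec"):
    """Return the class names of all nodes in breadth-first order (ast.walk)."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source, mode=mode))]

print(ast.BinOp._fields)             # fields defined for BinOp in the grammar
print(node_types("1 + 2", "eval"))

# Access fields according to the node type
expr = ast.parse("1 + 2", mode="eval").body
if isinstance(expr, ast.BinOp):
    print(type(expr.op).__name__, expr.left.value, expr.right.value)
```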

3. Traverse the AST

The ast module provides two classes for traversing an entire abstract syntax tree: ast.NodeVisitor and ast.NodeTransformer.

3.1 ast.NodeVisitor

The following visitor changes the addition in the add function of func_def into a subtraction and inserts a call-log statement at the top of the function body.

  import ast
  import astunparse  # third-party: pip install astunparse

  class CodeVisitor(ast.NodeVisitor):
      def visit_BinOp(self, node):
          # Modify the existing node in place: turn "+" into "-"
          if isinstance(node.op, ast.Add):
              node.op = ast.Sub()
          self.generic_visit(node)
      def visit_FunctionDef(self, node):
          print 'Function Name:%s' % node.name
          self.generic_visit(node)
          # Build a Python 2 print statement node: print 'calling func: <name>'
          func_log_stmt = ast.Print(
              dest = None,
              values = [ast.Str(s = 'calling func: %s' % node.name, lineno = 0, col_offset = 0)],
              nl = True,
              lineno = 0,
              col_offset = 0,
          )
          # Make the log statement the first statement of the function body
          node.body.insert(0, func_log_stmt)
  r_node = ast.parse(func_def)
  visitor = CodeVisitor()
  visitor.visit(r_node)
  # print astunparse.dump(r_node)
  print astunparse.unparse(r_node)
  exec compile(r_node, '<string>', 'exec')


Running results:

Function Name:add
def add(x, y):
    print 'calling func: add'
    return (x - y)
print add(3, 5)
calling func: add
-2
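The following is a Python 3 port of CodeVisitor. It is a sketch assuming Python 3.9+ for ast.unparse. Python 3 has no Print node, so the log statement is built as an expression statement that calls the print function. Instead of passing lineno=0 by hand as the original does, this version calls ast.fix_missing_locations. The module-level line is changed to result = add(3, 5) so the value can be inspected:

```python
import ast

FUNC_DEF = """
def add(x, y):
    return x + y
result = add(3, 5)
"""

class CodeVisitor(ast.NodeVisitor):
    """Mutates existing nodes in place while visiting them."""

    def visit_BinOp(self, node):
        # Replace "+" with "-" directly on the existing node.
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        self.generic_visit(node)

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        # Build the statement print('calling func: <name>') as AST nodes.
        log_stmt = ast.Expr(
            value=ast.Call(
                func=ast.Name(id="print", ctx=ast.Load()),
                args=[ast.Constant(value="calling func: %s" % node.name)],
                keywords=[],
            )
        )
        node.body.insert(0, log_stmt)

tree = ast.parse(FUNC_DEF)
CodeVisitor().visit(tree)
# New nodes have no lineno/col_offset yet; fill them in before compiling.
ast.fix_missing_locations(tree)

print(ast.unparse(tree))
namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)  # prints the call log
print(namespace["result"])
```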


3.2 ast.NodeTransformer

NodeVisitor changes the AST by modifying existing nodes in place. NodeTransformer, a subclass of NodeVisitor, can also replace nodes: the value returned by each visit_ method replaces the original node in the tree, and returning None removes it.

Since add in func_def has already been turned into a subtraction, let's go further. We will rename the function, its parameters, and the call site, and make the inserted call log more elaborate, changing the code almost beyond recognition :-)

  import ast
  import astunparse  # third-party: pip install astunparse

  class CodeTransformer(ast.NodeTransformer):
      def visit_BinOp(self, node):
          if isinstance(node.op, ast.Add):
              node.op = ast.Sub()
          self.generic_visit(node)
          # The returned node replaces the original one in the tree
          return node
      def visit_FunctionDef(self, node):
          # Visit children first, so the parameters are already renamed to a, b
          self.generic_visit(node)
          if node.name == 'add':
              node.name = 'sub'
          args_num = len(node.args.args)
          args = tuple([arg.id for arg in node.args.args])
          # Builds e.g.: print 'calling func: sub', 'args:', a, b
          func_log_stmt = ''.join(["print 'calling func: %s', " % node.name, "'args:'", ", %s" * args_num % args])
          # ast.parse returns a Module; insert its single statement, not the Module itself
          node.body.insert(0, ast.parse(func_log_stmt).body[0])
          return node
      def visit_Name(self, node):
          replace = {'add': 'sub', 'x': 'a', 'y': 'b'}
          re_id = replace.get(node.id, None)
          node.id = re_id or node.id
          self.generic_visit(node)
          return node
  r_node = ast.parse(func_def)
  transformer = CodeTransformer()
  r_node = transformer.visit(r_node)
  # print astunparse.dump(r_node)
  source = astunparse.unparse(r_node)
  print source
  # exec compile(r_node, '<string>', 'exec')   # compiling the tree directly needs lineno/col_offset on every inserted node (see ast.fix_missing_locations)
  exec compile(source, '<string>', 'exec')
  exec compile(ast.parse(source), '<string>', 'exec')


Result:

def sub(a, b):
    print 'calling func: sub', 'args:', a, b
    return (a - b)
print sub(3, 5)
calling func: sub args: 3 5
-2
calling func: sub args: 3 5
-2


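The code shows the main difference between the two. In CodeTransformer, every visit_ method returns a node, and that return value is what ends up in the tree. CodeVisitor's methods return nothing and only mutate nodes in place.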
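The following is a Python 3.9+ sketch of CodeTransformer that leans on node replacement. Instead of editing nodes in place, visit_BinOp and visit_Name return brand-new nodes, and NodeTransformer puts them into the tree. Python 3 stores parameters as ast.arg nodes rather than Name nodes, so renaming them needs a separate visit_arg method. The RENAMES table and the result variable are illustrative choices, not part of the original:

```python
import ast

FUNC_DEF = """
def add(x, y):
    return x + y
result = add(3, 5)
"""

RENAMES = {"add": "sub", "x": "a", "y": "b"}  # hypothetical rename table

class CodeTransformer(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            # Return a new node; NodeTransformer substitutes it for the old one.
            new = ast.BinOp(left=node.left, op=ast.Sub(), right=node.right)
            return ast.copy_location(new, node)
        return node

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # parameters are renamed by visit_arg first
        node.name = RENAMES.get(node.name, node.name)
        arg_names = ", ".join(arg.arg for arg in node.args.args)
        log_src = "print('calling func: %s', 'args:', %s)" % (node.name, arg_names)
        # ast.parse returns a Module; insert only its single statement.
        node.body.insert(0, ast.parse(log_src).body[0])
        return node

    def visit_arg(self, node):
        self.generic_visit(node)
        node.arg = RENAMES.get(node.arg, node.arg)
        return node

    def visit_Name(self, node):
        new = ast.Name(id=RENAMES.get(node.id, node.id), ctx=node.ctx)
        return ast.copy_location(new, node)

tree = CodeTransformer().visit(ast.parse(FUNC_DEF))
ast.fix_missing_locations(tree)
source = ast.unparse(tree)
print(source)

namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)  # prints the call log
print(namespace["result"])
```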

4. AST applications

The ast module is rarely used in everyday programming, but it is very useful as an auxiliary tool for checking source code. Typical uses include syntax checks, debugging, and detecting particular fields.

Adding call logs to functions, as above, is one way to debug Python source code. In practice, we parse entire Python files and then traverse and modify their ASTs.
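Syntax checking is the simplest case. ast.parse raises SyntaxError for invalid source, and the exception carries the file name and line number. The following is a minimal Python 3 sketch. check_syntax is a hypothetical helper, and in a real project source would be the text read from each .py file:

```python
import ast

def check_syntax(source, filename="<unknown>"):
    """Return None if source parses, else (filename, lineno, message)."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return (exc.filename, exc.lineno, exc.msg)
    return None

print(check_syntax("x = 1\n", "good.py"))
print(check_syntax("x = 1\ndef f(:\n    pass\n", "bad.py"))
```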

4.1 Chinese character detection

The following is the Unicode range of the CJK Unified Ideographs block, which covers Chinese, Japanese, and Korean characters:

CJK Unified Ideographs

Range: U+4E00 to U+9FFF

Number of code points: 20992
Languages: Chinese, Japanese, Korean, Vietnamese
As a Unicode escape range:

\u4e00 - \u9fff

This range is used to identify Chinese characters. Note that it does not include Chinese punctuation (e.g. u'；' == u'\uff1b'). The following class, CNCheckHelper, determines whether a string contains Chinese characters:

  class CNCheckHelper(object):
      # Candidate encodings for the text being checked, tried in order
      VALID_ENCODING = ('utf-8', 'gbk')
      def _get_unicode_imp(self, value, idx = 0):
          # Try each encoding in turn; returns None if none of them works
          if idx < len(self.VALID_ENCODING):
              try:
                  return value.decode(self.VALID_ENCODING[idx])
              except UnicodeDecodeError:
                  return self._get_unicode_imp(value, idx + 1)
      def _get_unicode(self, from_str):
          # Already unicode: nothing to decode
          if isinstance(from_str, unicode):
              return from_str
          return self._get_unicode_imp(from_str)
      def is_any_chinese(self, check_str, is_strict = True):
          unicode_str = self._get_unicode(check_str)
          if unicode_str:
              # strict: any Chinese character; non-strict: every character must be Chinese
              c_func = any if is_strict else all
              return c_func(u'\u4e00' <= char <= u'\u9fff' for char in unicode_str)
          return False


is_any_chinese has two modes. In strict mode (is_strict=True, the default) it returns True as soon as the string contains any Chinese character. In non-strict mode it returns True only if every character is Chinese.
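In Python 3, str is already Unicode, so decoding is only needed for bytes input. The following sketch condenses CNCheckHelper into two plain functions, to_text and contains_chinese. These names are illustrative, and the utf-8 then gbk fallback order is kept from the original:

```python
VALID_ENCODINGS = ("utf-8", "gbk")  # tried in order, as in CNCheckHelper

def to_text(value):
    """Return value as str, decoding bytes with the first encoding that works."""
    if isinstance(value, str):
        return value
    for encoding in VALID_ENCODINGS:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None

def contains_chinese(value, is_strict=True):
    """Strict: any char in U+4E00..U+9FFF. Non-strict: all chars must be."""
    text = to_text(value)
    if not text:
        return False
    check = any if is_strict else all
    return check("\u4e00" <= ch <= "\u9fff" for ch in text)

print(contains_chinese("hello 世界"))                 # True
print(contains_chinese("世界!", is_strict=False))     # False: '!' is not Chinese
```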

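Next, we use ast to traverse the abstract syntax tree of each source file and check whether any of its string literals contain Chinese characters.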

  import ast
  import os

  class CodeCheck(ast.NodeVisitor):
      def __init__(self):
          self.cn_checker = CNCheckHelper()
      def visit_Str(self, node):
          self.generic_visit(node)
          # if node.s and any(u'\u4e00' <= char <= u'\u9fff' for char in node.s.decode('utf-8')):
          if self.cn_checker.is_any_chinese(node.s, True):
              print 'line no: %d, column offset: %d, CN_Str: %s' % (node.lineno, node.col_offset, node.s)
  project_dir = './your_project/script'
  for root, dirs, files in os.walk(project_dir):
      print root, dirs, files
      py_files = filter(lambda file: file.endswith('.py'), files)
      checker = CodeCheck()
      for file in py_files:
          file_path = os.path.join(root, file)
          print 'Checking: %s' % file_path
          with open(file_path, 'r') as f:
              root_node = ast.parse(f.read())
              checker.visit(root_node)


The example above is quite simple, but it conveys the general idea.
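In Python 3.8+, string literals are ast.Constant nodes whose value is a str (ast.Str was removed in 3.12), so the visitor hooks visit_Constant instead. Here is a sketch; ChineseStringCheck and check_source are illustrative names, and the check takes a source string so that it can be used with any file-walking loop like the one above:

```python
import ast

class ChineseStringCheck(ast.NodeVisitor):
    """Collect string literals that contain CJK Unified Ideographs."""

    def __init__(self):
        self.findings = []  # (lineno, col_offset, text) tuples

    def visit_Constant(self, node):
        if isinstance(node.value, str) and any("\u4e00" <= ch <= "\u9fff" for ch in node.value):
            self.findings.append((node.lineno, node.col_offset, node.value))
        self.generic_visit(node)

def check_source(source):
    checker = ChineseStringCheck()
    checker.visit(ast.parse(source))
    return checker.findings

print(check_source('s = "你好"\nt = "hi"\n'))
```

As with the original, col_offset is a UTF-8 byte offset, which equals the character column only when the text before the literal on that line is ASCII.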

For how the CPython interpreter processes source code, see the official description in PEP 339.

4.2 Closure check

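A closure is a function or lambda defined inside another function that references a local variable of the enclosing function and is returned as a value. Closures are very useful in certain scenarios, but they are also easy to misuse.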
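Here is a minimal Python 3 illustration of that definition; make_multiplier is a hypothetical example function. The returned lambda keeps the enclosing function's local n alive in a cell, and the names it captures are listed in co_freevars:

```python
def make_multiplier(n):
    # The lambda reads n, a local variable of the enclosing function,
    # and is returned, which makes it a closure.
    return lambda x: x * n

triple = make_multiplier(3)
print(triple(4))                            # 12
print(triple.__closure__[0].cell_contents)  # 3: the captured value lives in a cell
print(triple.__code__.co_freevars)          # ('n',): names taken from the enclosing scope
```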

For the concept of Python closures, see my other article, Understanding the Concept of Python Closures.

Here is a brief look at how to use ast to detect closure references inside lambdas. The code is as follows:

  import ast
  import os
  import astunparse  # third-party: pip install astunparse

  class LambdaCheck(ast.NodeVisitor):
      def __init__(self):
          self.illegal_args_list = []
          self._cur_file = None
          self._cur_lambda_args = []
      def set_cur_file(self, cur_file):
          assert os.path.isfile(cur_file), cur_file
          self._cur_file = os.path.realpath(cur_file)
      def visit_Lambda(self, node):
          """
          Lambda closure check rule:
          only check whether the names used in the lambda body expr refer to
          anything outside the lambda's own argument list
          """
          self._cur_lambda_args = [a.id for a in node.args.args]
          print astunparse.unparse(node)
          # print astunparse.dump(node)
          self.get_lambda_body_args(node.body)
          self.generic_visit(node)
      def record_args(self, name_node):
          # Record a Name that is not one of the current lambda's parameters
          if isinstance(name_node, ast.Name) and name_node.id not in self._cur_lambda_args:
              self.illegal_args_list.append((self._cur_file, 'line no:%s' % name_node.lineno, 'var:%s' % name_node.id))
      def _is_args(self, node):
          if isinstance(node, ast.Name):
              self.record_args(node)
              return True
          if isinstance(node, ast.Call):
              # Only the call's positional arguments are checked
              for arg in node.args:
                  self.record_args(arg)
              return True
          return False
      def get_lambda_body_args(self, node):
          if self._is_args(node): return
          # for cnode in ast.walk(node):
          for cnode in ast.iter_child_nodes(node):
              if not self._is_args(cnode):
                  self.get_lambda_body_args(cnode)


Traverse the project files:

  import ast
  import os

  project_dir = './your_project/script'
  for root, dirs, files in os.walk(project_dir):
      py_files = filter(lambda file: file.endswith('.py'), files)
      checker = LambdaCheck()
      for file in py_files:
          file_path = os.path.join(root, file)
          checker.set_cur_file(file_path)
          with open(file_path, 'r') as f:
              root_node = ast.parse(f.read())
              checker.visit(root_node)
      res = '\n'.join([' ## '.join(info) for info in checker.illegal_args_list])
      print res


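The body expression in Lambda(arguments args, expr body) can be very complex, so the example above handles only fairly simple body expressions. You can modify and extend the checking rules to fit your own project. To make the check more general, you can write a separate visitor class that traverses lambda nodes.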
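Below is a Python 3.8+ sketch of such a separate visitor. LambdaFreeNameCheck is a hypothetical name, and its rules are a simplified approximation, not a full scope analysis. It walks the whole lambda body rather than only Name and Call nodes, and treats the following as bound: the lambda's own parameters (ast.arg nodes), parameters of nested lambdas, and names stored inside the body, such as comprehension targets. Built-in names are skipped so that calls like len(s) are not reported:

```python
import ast
import builtins

class LambdaFreeNameCheck(ast.NodeVisitor):
    """Report names a lambda body reads but does not bind itself."""

    def __init__(self):
        self.findings = []  # (lineno, name) pairs

    @staticmethod
    def _param_names(args_node):
        return {a.arg for a in ast.walk(args_node) if isinstance(a, ast.arg)}

    def visit_Lambda(self, node):
        bound = self._param_names(node.args)
        for sub in ast.walk(node.body):
            if isinstance(sub, ast.Lambda):
                # Parameters of nested lambdas count as bound (approximation).
                bound |= self._param_names(sub.args)
            elif isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Store):
                # Comprehension targets and := assignments bind names too.
                bound.add(sub.id)
        for sub in ast.walk(node.body):
            if (isinstance(sub, ast.Name) and isinstance(sub.ctx, ast.Load)
                    and sub.id not in bound and not hasattr(builtins, sub.id)):
                self.findings.append((sub.lineno, sub.id))
        self.generic_visit(node)  # nested lambdas are checked on their own

source = "def make(n):\n    return lambda x: x + n\n\nf = lambda s: len(s)\n"
checker = LambdaFreeNameCheck()
checker.visit(ast.parse(source))
print(checker.findings)
```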
