Python 清理HTML标签类似PHP的strip_tags函数功能(二)
没有发现Python 有现成的类似功能模块,所以昨天写了个简单的 strip_tags 但还有些问题,今天应用到采集上时进行了部分功能的完善,
1. 对自闭和标签处理
2. 以及对标签参数的过滤
def strip_tags(html, save_tags=None, save_attrs=None): result = [] start = [] data = [] # 特殊的自闭和标签, 按 HTML5 的规则, 如 <br> <img alt="Python 清理HTML标签类似PHP的strip_tags函数功能(二)" > <wbr> 不再使用 /> 结尾 special_end_tags = [ 'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'keygen', 'link', 'meta', 'param', 'source', 'track', 'wbr' ] def starttag(tag, attrs): if tag not in save_tags: return start.append(tag) my_attrs = [] if attrs: for attr in attrs: if save_attrs and attr[0] not in save_attrs: continue my_attrs.append(attr[0] + '="' + attr[1] + '"') if my_attrs: my_attrs = ' ' + (' '.join(my_attrs)) else: my_attrs = '' else: my_attrs = '' result.append('') def endtag(tag): if start and tag == start[len(start) - 1]: # 特殊自闭和标签按照HTML5规则不加反斜杠直接尖括号结尾 if tag not in special_end_tags: result.append('' + tag + '>') parser = HTMLParser() parser.handle_data = result.append if save_tags: parser.handle_starttag = starttag parser.handle_endtag = endtag parser.feed(html) parser.close() for i in range(0, len(result)): tmp = result[i].rstrip('\n') tmp = tmp.lstrip('\n') if tmp: data.append(tmp) return ''.join(data)</wbr>

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

What are the magic methods of PHP? PHP's magic methods include: 1.\_\_construct, used to initialize objects; 2.\_\_destruct, used to clean up resources; 3.\_\_call, handle non-existent method calls; 4.\_\_get, implement dynamic attribute access; 5.\_\_set, implement dynamic attribute settings. These methods are automatically called in certain situations, improving code flexibility and efficiency.

The speed of mobile XML to PDF depends on the following factors: the complexity of XML structure. Mobile hardware configuration conversion method (library, algorithm) code quality optimization methods (select efficient libraries, optimize algorithms, cache data, and utilize multi-threading). Overall, there is no absolute answer and it needs to be optimized according to the specific situation.

Static binding (static::) implements late static binding (LSB) in PHP, allowing calling classes to be referenced in static contexts rather than defining classes. 1) The parsing process is performed at runtime, 2) Look up the call class in the inheritance relationship, 3) It may bring performance overhead.

Modifying XML content requires programming, because it requires accurate finding of the target nodes to add, delete, modify and check. The programming language has corresponding libraries to process XML and provides APIs to perform safe, efficient and controllable operations like operating databases.

An application that converts XML directly to PDF cannot be found because they are two fundamentally different formats. XML is used to store data, while PDF is used to display documents. To complete the transformation, you can use programming languages and libraries such as Python and ReportLab to parse XML data and generate PDF documents.

For small XML files, you can directly replace the annotation content with a text editor; for large files, it is recommended to use the XML parser to modify it to ensure efficiency and accuracy. Be careful when deleting XML comments, keeping comments usually helps code understanding and maintenance. Advanced tips provide Python sample code to modify comments using XML parser, but the specific implementation needs to be adjusted according to the XML library used. Pay attention to encoding issues when modifying XML files. It is recommended to use UTF-8 encoding and specify the encoding format.

To convert XML images, you need to determine the XML data structure first, then select a suitable graphical library (such as Python's matplotlib) and method, select a visualization strategy based on the data structure, consider the data volume and image format, perform batch processing or use efficient libraries, and finally save it as PNG, JPEG, or SVG according to the needs.

Use most text editors to open XML files; if you need a more intuitive tree display, you can use an XML editor, such as Oxygen XML Editor or XMLSpy; if you process XML data in a program, you need to use a programming language (such as Python) and XML libraries (such as xml.etree.ElementTree) to parse.
