This article explains how to create objects with Python's Beautiful Soup module. It covers installation, the BeautifulSoup constructor, and the features parameter that controls which parser is used.
Installation
Install the Beautiful Soup module with pip: pip install beautifulsoup4
You can also install it from the PyCharm IDE: open Preferences, find the Project section, search for the Beautiful Soup package there, and install it.
Create a BeautifulSoup object
The Beautiful Soup module is widely used to get data from web pages. We can use the Beautiful Soup module to extract any data from an HTML/XML document, for example, all links in a web page or content within tags.
To achieve this, Beautiful Soup provides different objects and methods. Any HTML/XML document can be converted into different Beautiful Soup objects. These objects have different properties and methods, and we can extract the required data from them.
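As a minimal sketch of the kind of extraction described above, the following parses a small HTML string and collects every link in it. The markup, the example.com URLs, and the variable names are made up for illustration, and the built-in html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical page content, used only for illustration.
html_doc = """
<html><body>
  <h1>Reading list</h1>
  <a href="http://example.com/one">First</a>
  <a href="http://example.com/two">Second</a>
</body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all("a") returns every <a> tag; indexing a tag reads an attribute.
links = [a["href"] for a in soup.find_all("a")]

# Attribute-style access returns the first matching tag; get_text() reads its text.
heading = soup.h1.get_text()

print(links)
print(heading)
```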
Beautiful Soup has three main kinds of objects:
BeautifulSoup
Tag
NavigableString
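A short sketch of how the three objects relate, assuming a one-line HTML string and the built-in html.parser: the parsed document is a BeautifulSoup object, an element inside it is a Tag, and the text inside that element is a NavigableString.

```python
from bs4 import BeautifulSoup, Tag, NavigableString

soup = BeautifulSoup("<p>Hello</p>", "html.parser")  # the whole document
tag = soup.p                                         # the <p> element
text = tag.string                                    # the text inside <p>

# Check each value against the class it is expected to be.
kinds = (
    isinstance(soup, BeautifulSoup),
    isinstance(tag, Tag),
    isinstance(text, NavigableString),
)
print(type(soup).__name__, type(tag).__name__, type(text).__name__)
```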
Create a BeautifulSoup object
Creating a BeautifulSoup object is the starting point for any Beautiful Soup project.
The BeautifulSoup constructor accepts either a string or a file-like object, such as an open file on the local machine or a web page response.
Creating BeautifulSoup objects from strings
Create an object by passing a string to the BeautifulSoup constructor.
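A minimal sketch of this, assuming a made-up HTML string and the built-in html.parser; the names html_doc and soup_string are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup held in an ordinary Python string.
html_doc = "<html><body><p>Hello, <a href='http://example.com'>world</a></p></body></html>"

# Passing the string to the constructor builds the parse tree.
soup_string = BeautifulSoup(html_doc, "html.parser")

link_target = soup_string.a["href"]
paragraph_text = soup_string.p.get_text()
print(link_target)
print(paragraph_text)
```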
In addition to strings, we can pass a file-like object, such as an open local file, to the BeautifulSoup constructor to generate an object.
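from bs4 import BeautifulSoup
# Open a local HTML file and hand the file object to the constructor.
with open('foo.html', 'r') as foo_file:
    soup_foo = BeautifulSoup(foo_file, 'html.parser')
print(soup_foo)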
Creating BeautifulSoup objects for XML parsing
The Beautiful Soup module can also be used to parse XML.
When creating a BeautifulSoup object, the Beautiful Soup module will select the appropriate TreeBuilder class to create the HTML/XML tree. By default, the HTML TreeBuilder object is selected, which will use the default HTML parser to produce an HTML structure tree. In the above code, the BeautifulSoup object is generated from the string by parsing it into an HTML tree structure.
If we want the Beautiful Soup module to parse the input as XML, we need to specify this explicitly through the features parameter of the BeautifulSoup constructor. Based on the features we request, Beautiful Soup selects the TreeBuilder class that best matches them.
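A hedged sketch of requesting XML parsing, assuming a made-up XML string. In bs4 the "xml" feature is provided by the lxml package, so the hypothetical helper parse_xml below returns None instead of failing when lxml is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical XML document, used only for illustration.
xml_doc = "<catalog><book id='1'><title>Dune</title></book></catalog>"

def parse_xml(doc):
    """Parse doc as XML, or return None if no XML-capable parser is installed."""
    try:
        return BeautifulSoup(doc, features="xml")
    except FeatureNotFound:
        # Raised by bs4 when no installed TreeBuilder offers the requested feature.
        return None

soup_xml = parse_xml(xml_doc)
if soup_xml is not None:
    print(soup_xml.find("title").string)
else:
    print("No XML parser available; install lxml to enable features='xml'.")
```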
Understanding features parameters
Each TreeBuilder has different features depending on the parser it uses, so the same input can produce different results depending on the features parameter passed to the constructor. The parsers currently used by the TreeBuilder classes in the Beautiful Soup module are:
lxml
html5lib
html.parser
The features parameter of the BeautifulSoup constructor accepts either a single string or a list of strings.
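A small sketch of both forms, assuming only the built-in html.parser is needed. The feature strings "html" and "strict" are the ones bs4's html.parser TreeBuilder is understood to advertise; treat that pairing as an assumption if your installed version differs.

```python
from bs4 import BeautifulSoup

doc = "<p>Hello</p>"

# A single string names one parser or one feature.
soup_from_string = BeautifulSoup(doc, features="html.parser")

# A list asks for a TreeBuilder that offers every listed feature.
soup_from_list = BeautifulSoup(doc, features=["html", "strict"])

# The builder attribute records which TreeBuilder was selected.
print(soup_from_string.builder.NAME)
print(soup_from_list.builder.NAME)
```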
Currently, the features parameters and parsers supported by each TreeBuilder are as shown in the following table:
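The feature strings each installed TreeBuilder supports can also be listed at run time. This is a rough sketch that assumes bs4's builder_registry object in bs4.builder; it is closer to library internals than to a documented interface, so attribute names may change between versions. Only builders whose parser is installed will appear in the output.

```python
from bs4.builder import builder_registry

# Map each registered TreeBuilder's name to the feature strings it advertises.
features_by_parser = {
    builder.NAME: list(builder.features)
    for builder in builder_registry.builders
}

for name, features in features_by_parser.items():
    print(name, features)
```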