
How to read text content in html file

下次还敢
Published: 2024-04-11 13:57:24

To read the text content of an HTML file, perform the following steps: load the HTML, parse it, extract the text with the text property or the get_text() method, optionally clean the text (normalize whitespace, remove punctuation, convert to lowercase), and output the result (print it, write it to a file, etc.).


How to read text content in HTML files

To extract text content from an HTML file, you can use the following steps:

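1. Load the HTML

The snippet below downloads a page over HTTP with the requests library (pip install requests). For an HTML file stored on your own disk, read it with open() or pathlib instead and use that string wherever response.text appears.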

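<code class="python">import requests

url = 'https://example.com'
response = requests.get(url, timeout=10)  # don't hang forever on a slow server
response.raise_for_status()  # raise an error for 4xx/5xx responses</code>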
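Since the goal is to read an HTML file, here is a minimal sketch for loading HTML from disk instead of over HTTP. The helper name load_html_file and the path page.html are hypothetical, and UTF-8 is an assumed default encoding.

```python
from pathlib import Path


def load_html_file(path: str, encoding: str = "utf-8") -> str:
    """Return the contents of a local HTML file as a string.

    UTF-8 is an assumption; pass the file's real encoding if it differs
    (pages often declare it in a <meta charset="..."> tag).
    """
    return Path(path).read_text(encoding=encoding)
```

Usage: html = load_html_file('page.html'), then pass html to BeautifulSoup in place of response.text.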

2. Parse the HTML

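<code class="python">from bs4 import BeautifulSoup  # pip install beautifulsoup4

# 'html.parser' is the parser built into Python, so no extra install is needed
soup = BeautifulSoup(response.text, 'html.parser')</code>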
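If installing Beautiful Soup is not an option, the standard library's html.parser module can handle a rough version of parsing and extraction on its own. This is a different technique from the BeautifulSoup approach used in this tutorial: a minimal sketch that collects text nodes and skips the contents of script and style elements. The names TextCollector and html_to_text are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the text nodes of an HTML document, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts = []  # collected text fragments
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the concatenated text nodes of an HTML string."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.parts)
```

Like soup.text, this joins fragments without separators, so adjacent elements can run together.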

3. Extract text content

There are two ways to extract text content:

  • Use the text property: returns all of the text inside the document (or inside any tag you call it on) as a single string, with the markup itself removed.
<code class="python">text = soup.text</code>
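  • Use the get_text() method: with no arguments it returns exactly the same string as text, but it also accepts options such as separator (a string placed between text fragments) and strip=True (trims whitespace around each fragment).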
<code class="python">text = soup.get_text()</code>
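The two only differ once you pass arguments to get_text(). A small sketch on made-up sample markup: by default, text from adjacent elements runs together, while separator and strip=True keep words apart. The helper name extract_text is hypothetical; it also removes script and style elements with decompose() so their code never ends up in the extracted text.

```python
from bs4 import BeautifulSoup

# Made-up sample markup for illustration
SAMPLE = """
<html><head><style>p { color: red; }</style></head>
<body><p>Hello <b>world</b></p><p>Bye</p>
<script>console.log("hi");</script></body></html>
"""


def extract_text(html: str, separator: str = " ") -> str:
    """Return the readable text of an HTML string, fragments joined by separator."""
    soup = BeautifulSoup(html, "html.parser")
    # Drop elements whose contents are code, not readable text
    for tag in soup(["script", "style"]):
        tag.decompose()
    # strip=True trims each fragment and skips whitespace-only ones
    return soup.get_text(separator=separator, strip=True)


soup = BeautifulSoup("<p>Hello <b>world</b></p><p>Bye</p>", "html.parser")
print(repr(soup.text))                                 # 'Hello worldBye'
print(repr(soup.get_text()))                           # 'Hello worldBye'
print(repr(soup.get_text(separator=" ", strip=True)))  # 'Hello world Bye'
print(repr(extract_text(SAMPLE)))                      # 'Hello world Bye'
```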

4. Clean text content (optional)

If you need to further clean up text content, you can perform the following operations:

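  • Normalize white space (collapse each run of spaces, tabs, and newlines into one space and trim both ends, rather than deleting spaces outright, which would glue words together):
<code class="python">text = ' '.join(text.split())</code>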
  • Remove punctuation (string.punctuation contains ASCII punctuation only, so characters such as curly quotes are not removed):
<code class="python">import string

text = text.translate(str.maketrans('', '', string.punctuation))</code>
  • Convert to lowercase:
<code class="python">text = text.lower()</code>
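The clean-up operations can be combined into one helper. A minimal sketch; the name clean_text is hypothetical, and the order (lowercase, drop punctuation, then collapse whitespace) is a choice so that gaps left by removed punctuation, as in "a - b", end up as single spaces.

```python
import string

# Translation table that deletes every ASCII punctuation character
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    # split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines) and drops leading/trailing whitespace
    return " ".join(text.split())


print(clean_text("  Hello,\n\tWorld!  It's - fine. "))  # hello world its fine
```

Non-ASCII marks such as « and » are kept, since they are not in string.punctuation.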

5. Output text content

You can output text content in a variety of ways:

  • Print to console:
<code class="python">print(text)</code>
  • Write to file:
<code class="python">with open('output.txt', 'w', encoding='utf-8') as f:  # explicit encoding for non-ASCII text
    f.write(text)</code>
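Putting the steps together, here is a sketch of a function that reads a local HTML file, extracts and tidies its text, and writes the result. The function name html_file_to_text and the file names are hypothetical; it assumes UTF-8 input and Beautiful Soup installed (pip install beautifulsoup4).

```python
from pathlib import Path

from bs4 import BeautifulSoup


def html_file_to_text(src: str, dest: str, lowercase: bool = False) -> str:
    """Extract readable text from the HTML file src, write it to dest, and return it."""
    # Step 1: load the HTML file (assumed UTF-8)
    html = Path(src).read_text(encoding="utf-8")
    # Step 2: parse it with Python's built-in parser
    soup = BeautifulSoup(html, "html.parser")
    # Drop script/style elements so their code is not treated as text
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Step 3: extract text, one text fragment per line
    raw = soup.get_text(separator="\n", strip=True)
    # Step 4 (optional): normalize whitespace inside each line, drop blank lines
    lines = [" ".join(line.split()) for line in raw.splitlines()]
    lines = [line for line in lines if line]
    if lowercase:
        lines = [line.lower() for line in lines]
    text = "\n".join(lines)
    # Step 5: write the result as UTF-8
    Path(dest).write_text(text, encoding="utf-8")
    return text
```

Using separator="\n" puts each text fragment on its own line, which also splits inline elements such as <b> onto separate lines; use separator=" " if you prefer running text.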


Source: php.cn