
Python BeautifulSoup4

Published: 2022-12-10 09:23:35
Tags: Python, html, soup, HTML, file, BeautifulSoup4, data

What is BeautifulSoup4?

BeautifulSoup4 is a Python library for extracting data from HTML and XML files. It parses a document into a tree of Python objects and provides a simple API for navigating, searching, and modifying that tree.

BeautifulSoup4 is commonly used in web scraping and data mining projects, where the goal is to extract specific data from a large number of web pages or XML documents. It makes it easy to parse the data and extract only the information that you need, without having to write complex regular expressions or custom parsing code.
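As a rough illustration of pulling out only the data you need, the sketch below collects link text and URLs from a small document. The markup, the "item" class name, and the variable names are made up for this example; find_all() and select() are the library's own search methods.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, inlined so the example runs without any files.
html = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li><a href="/items/1" class="item">Widget</a></li>
    <li><a href="/items/2" class="item">Gadget</a></li>
    <li><a href="/about">About us</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns every matching tag; class_ filters on the CSS class.
items = [
    (a.get_text(strip=True), a["href"])
    for a in soup.find_all("a", class_="item")
]

# select() takes a CSS selector and returns the same kind of tag objects.
all_links = [a["href"] for a in soup.select("ul a[href]")]

print(items)
print(all_links)
```

Both calls return ordinary tag objects, so the same attribute and text accessors work regardless of which search style you prefer.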

BeautifulSoup4 does not parse markup itself. It sits on top of a pluggable parser: the html.parser module from the Python standard library, or the third-party lxml and html5lib packages. Parsing XML requires lxml. It also handles malformed or incomplete documents gracefully, although exactly how broken markup is repaired depends on the parser you choose.
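The parser is named in the second argument to the BeautifulSoup constructor. The following is a minimal sketch of choosing one at run time and parsing a document whose tags are never closed. The pick_parser helper and the sample string are hypothetical; the sketch assumes only that the optional lxml package may or may not be installed.

```python
from importlib.util import find_spec

from bs4 import BeautifulSoup


def pick_parser():
    """Hypothetical helper: use lxml if it is installed, else the built-in parser."""
    return "lxml" if find_spec("lxml") is not None else "html.parser"


# Incomplete markup: <b>, <p>, <body>, and <html> are never closed.
broken = "<html><body><p>Hello, <b>world"

soup = BeautifulSoup(broken, pick_parser())

# The open tags are closed when parsing ends, so the tree is still usable.
paragraph_text = soup.p.get_text()
bold_text = soup.b.get_text()

print(paragraph_text)
print(bold_text)
print(soup.prettify())
```

For more badly damaged input, different parsers can build different trees from the same text, so it is worth pinning the parser name explicitly in code that other people will run.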

Here is an example of using BeautifulSoup4 to extract data from an HTML file:

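from bs4 import BeautifulSoup

# Parse the file with Python's built-in HTML parser.
# encoding="utf-8" is an assumption about how index.html was saved.
with open("index.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

print(soup.title)   # the <title> element, or None if there is none
print(soup.body.p)  # the first <p> element inside <body>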

The code above opens an HTML file called "index.html" and uses BeautifulSoup4 to parse the contents of the file. It then prints the <title> element and the first <p> element in the <body> of the HTML file.
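Attribute-style access such as soup.title returns None when the element is missing, so a chain like soup.body.p raises AttributeError on a document without a <body>. One way to guard against that is sketched below; first_text is a hypothetical helper written for this example, not part of the library.

```python
from bs4 import BeautifulSoup


def first_text(soup, name, default=""):
    """Hypothetical helper: text of the first <name> tag, or default if absent."""
    tag = soup.find(name)
    return tag.get_text(strip=True) if tag is not None else default


# A fragment with no <title> and no <body>; html.parser does not add them.
soup = BeautifulSoup("<p> Only a paragraph </p>", "html.parser")

title = first_text(soup, "title", default="(no title)")
para = first_text(soup, "p")

print(title)
print(para)
```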

You can also use BeautifulSoup4 to modify the data in an HTML or XML file, and then write the modified data back to the file. This can be useful for cleaning up or transforming the data in the file, or for adding new data to the file.

Here is an example of using BeautifulSoup4 to modify an HTML file:

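from bs4 import BeautifulSoup

# Read and parse the existing file (assumed here to be UTF-8 encoded).
with open("index.html", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Replace the text inside <title>; this assumes the document has a <title>.
soup.title.string = "My Awesome Website"

# Overwrite the original file with the modified markup.
with open("index.html", "w", encoding="utf-8") as f:
    f.write(str(soup))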

The code above opens an HTML file called "index.html" and uses BeautifulSoup4 to parse the contents of the file. It then changes the text in the <title> element to "My Awesome Website", and writes the modified HTML back to the file.
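Modification is not limited to the title. The sketch below, which works on an in-memory string rather than a file, changes an attribute and appends a new element. The sample markup, URLs, and footer text are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, kept in memory so no file is overwritten.
html = (
    "<html><head><title>Old</title></head>"
    "<body><a href='http://example.com/'>Home</a></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Tag attributes can be read and assigned like dictionary entries.
link = soup.a
link["href"] = "https://example.com/"
link["title"] = "Home page"

# new_tag() creates a detached element; append() attaches it to the tree.
footer = soup.new_tag("p")
footer.string = "Generated by a script"
soup.body.append(footer)

result = str(soup)
print(result)
```

Working on a string first makes it easy to inspect the result before committing to overwriting the original file.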

 

From: https://www.cnblogs.com/chucklu/p/16970763.html
