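bs4 converts a complex HTML document into a complex tree structure in which every node is a Python object. All of these objects fall into four types: Tag, NavigableString, BeautifulSoup, and Comment.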
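As a minimal, self-contained sketch of the four types, the snippet below parses a small inline HTML string and prints the type of each kind of object. The markup is a hypothetical stand-in for htmlDemo1.html, whose contents are not shown, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Hypothetical sample markup standing in for htmlDemo1.html
html = """<html><head><title>Demo page</title></head>
<body>
<p class="intro"><!-- a hidden note --></p>
<a href="https://example.com" id="link1">first link</a>
<a href="https://example.org" id="link2">second link</a>
</body></html>"""

bs = BeautifulSoup(html, "html.parser")

title_tag = bs.title          # Tag: the first <title> element
title_text = bs.title.string  # NavigableString: the text inside <title>
note = bs.p.string            # Comment: the only child of <p> is a comment

print(type(bs).__name__, bs.name)  # the BeautifulSoup object is the whole document
print(type(title_tag).__name__, title_tag.name)
print(type(title_text).__name__, title_text)
print(type(note).__name__, repr(str(note)))  # text without the <!-- --> markers
print(isinstance(note, Comment), isinstance(note, NavigableString))
print(bs.a.attrs)  # attributes of the first <a> only, as a dict
```

Because Comment subclasses NavigableString, a check such as isinstance(x, NavigableString) also matches comments; test for Comment first when the two need to be told apart.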
from bs4 import BeautifulSoup
# Read the local HTML file as UTF-8 text; "with" closes the file automatically
with open("./htmlDemo1.html", encoding="utf-8") as f:
    html = f.read()
bs = BeautifulSoup(html, "html.parser")
# print(bs.title)
# print(bs.head)
# print(bs.h1)
# print(type(bs.h1))
# 1. Tag: a tag together with its contents (bs.<tag name> returns only the first matching tag)
# print(bs.title.string)
# print(type(bs.title.string))
# 2. NavigableString: the text (string) inside a tag
# print(bs.a.attrs)
# print(bs.p.attrs)  # get the tag's attributes as a dict
# print(type(bs))
# 3. BeautifulSoup: represents the whole document
# print(bs.name)
# print(bs.attrs)
# print(bs)
# print(bs.p.string)
# print(type(bs.p.string))
# 4. Comment: a special kind of NavigableString; its output does not include the comment markers
# Traversing the document
# print(bs.head.contents)
# print(bs.head.contents[1])
# Searching the document
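Traversal and searching can be tried on an inline document as well. This is a sketch on a small hypothetical page: .contents returns a tag's direct children as a list, including the whitespace strings between tags, which is likely why the script above reads contents[1] rather than contents[0] (assuming htmlDemo1.html has a newline right after <head>). find() returns the first match or None, and find_all() returns a list of every match.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical sample page; the newlines between tags become text nodes
html = """<html><head>
<meta charset="utf-8"/>
<title>Demo page</title>
</head><body>
<a href="/one" class="nav">one</a>
<a href="/two" class="nav">two</a>
<a href="/three">three</a>
</body></html>"""

bs = BeautifulSoup(html, "html.parser")

# Traversal: .contents is a list of direct children, strings included
children = bs.head.contents
print(children)
# .children iterates over the same nodes; keep only element children
tag_names = [child.name for child in bs.head.children if isinstance(child, Tag)]
print(tag_names)

# Searching: find() returns the first match or None, find_all() returns a list
first_link = bs.find("a")
nav_links = bs.find_all("a", class_="nav")  # class_ avoids the Python keyword
hrefs = [a["href"] for a in bs.find_all("a")]
print(first_link.string, len(nav_links), hrefs)
```

When the position of a child can vary between documents, iterating over .children and filtering, as in tag_names, is less brittle than a fixed index such as contents[1].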
From: https://www.cnblogs.com/he-cheng/p/17148826.html