
js29

Time: 2023-12-12 22:12:04  Views: 29  
Tags: js29 text tds soup html print import

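1. Use the get() function of requests to access the Bing website 20 times. Print the returned status code and the contents of the text attribute, then compute the length of the page content returned by the text attribute and by the content attribute.

The code is as follows: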

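import requests

for i in range(20):
    # verify=False skips TLS certificate checks, so requests emits an InsecureRequestWarning
    r = requests.get('https://cn.bing.com', verify=False)
    r.encoding = "utf-8"
    print(r.status_code)
    print(r.text)
    print(len(r.text))     # length of the decoded str, in characters
    print(len(r.content))  # length of the raw response body, in bytes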
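
The two lengths printed by the loop usually differ, because r.text is a decoded str (counted in characters) while r.content is the raw bytes (counted in bytes). A minimal offline sketch of that difference, assuming UTF-8 and using a made-up sample string rather than the real Bing response:

```python
# Hypothetical sample standing in for a page body; not the actual Bing response.
sample_text = "<title>必应 Bing</title>"

# Encoding to UTF-8 yields the kind of byte string that r.content holds.
sample_bytes = sample_text.encode("utf-8")

char_count = len(sample_text)   # counts Unicode characters, like len(r.text)
byte_count = len(sample_bytes)  # counts bytes, like len(r.content)

print(char_count, byte_count)
```

Each of the two Chinese characters takes three bytes in UTF-8, so the byte count is larger than the character count by four here. For a pure ASCII page the two numbers would be equal.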

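2. Complete the following on a local HTML page:

a. Print the content of the head tag and the last two digits of the student ID

b. Get the content of the body tag

c. Get the tag object whose id is first

d. Get and print the Chinese characters in the HTML page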

The code is as follows:

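import re
from bs4 import BeautifulSoup

# Assumes the file is saved as UTF-8; change the encoding if it is not.
with open('D:/python/pythonfile/onepa.html', 'r', encoding='utf-8') as f:
    file = f.read()
soup = BeautifulSoup(file, 'html.parser')
print(soup.head, '30')            # a. head tag, then the last two digits of the student ID
print(soup.body)                  # b. body tag
print(soup.find_all(id="first"))  # c. find_all returns a list of every tag whose id is "first"
print(re.findall('[\u4e00-\u9fa5]+', soup.text))  # d. runs of Chinese characters in the page text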
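
The pattern `[\u4e00-\u9fa5]+` used in step d matches runs of consecutive characters in the common CJK ideograph range. A small sketch on a made-up string (the sample text is an assumption, since the content of onepa.html is not shown in the post) illustrates what re.findall returns:

```python
import re

# Hypothetical page text; the real onepa.html content is not shown in the post.
sample = "Hello 你好, Python 爬虫 123 测试!"

# Each match is one maximal run of consecutive characters in U+4E00..U+9FA5.
chinese_runs = re.findall("[\u4e00-\u9fa5]+", sample)

print(chinese_runs)
```

Latin letters, digits, spaces and punctuation fall outside the range, so they act as separators between the matched runs. If a single string is wanted instead of a list, `"".join(chinese_runs)` concatenates the pieces.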

3. Scrape the 2019 university rankings from the Chinese university ranking website

The code is as follows:

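import requests
from bs4 import BeautifulSoup
import bs4

def getHTMLText(url):
    """Fetch the page and return its text, or '' if the request fails."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        print("Failed to fetch the page")
        return ''

def fillUnivlist(ulist, html):
    """Append the first four cells of every table row to ulist."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find('tbody')
    if tbody is None:
        # Nothing to parse, for example when the download failed
        return
    for tr in tbody.children:
        # Skip the text nodes that sit between the row tags
        if isinstance(tr, bs4.element.Tag):
            tds = tr('td')
            ulist.append([tds[0].string, tds[1].string, tds[2].string, tds[3].string])

def printUnivlist(ulist, num):
    """Print a header row followed by at most num rows of ulist."""
    print("{:^4}\t{:<15}\t{:<8}\t{:<8}".format("Rank", "University", "Region", "Score"))
    # Slicing avoids an IndexError when fewer than num rows were collected
    for u in ulist[:num]:
        print("{:^4}\t{:<15}\t{:<8}\t{:<8}".format(u[0], u[1], u[2], u[3]))

def main(num):
    allUniv = []
    url = "https://www.shanghairanking.cn/rankings/bcur/201911.html"
    html = getHTMLText(url)
    fillUnivlist(allUniv, html)
    printUnivlist(allUniv, num)

main(20)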
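
In printUnivlist, a spec such as `{:<15}` pads with ASCII spaces, which are usually narrower than Chinese characters, so columns holding Chinese names can drift out of alignment. One common workaround, sketched here as an assumption rather than as part of the original solution, is to pad the name column with the full-width space chr(12288). The rows below are hypothetical placeholders, not data from the ranking site:

```python
# Hypothetical rows in the same [rank, name, region, score] shape as ulist;
# the values are placeholders, not data from the ranking site.
rows = [
    ["1", "示例大学", "北京", "90.0"],
    ["2", "另一所示例大学", "上海", "80.0"],
]

FULLWIDTH_SPACE = chr(12288)  # U+3000, roughly as wide as one Chinese character

# The nested field {4} supplies the fill character for the name column,
# so that column is padded with full-width spaces instead of ASCII spaces.
template = "{0:^4}\t{1:{4}<10}\t{2:<8}\t{3:<8}"

lines = [template.format(r[0], r[1], r[2], r[3], FULLWIDTH_SPACE) for r in rows]
for line in lines:
    print(line)
```

Every name is padded to 10 characters of roughly equal width. Whether the columns line up exactly on screen still depends on the font and terminal in use.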




From: https://www.cnblogs.com/xiaozhang-nulibanzhuan-ing/p/17897934.html
