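Problem: The PyPI page for the matplotlib library has been downloaded and saved in the file "matplotlib · PyPI.html" (UTF-8 encoded). Scrape and display the information for the i-th version: the file name (*.whl), file size, upload time, and applicable Python version. The value of i is entered from the keyboard. The first version is displayed as shown in the figure below:
(The image is not available.)
The corresponding HTML markup is as follows: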
<div class="card file__card">
  <a href="https://files.pythonhosted.org/packages/80/24/97c9bb03263d0812ebc17ad0608a4b9f2dda4d53ec21bd7534a932809f30/matplotlib-3.6.2-pp39-pypy39_pp73-win_amd64.whl"> matplotlib-3.6.2-pp39-pypy39_pp73-win_amd64.whl </a>
  (7.2 MB <a href="#copy-hash-modal-f0c610ff-69c1-4f6d-85ae-806e07cfe734">view hashes</a>)
  <p class="file__meta">
    Uploaded
    <time data-controller="localized-time" data-localized-time-relative="true" data-localized-time-show-time="false" datetime="2022-11-03T01:52:11+0000"> Nov 3, 2022 </time>
    <code>pp39</code>
  </p>
</div>
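To see where each requested field lives in this card, here is a minimal sketch that walks the snippet with Python's standard-library `html.parser`. This is only a stand-in chosen so the example runs without third-party packages; the exercise itself expects BeautifulSoup. The names `SNIPPET` and `CardOutline` are hypothetical and used for illustration only.

```python
from html.parser import HTMLParser

# The file card from the exercise, copied as a single string.
SNIPPET = (
    '<div class="card file__card"> '
    '<a href="https://files.pythonhosted.org/packages/80/24/'
    '97c9bb03263d0812ebc17ad0608a4b9f2dda4d53ec21bd7534a932809f30/'
    'matplotlib-3.6.2-pp39-pypy39_pp73-win_amd64.whl"> '
    'matplotlib-3.6.2-pp39-pypy39_pp73-win_amd64.whl </a> '
    '(7.2 MB <a href="#copy-hash-modal-f0c610ff-69c1-4f6d-85ae-806e07cfe734">'
    'view hashes</a>) '
    '<p class="file__meta"> Uploaded '
    '<time data-controller="localized-time" '
    'data-localized-time-relative="true" '
    'data-localized-time-show-time="false" '
    'datetime="2022-11-03T01:52:11+0000"> Nov 3, 2022 </time> '
    '<code>pp39</code> </p> </div>'
)


class CardOutline(HTMLParser):
    """Records each non-empty text node with the tag that directly contains it."""

    def __init__(self):
        super().__init__()
        self.open_tags = []          # stack of currently open tags
        self.text_nodes = []         # (enclosing tag, stripped text) pairs
        self.upload_datetime = None  # value of the <time datetime="..."> attribute

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)
        if tag == "time":
            self.upload_datetime = dict(attrs).get("datetime")

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tags:
            self.text_nodes.append((self.open_tags[-1], text))


outline = CardOutline()
outline.feed(SNIPPET)
outline.close()
for tag, text in outline.text_nodes:
    print(f"<{tag}>: {text}")
print("datetime attribute:", outline.upload_datetime)
```

The listing shows that the file name is the text of the first `<a>`, while the size `(7.2 MB` is a bare text node directly inside the `<div>`, sitting between the two links. The upload time is available both as the text of `<time>` and, in machine-readable form, as its `datetime` attribute.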
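Note: The third-party library BeautifulSoup is already provided in the exam system folder, so it does not need to be installed.
If the input after running is: 1
the output is: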
matplotlib-3.6.2-pp39-pypy39_pp73-win_amd64.whl;7.2MB;UploadedNov3,2022 pp39;
If the input after running is: 2
the output is:
matplotlib-3.6.2-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl;7.4MB;UploadedNov3,2022 pp39;
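The two expected outputs follow one fixed pattern: file name, size with spaces removed, the word "Uploaded" fused with a space-free date, then a space and the Python tag, with semicolons as separators. A minimal sketch of that formatting step, assuming the raw pieces have already been extracted from the page; the helper names `format_upload_date` and `format_line` are hypothetical:

```python
from datetime import datetime


def format_upload_date(iso_text: str) -> str:
    """Turn '2022-11-03T01:52:11+0000' into 'Nov3,2022' (no spaces, no zero padding)."""
    dt = datetime.strptime(iso_text, "%Y-%m-%dT%H:%M:%S%z")
    # %b is locale-dependent; "Nov" assumes the default C/English locale.
    return f"{dt:%b}{dt.day},{dt.year}"


def format_line(file_name: str, raw_size: str, iso_time: str, py_tag: str) -> str:
    """Assemble one output line in the layout shown in the expected results."""
    # The raw size text looks like " (7.2 MB ": drop the bracket and all spaces.
    size = raw_size.strip().lstrip("(").replace(" ", "")
    return f"{file_name};{size};Uploaded{format_upload_date(iso_time)} {py_tag};"


print(format_line(
    "matplotlib-3.6.2-pp39-pypy39_pp73-win_amd64.whl",
    " (7.2 MB ",
    "2022-11-03T01:52:11+0000",
    "pp39",
))
```

Using `dt.day` rather than `%d` avoids a leading zero, so November 3 is printed as `Nov3` and not `Nov03`.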
Fill in the appropriate statements or expressions in the blank space between """【""" and """】""".
"""Source program"""
#coding=gbk
from bs4 import BeautifulSoup
url = 'matplotlib · PyPI.html'
"""【"""
"""】"""
"""End"""
Sample answer (for reference only):
from datetime import datetime

# Parse the saved page; `url` and BeautifulSoup come from the template above
with open(url, 'r', encoding='utf-8') as file:
    soup = BeautifulSoup(file, 'html.parser')

i = int(input())
# Each downloadable file is described by one <div class="card file__card">
cards = soup.find_all('div', class_='card file__card')
if i > len(cards) or i < 1:
    print("Index out of range")
else:
    card = cards[i - 1]  # i is 1-based, list indices are 0-based
    a_tag = card.find('a')
    file_name = a_tag.text.strip()
    # The size is the bare text right after the first link, e.g. " (7.2 MB "
    file_size = a_tag.next_sibling.strip()
    file_size = file_size[1:].strip().replace(" ", "")
    file_meta = card.find('p', class_='file__meta')
    uploaded_time = file_meta.find('time')['datetime']
    version = file_meta.find('code').text.strip()
    try:
        # Reformat e.g. "2022-11-03T01:52:11+0000" as "Nov3,2022"
        dt = datetime.strptime(uploaded_time, "%Y-%m-%dT%H:%M:%S%z")
        uploaded_time = f"{dt:%b}{dt.day},{dt.year}"
    except ValueError as e:
        print(f"Invalid time format: {e}")
        uploaded_time = "unknown time"
    print(f"{file_name};{file_size};Uploaded{uploaded_time} {version};")
From: https://www.cnblogs.com/Cyruswong/p/18433656