Generating an RSS Feed from a Web Page with Python

Posted: 2024-05-11 22:43:51

Tags: web page, title, Python, text, item, ET, news, items, RSS

pip install requests beautifulsoup4 lxml

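import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


def fetch_news_from_url(url):
    # 1. Fetch the page
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on HTTP errors
    # Pass raw bytes so BeautifulSoup can detect the page encoding itself
    soup = BeautifulSoup(response.content, 'html.parser')

    # 2. Extract the news entries. This is only an example: the '.list li'
    #    selector matches the page used below and must be adapted for other sites.
    news_items = []
    for item in soup.select('.list li'):
        anchor = item.select_one('a')
        if anchor is None or not anchor.get('href'):
            continue  # skip list entries that contain no link
        title = anchor.get_text(strip=True)  # the link text serves as the title
        link = urljoin(url, anchor['href'])  # resolve relative links against the page URL
        # description = item.select_one('.description').text  # if the page has a '.description' element
        # time = item.select_one('.time').text  # if the page has a '.time' element
        news_items.append({'title': title, 'link': link, 'description': ''})

    return news_items


def generate_rss(news_items, rss_filename, channel_title='News', channel_link='', channel_description=''):
    root = ET.Element("rss")
    root.set("version", "2.0")
    channel = ET.SubElement(root, "channel")

    # RSS 2.0 requires title, link and description on the channel itself
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description or channel_title

    for item in news_items:
        item_elem = ET.SubElement(channel, "item")
        ET.SubElement(item_elem, "title").text = item['title']
        ET.SubElement(item_elem, "link").text = item['link']
        ET.SubElement(item_elem, "description").text = item['description']

    tree = ET.ElementTree(root)
    tree.write(rss_filename, encoding='utf-8', xml_declaration=True)


# Usage example
news_url = "https://gdstc.gd.gov.cn/zwgk_n/tzgg/index.html"  # replace with the actual news page URL
news_items = fetch_news_from_url(news_url)
generate_rss(news_items, "gdkxjsnews.rss", channel_link=news_url)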
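
The script above leaves the `time` field commented out. If a page does expose a date for each entry, it could be written to the feed as `<pubDate>`, which RSS 2.0 expects in RFC 822 form. The following is only a minimal sketch under stated assumptions: the helper names `to_pub_date` and `build_item` are hypothetical, the date format `YYYY-MM-DD` and the UTC+8 time zone are guesses about what a page might show, and the sample title and link are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumption: dates are scraped as "YYYY-MM-DD" and refer to UTC+8.
CST = timezone(timedelta(hours=8))


def to_pub_date(date_text, fmt="%Y-%m-%d", tz=CST):
    """Convert a scraped date string into the RFC 822 form used by <pubDate>."""
    parsed = datetime.strptime(date_text.strip(), fmt).replace(tzinfo=tz)
    return format_datetime(parsed)


def build_item(title, link, date_text):
    """Build a single <item> element carrying title, link and pubDate."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "pubDate").text = to_pub_date(date_text)
    return item


# Placeholder data, not taken from any real page
sample = build_item("Sample notice", "https://example.com/notice/1.html", "2024-05-11")
print(ET.tostring(sample, encoding="unicode"))
```

If the page uses a different date layout, only the `fmt` argument should need to change.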

Source: https://www.cnblogs.com/tcli/p/18187289
