\n", data[0].page_content)
print("\n=== Second article ===")
print("Title: ", data[1].metadata["title"])
print("Link: ", data[1].metadata["link"])
print("Content:\n", data[1].page_content)
#### Output
```plaintext
=== First article ===
Title: Donald Trump indictment: What do we know about the six co-conspirators?
Link: https://www.bbc.com/news/world-us-canada-66388172
Content:
In testimony to the congressional committee examining the 6 January riot, Mrs Powell said she did not review all of the many claims of election fraud she made, telling them that "no reasonable person" would view her claims as fact. Neither she nor her representatives have commented.

=== Second article ===
Title: Lizzo dancers Arianna Davis and Crystal Williams: 'No one speaks out, they are scared'
Link: https://www.bbc.com/news/entertainment-arts-66384971
Content:
Ms Williams added: "If there's anything that I can do in my power to ensure that dancers or singers or whoever decides to work with her don't have to go through that same experience, I'm going to do that."
```
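### 3.3 Enabling the NLP module to generate keywords and summaries
Setting `nlp=True` makes the loader automatically generate keywords and a summary for each article.
#### Example code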
# 1. Re-initialize the loader with NLP analysis enabled
loader = NewsURLLoader(urls=urls, nlp=True)
# 2. Load the articles and run the NLP analysis
data = loader.load()
# 3. Read the keywords and summary from each article's metadata
print("=== NLP analysis of the first article ===")
print("Keywords: ", data[0].metadata["keywords"])
print("Summary: ", data[0].metadata["summary"])
print("\n=== NLP analysis of the second article ===")
print("Keywords: ", data[1].metadata["keywords"])
print("Summary: ", data[1].metadata["summary"])
#### Output
=== NLP analysis of the first article ===
Keywords: ['powell', 'know', 'donald', 'trump', 'review', 'indictment', 'telling', 'view', 'reasonable', 'person', 'testimony', 'coconspirators', 'riot', 'representatives', 'claims']
Summary: In testimony to the congressional committee examining the 6 January riot, Mrs Powell said she did not review all of the many claims of election fraud she made, telling them that "no reasonable person" would view her claims as fact. Neither she nor her representatives have commented.
=== NLP analysis of the second article ===
Keywords: ['davis', 'lizzo', 'singers', 'experience', 'crystal', 'ensure', 'arianna', 'theres', 'williams', 'power', 'going', 'dancers', 'im', 'speaks', 'work', 'ms', 'scared']
Summary: Ms Williams added: "If there's anything that I can do in my power to ensure that dancers or singers or whoever decides to work with her don't have to go through that same experience, I'm going to do that."
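Code that reads `metadata["keywords"]` directly will raise a `KeyError` if those fields are missing, which can plausibly happen when a loader was created without NLP enabled. Below is a minimal sketch of a more defensive read. The helper name `describe_nlp_fields` and its `max_keywords` parameter are hypothetical, not part of the loader, and it works on a plain metadata mapping so it can run without fetching anything.

```python
from typing import Any, Mapping


def describe_nlp_fields(metadata: Mapping[str, Any], max_keywords: int = 5) -> str:
    """Format the NLP fields of one article's metadata, tolerating their absence.

    Hypothetical helper for illustration; `metadata` is assumed to look like
    the `doc.metadata` dictionaries printed in this section.
    """
    keywords = metadata.get("keywords") or []
    summary = (metadata.get("summary") or "").strip()
    if not keywords and not summary:
        return "(no NLP fields; was the loader created with nlp=True?)"
    shown = ", ".join(keywords[:max_keywords])
    return f"keywords: {shown}\nsummary: {summary}"


# Metadata without NLP fields falls back to a hint instead of raising KeyError.
print(describe_nlp_fields({"title": "some article"}))

# Metadata with NLP fields is trimmed to the first few keywords.
print(describe_nlp_fields({"keywords": ["powell", "know", "donald"], "summary": "..."}, max_keywords=2))
```

In practice you would call it as `describe_nlp_fields(data[0].metadata)` on the documents returned by `loader.load()`.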
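## 4. Application scenarios
`NewsURLLoader` is suited to the following scenarios:
- **Information extraction and analysis**: applications such as corporate public-opinion monitoring, trending-news extraction, and sentiment analysis.
- **Knowledge graph construction**: extract the key entities and events mentioned in the news and build a semantic network from them.
- **Content aggregation and recommendation**: format multiple news articles into usable documents and extract their core information for a recommendation engine.
- **Summarization and automated writing**: quickly extract an article's core content, then use generative AI to produce higher-quality summaries or commentary.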
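As a rough illustration of the aggregation scenario, the extracted keywords can be turned into a small inverted index that maps each keyword to the articles mentioning it. This is only a sketch: `build_keyword_index` is a hypothetical helper, and the sample metadata reuses the two links from this article with shortened keyword lists.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping


def build_keyword_index(metadatas: Iterable[Mapping[str, Any]]) -> Dict[str, List[str]]:
    """Map each keyword to the links of the articles that mention it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for meta in metadatas:
        link = meta.get("link", "")
        # set() guards against a keyword being listed twice for one article.
        for keyword in sorted(set(meta.get("keywords") or [])):
            index[keyword.lower()].append(link)
    return dict(index)


# Metadata shaped like the loader output shown earlier (keyword lists shortened).
sample = [
    {
        "link": "https://www.bbc.com/news/world-us-canada-66388172",
        "keywords": ["powell", "trump", "indictment"],
    },
    {
        "link": "https://www.bbc.com/news/entertainment-arts-66384971",
        "keywords": ["lizzo", "dancers", "williams"],
    },
]

index = build_keyword_index(sample)
print(index["trump"])
```

With real data you would pass `[doc.metadata for doc in data]`; keywords shared by several articles then point to several links, which is a simple starting point for grouping or recommending related stories.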
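## 5. Practical recommendations
- **Filter out invalid URLs**: make sure the input URLs are valid and point to substantive content, which reduces fetch failures and low-quality results.
- **Enable NLP only when needed**: if you only need basic content fetching, turn off the `nlp` parameter to speed up loading.
- **Refresh the article sources regularly**: for ongoing analysis tasks such as public-opinion monitoring, update the URL list on a regular schedule.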
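The URL-filtering advice can be sketched with the standard library alone. The helper below, `filter_urls`, is a hypothetical name. It checks only that each entry is a well-formed `http`/`https` URL and removes duplicates; it does not verify that the page is reachable or that its content is substantive.

```python
from typing import Iterable, List
from urllib.parse import urlparse


def filter_urls(candidates: Iterable[str]) -> List[str]:
    """Keep well-formed http(s) URLs, dropping blanks and duplicates."""
    seen = set()
    kept: List[str] = []
    for raw in candidates:
        url = raw.strip()
        parsed = urlparse(url)
        # Require an http(s) scheme and a host; this is a syntax check only.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if url in seen:
            continue
        seen.add(url)
        kept.append(url)
    return kept


urls = filter_urls([
    "https://www.bbc.com/news/world-us-canada-66388172",
    "https://www.bbc.com/news/world-us-canada-66388172",  # duplicate
    "not a url",
    "",
    "ftp://example.com/file",  # unsupported scheme
])
print(urls)
```

The resulting list can then be passed to the loader as `NewsURLLoader(urls=urls)`.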
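With the approach above, you can quickly fetch and analyze news content. If you run into problems along the way, feel free to discuss them in the comments!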
Source: https://blog.csdn.net/awd5456aw/article/details/145076872