
Time: 2025-01-11 14:32:38

# `data` is the list of documents returned by loader.load()
print("=== First article ===")
print("Title: ", data[0].metadata["title"])
print("Link: ", data[0].metadata["link"])
print("Content:\n", data[0].page_content)

print("\n=== Second article ===")
print("Title: ", data[1].metadata["title"])
print("Link: ", data[1].metadata["link"])
print("Content:\n", data[1].page_content)

#### Output
=== First article ===
Title:  Donald Trump indictment: What do we know about the six co-conspirators?
Link:  https://www.bbc.com/news/world-us-canada-66388172
In testimony to the congressional committee examining the 6 January riot, Mrs Powell said she did not review all of the many claims of election fraud she made, telling them that "no reasonable person" would view her claims as fact. Neither she nor her representatives have commented.

=== Second article ===
Title:  Lizzo dancers Arianna Davis and Crystal Williams: 'No one speaks out, they are scared'
Link:  https://www.bbc.com/news/entertainment-arts-66384971
Ms Williams added: "If there's anything that I can do in my power to ensure that dancers or singers or whoever decides to work with her don't have to go through that same experience, I'm going to do that."

3.3 Enabling the NLP module to generate keywords and summaries

Setting nlp=True makes the loader automatically generate keywords and a summary for each article.

# 1. Re-initialize the loader with NLP analysis enabled
loader = NewsURLLoader(urls=urls, nlp=True)

# 2. Load the articles and run the NLP analysis
data = loader.load()

# 3. Read the keywords and summaries from the metadata
print("=== NLP analysis of the first article ===")
print("Keywords: ", data[0].metadata["keywords"])
print("Summary: ", data[0].metadata["summary"])

print("\n=== NLP analysis of the second article ===")
print("Keywords: ", data[1].metadata["keywords"])
print("Summary: ", data[1].metadata["summary"])

#### Output
=== NLP analysis of the first article ===
Keywords:  ['powell', 'know', 'donald', 'trump', 'review', 'indictment', 'telling', 'view', 'reasonable', 'person', 'testimony', 'coconspirators', 'riot', 'representatives', 'claims']
Summary:  In testimony to the congressional committee examining the 6 January riot, Mrs Powell said she did not review all of the many claims of election fraud she made, telling them that "no reasonable person" would view her claims as fact. Neither she nor her representatives have commented.

=== NLP analysis of the second article ===
Keywords:  ['davis', 'lizzo', 'singers', 'experience', 'crystal', 'ensure', 'arianna', 'theres', 'williams', 'power', 'going', 'dancers', 'im', 'speaks', 'work', 'ms', 'scared']
Summary:  Ms Williams added: "If there's anything that I can do in my power to ensure that dancers or singers or whoever decides to work with her don't have to go through that same experience, I'm going to do that."
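
When working with more than a couple of articles, it can help to flatten the metadata into plain records before further processing. The following is a minimal sketch, not part of the original article: `collect_nlp_records` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for the loader's documents so the example runs without network access. It assumes the "keywords" and "summary" keys may be absent when the loader was created without nlp=True, so it reads them with defaults.

```python
from types import SimpleNamespace


def collect_nlp_records(docs):
    """Flatten title, link, keywords and summary into one dict per document."""
    records = []
    for doc in docs:
        meta = doc.metadata
        records.append({
            "title": meta.get("title", ""),
            "link": meta.get("link", ""),
            # Assumption: these two keys may be missing if NLP was not enabled.
            "keywords": list(meta.get("keywords", [])),
            "summary": meta.get("summary", ""),
        })
    return records


# Stand-in documents (hypothetical); real ones would come from loader.load().
sample_docs = [
    SimpleNamespace(
        page_content="...",
        metadata={
            "title": "Article with NLP metadata",
            "link": "https://www.bbc.com/news/world-us-canada-66388172",
            "keywords": ["powell", "trump", "indictment"],
            "summary": "A short summary.",
        },
    ),
    SimpleNamespace(
        page_content="...",
        metadata={
            "title": "Article without NLP metadata",
            "link": "https://www.bbc.com/news/entertainment-arts-66384971",
        },
    ),
]

records = collect_nlp_records(sample_docs)
for record in records:
    print(record["title"], "|", ", ".join(record["keywords"]) or "(no keywords)")
```

Reading the keys with `.get` keeps the same code usable whether or not NLP was enabled for a given run.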

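4. Application scenarios

NewsURLLoader is suited to the following scenarios:

  1. Information extraction and analysis

  2. Knowledge graph construction

  3. Content aggregation and recommendation

  4. Summary generation and automated writing
    Quickly extract the core content of an article, then use generative AI to produce a high-quality summary or commentary.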
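
As one illustration of the content aggregation and recommendation scenario, extracted keywords can be compared across articles to find related pieces. This is a rough sketch under stated assumptions: Jaccard similarity is a technique chosen here for illustration, not something the loader provides, and the function names, article labels, and keyword lists are all hypothetical.

```python
def jaccard(keywords_a, keywords_b):
    """Jaccard similarity between two keyword lists (0.0 when both are empty)."""
    set_a, set_b = set(keywords_a), set(keywords_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def rank_related(target_keywords, keyword_index):
    """Rank articles by keyword overlap with the target, best match first."""
    scored = [
        (title, jaccard(target_keywords, keywords))
        for title, keywords in keyword_index.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical keyword lists, shaped like metadata["keywords"] from the loader.
keyword_index = {
    "article_a": ["trump", "indictment", "election", "claims"],
    "article_b": ["lizzo", "dancers", "singers", "experience"],
    "article_c": ["election", "claims", "fraud", "testimony"],
}
target = ["trump", "election", "claims", "riot"]

ranking = rank_related(target, keyword_index)
for title, score in ranking:
    print(f"{title}: {score:.2f}")
```

Keyword overlap is a coarse signal; for larger collections, embedding-based similarity would likely give better recommendations.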

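5. Practical advice

  • Filter out invalid URLs: make sure the input URLs are valid and point to content-rich pages, to reduce failed fetches and low-quality content.
  • Enable NLP only when needed: if you only need basic content extraction, leave the nlp parameter off (nlp=False) to speed up loading.
  • Refresh article sources regularly: for ongoing analysis tasks (such as public-opinion monitoring), update the URL list periodically.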
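
The URL filtering advice above can be sketched with the standard library alone. This is a minimal example, assuming that "valid" means a well-formed http or https URL: `filter_urls` is a hypothetical helper, and it checks syntax only, not whether the page is reachable or content-rich.

```python
from urllib.parse import urlparse


def filter_urls(urls):
    """Keep well-formed http(s) URLs, dropping duplicates while preserving order."""
    seen = set()
    valid = []
    for url in urls:
        candidate = url.strip()
        parsed = urlparse(candidate)
        # Reject anything without an http(s) scheme or without a host part.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if candidate in seen:
            continue
        seen.add(candidate)
        valid.append(candidate)
    return valid


raw_urls = [
    "https://www.bbc.com/news/world-us-canada-66388172",
    "https://www.bbc.com/news/entertainment-arts-66384971",
    "not-a-url",
    "ftp://example.com/file",
    "https://www.bbc.com/news/world-us-canada-66388172",  # duplicate
]

clean_urls = filter_urls(raw_urls)
print(clean_urls)
```

The cleaned list can then be passed as the `urls` argument when constructing the loader.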


From: https://blog.csdn.net/awd5456aw/article/details/145076872

