Natural Language Processing: Python's spaCy Library and Counting Person Names in an Article

Published: 2024-04-04 15:29:05

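In the fast-moving field of natural language processing, Python's spaCy library stands out for being both powerful and easy to use. These study notes walk through basic NLP tasks with spaCy: tokenization, sentence segmentation, part-of-speech tagging, and named entity recognition, followed by a practical example: identifying and counting the person names in a text.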

Installing spaCy

Official site: spaCy · Industrial-strength Natural Language Processing in Python

On the site, click USAGE, pick the install command that matches your setup, and run it in the Anaconda Prompt.
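After installing, it can be useful to confirm that both the library and the English model used below are importable before running anything else. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, and the commands in the comments are the usual pip forms, which may differ from the command the USAGE page generates for your configuration.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if find_spec(name) is None]


# Typical install commands (assumption: a pip-based setup):
#   pip install -U spacy
#   python -m spacy download en_core_web_sm
missing = missing_packages(["spacy", "en_core_web_sm"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("spaCy and the en_core_web_sm model are available.")
```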

Basic features

import spacy
from spacy import displacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")

doc = nlp("Only when you understand the true meaning of life can you live truly. Bittersweet as life is, it is still wonderful, and it's fascinating even in tragedy.")

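# Tokenization: iterate over the tokens of the document
for token in doc:
    print(token)
# Sentence segmentation: iterate over the detected sentences
for sent in doc.sents:
    print(sent)
# Part-of-speech tagging: print each token with its coarse POS tag
for token in doc:
    print('{} - {}'.format(token, token.pos_))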

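Sentence segmentation splits the document into sentences; in this example the boundaries coincide with the periods. Part-of-speech tagging builds on tokenization and labels each token with its part of speech, for example: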

Only - ADV
when - SCONJ
you - PRON
understand - VERB
the - DET
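If only sentence boundaries are needed, a purely rule-based splitter can be used instead of the full model. The sketch below assumes spaCy v3, where a blank English pipeline accepts the built-in `sentencizer` component by name; `nlp_rules` is a name introduced here for illustration. It splits on sentence-final punctuation only and does no tagging.

```python
import spacy

# A blank English pipeline: tokenizer only, no trained components.
nlp_rules = spacy.blank("en")
# Rule-based sentence splitter that breaks on sentence-final punctuation.
nlp_rules.add_pipe("sentencizer")

text = (
    "Only when you understand the true meaning of life can you live truly. "
    "Bittersweet as life is, it is still wonderful, "
    "and it's fascinating even in tragedy."
)

sentences = [sent.text for sent in nlp_rules(text).sents]
for sentence in sentences:
    print(sentence)
```

This is faster than loading `en_core_web_sm`, but token.pos_ and doc.ents stay empty because no tagger or entity recognizer is present.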

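Named entity recognition

Common named entity types:

PERSON: names of people, e.g. "Alan".

ORG: names of organizations, including companies, government agencies, and NGOs, e.g. "Wuhan University".

GPE: geopolitical entities such as countries, cities, and states, e.g. "China", "New York".

DATE: dates or periods, e.g. "today", "1992", "20th century".

TIME: times of day or durations shorter than a day, e.g. "8:00 AM", "two hours".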
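The label descriptions above can also be looked up programmatically. A small sketch using `spacy.explain`, which returns spaCy's built-in glossary text for a label (or `None` for an unknown one); the `descriptions` dictionary is just an illustrative name.

```python
import spacy

labels = ["PERSON", "ORG", "GPE", "DATE", "TIME"]

# Map each entity label to its glossary description.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label}: {description}")
```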

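# Named entity recognition: print each entity with its label
doc_2 = nlp("Alan went to Wuhan University today")
for ent in doc_2.ents:
    print('{} - {}'.format(ent, ent.label_))
# Highlight the entities inline (jupyter=True renders inside a notebook)
displacy.render(doc_2, style='ent', jupyter=True)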

Output: the entities printed by the loop, followed by displaCy's highlighted rendering of the sentence.

Counting person names in an article

(The text used here is included in the appendix at the end of the article.)

def read_file(file_name):
    with open(file_name, 'r', encoding='utf-8') as file:
        return file.read()

text = read_file('text04.txt')
prd_text = nlp(text)
sentences = [s for s in prd_text.sents]
print(len(sentences))   # 26

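def find_person(doc):
    # Count how often each PERSON entity occurs, keyed by its lemma
    c = Counter()
    for ent in doc.ents:  # use the argument rather than the global prd_text
        if ent.label_ == 'PERSON':
            c[ent.lemma_] += 1
    return c
print(find_person(prd_text))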

Output:

Counter({'James': 5, 'Liz': 3, 'Mary Jenkins': 2, 'Jessica Morales': 2, 'GW': 2, 'James Thompson': 1, 'Elizabeth "Liz" Harper': 1, 'Gazette': 1, 'Aaron Lee': 1, 'Lee': 1, 'Sarah Nguyen': 1, 'Ethan Smith': 1, 'George Washington': 1, 'Amelia Richardson': 1, 'Richard': 1, 'Maplewood': 1, 'Mary': 1, 'Aaron': 1, 'Jessica': 1, 'Sarah': 1, 'Ethan': 1, 'Amelia': 1})
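In this output the same person shows up under several keys ('James' and 'James Thompson', 'Liz' and 'Elizabeth "Liz" Harper'), and a few non-persons such as 'Gazette' and 'Maplewood' are tagged as PERSON. One possible post-processing step, not part of the original notes, is to fold a single-word name into a multi-word name when exactly one such name contains it. The sketch below is a heuristic under that assumption; `merge_short_names` is a hypothetical helper, and the input is a subset of the counts above.

```python
from collections import Counter


def merge_short_names(counts):
    """Fold single-word names into the unique multi-word name containing them."""
    full_names = [name for name in counts if len(name.split()) > 1]
    merged = Counter()
    for name, n in counts.items():
        if len(name.split()) > 1:
            merged[name] += n
            continue
        # Compare whole words, ignoring quotes, so 'Liz' matches
        # 'Elizabeth "Liz" Harper' but 'Richard' does not match 'Richardson'.
        candidates = [
            full for full in full_names
            if name in [word.strip('"') for word in full.split()]
        ]
        # Merge only when the match is unambiguous; otherwise keep the key.
        target = candidates[0] if len(candidates) == 1 else name
        merged[target] += n
    return merged


raw = Counter({
    'James': 5, 'Liz': 3, 'Mary Jenkins': 2, 'James Thompson': 1,
    'Elizabeth "Liz" Harper': 1, 'Gazette': 1, 'Mary': 1,
})
merged = merge_short_names(raw)
print(merged)
```

Names with no multi-word counterpart in the counts (for example 'GW' or 'Richard') are left untouched, and mislabeled entities such as 'Gazette' still need manual filtering.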

Appendix

In the quaint town of Maplewood, nestled in the heart of America, the winds of change began to stir with the return of James Thompson, a decorated veteran, to his once vibrant hometown. Maplewood, known for its annual apple festival and community spirit, had seen better days. The local park, where James had spent countless childhood hours, was now neglected, and Main Street's bustling shops were struggling to stay afloat.

Determined to restore Maplewood to its former glory, James reached out to his childhood friend, Elizabeth “Liz” Harper, now the editor of the Maplewood Gazette, to share his vision. Liz, always a champion for local causes, saw an opportunity to rally the community and offered the Gazette as a platform to spark interest in James's project.

Together, they organized a town hall meeting, inviting not just long-time residents like the ever-energetic Mary Jenkins, who ran the local bakery, but also newcomers like Dr. Aaron Lee, a young physician passionate about public health, and Jessica Morales, a tech entrepreneur interested in sustainable living.

The meeting was a turning point. Inspired by James’s passion, Liz’s enthusiasm, and the shared love for Maplewood, the attendees brainstormed a series of initiatives. Mary Jenkins proposed a “Clean and Green” weekend, rallying volunteers to beautify the park and plant trees. Dr. Lee suggested free health screenings at these events, emphasizing the importance of health in community well-being. Jessica Morales, seeing the potential for technology to enhance community engagement, offered to create a mobile app to keep residents informed and involved in local events.

The ripple effect was immediate. High school students, led by the ambitious Sarah Nguyen and tech-savvy Ethan Smith, formed a “Youth for Maplewood” group, organizing fundraisers and social media campaigns to support the initiatives. Local historian and retired teacher, Mr. George Washington Carver (affectionately known as “Mr. GW”), offered history walks, sharing stories of Maplewood’s heritage, further instilling a sense of pride and belonging among the residents.

One of the most touching contributions came from Mrs. Amelia Richardson, a widow who donated her late husband’s collection of historical photographs of Maplewood for a public exhibition. “Richard loved this town as much as anyone,” she said, tears glistening in her eyes. “I know he’d be proud to see us all coming together like this.”

As months passed, Maplewood transformed. The park was no longer a place to avoid but a community hub, vibrant with laughter and activities. Main Street thrived as locals and visitors alike were drawn to its renewed energy and charm. The “Clean and Green” weekends became a beloved tradition, symbolizing the community’s commitment to their town and to each other.

Reflecting on the journey, James remarked, “I came back hoping to find the Maplewood I remembered. What I found was something even better – a community ready to fight for its future. It’s been an honor to stand alongside Liz, Mary, Aaron, Jessica, Sarah, Ethan, Mr. GW, Amelia, and every single person who believed in what we could achieve together.”

The story of Maplewood’s revival serves as a beacon of hope, a testament to the power of community when hearts and hands come together for a common cause. In Maplewood, the spirit of unity, fueled by the dedication of its residents, turned the tide, proving that even small towns could achieve big dreams.

From: https://blog.csdn.net/HUSTGO/article/details/137375028
