
Word Frequency Statistics for Journey to the West with jieba

Date: 2023-12-29 10:46:06  Views: 31

Tags: jieba, rword, word, items, counts, txt, Journey to the West, word segmentation

import jieba

Exclude words that are not character names

excludes = {"一个","那里","怎么","我们","不知","和尚","妖精","两个","甚么","不是",
            "只见","国王","徒弟","呆子","如何","这个","大王","原来","不敢","不曾",
            "闻言","正是","只是","那怪","出来","一声","真个","小妖"}

# Read the full text of the novel (the file is GB18030-encoded)
with open("西游记.txt", "r", encoding="gb18030") as f:
    txt = f.read()
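If the encoding of a downloaded copy is uncertain (the novel circulates both as GB18030 and UTF-8), a defensive reader can try each candidate encoding in turn. This is a minimal sketch; the candidate list is an assumption, not something the original script does:

```python
def read_text(path, encodings=("gb18030", "utf-8")):
    """Try each candidate encoding until one decodes the whole file."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {path}")
```

GB18030 is backward-compatible with GBK and GB2312, so trying it first also covers files saved in those older encodings.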

Segment the text

words = jieba.lcut(txt)

Create a dictionary for counting

counts = {}
for word in words:
    if len(word) == 1:          # skip single characters
        continue
    elif word == "老孙" or word == "大圣" or word == "悟空":
        rword = "行者"           # merge Sun Wukong's aliases
    elif word == "师父" or word == "三藏" or word == "长老":
        rword = "唐僧"           # merge Tang Seng's aliases
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
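The alias-merging loop above can also be written compactly with a lookup table and `collections.Counter`; a sketch, where the small word list stands in for the real `jieba.lcut(txt)` output:

```python
from collections import Counter

# Alias table mirroring the elif chain above
aliases = {"老孙": "行者", "大圣": "行者", "悟空": "行者",
           "师父": "唐僧", "三藏": "唐僧", "长老": "唐僧"}

words = ["悟空", "大圣", "唐僧", "好", "行者"]  # stand-in for jieba.lcut(txt)
counts = Counter(aliases.get(w, w) for w in words if len(w) > 1)

print(counts["行者"])  # 悟空, 大圣 and 行者 all count toward 行者
```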

Remove the excluded words from the counts

for word in excludes:
    counts.pop(word, None)  # pop(..., None) avoids a KeyError if a word never appeared
items = list(counts.items())

Sort by frequency in descending order and print the top 10

items.sort(key=lambda x: x[1], reverse=True)
for i in range(10):
    word, count = items[i]
    print("{0:<10}{1:>5}".format(word, count))

From: https://www.cnblogs.com/ldyyby678/p/17934251.html
