
jieba Word Segmentation of Journey to the West (西游记)

Posted: 2023-12-26 20:14:52

Import the jieba library.

import jieba
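
(If jieba is not installed yet, it can be fetched from PyPI with pip install jieba.)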

Read the file (fill in the path where the file is stored), then segment the text with jieba's precise mode.

txt = open('西游记.txt', 'r', encoding='utf-8').read()
words = jieba.lcut(txt)  # precise-mode segmentation
counts = {}  # store each word and its frequency as key-value pairs
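
As a small aside, the same read can be written with a with statement so the file is closed automatically. This is a minimal variant of the step above, assuming 西游记.txt sits in the working directory.

import jieba

# Read the novel; the file is closed automatically when the block exits.
with open('西游记.txt', 'r', encoding='utf-8') as f:
    txt = f.read()

words = jieba.lcut(txt)  # precise-mode segmentation
counts = {}              # word -> frequency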

Unify the different names used for the same character; then iterate over all the words, adding one to a word's count each time it appears.

for word in words:
    if len(word) == 1:  # skip single-character tokens (mostly particles and punctuation)
        continue
    if word in ['孙猴子','孙行者','孙悟空','斗战胜佛','齐天大圣','行者','老孙','大圣','孙大圣','悟空']:
        rword = '孙悟空'
    elif word in ['唐僧','唐三藏','金蝉子','师父']:
        rword = '唐僧'
    elif word in ['猪八戒','猪悟能','天蓬元帅','悟能']:
        rword = '猪八戒'
    elif word in ['沙僧','沙悟净','悟净']:
        rword = '沙僧'
    elif word in ['如来佛祖','如来']:
        rword = '如来佛祖'
    else:
        rword = word
    counts[rword] = counts.get(rword, 0) + 1
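
As an optional rewrite, the alias lists can live in a single lookup table so the loop body shrinks to one dictionary lookup. This is only a sketch, not the original post's code; the ALIASES name is illustrative, and the words list is assumed from the previous step.

# Sketch: map every alias to one canonical name via a lookup table.
ALIASES = {}
for name in ['孙猴子','孙行者','孙悟空','斗战胜佛','齐天大圣','行者','老孙','大圣','孙大圣','悟空']:
    ALIASES[name] = '孙悟空'
for name in ['唐僧','唐三藏','金蝉子','师父']:
    ALIASES[name] = '唐僧'
for name in ['猪八戒','猪悟能','天蓬元帅','悟能']:
    ALIASES[name] = '猪八戒'
for name in ['沙僧','沙悟净','悟净']:
    ALIASES[name] = '沙僧'
for name in ['如来佛祖','如来']:
    ALIASES[name] = '如来佛祖'

counts = {}
for word in words:
    if len(word) == 1:
        continue
    rword = ALIASES.get(word, word)  # fall back to the word itself
    counts[rword] = counts.get(rword, 0) + 1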

Sort the words by how often they occur and print the result.

items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)  # sort by frequency, descending
for word, count in items:
    print('{0:<10}{1:>5}'.format(word, count))  # left-align the word, right-align its count
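
For reference, collections.Counter from the standard library can replace the manual sort. The sketch below feeds it the counts dictionary built above and prints only the 20 most frequent words; the top-20 cutoff is an illustrative choice, not part of the original code.

from collections import Counter

# Sketch: Counter accepts the existing counts dict and handles the sort.
for word, count in Counter(counts).most_common(20):
    print('{0:<10}{1:>5}'.format(word, count))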

Tags: jieba, rword, word, items, elif, counts, 西游记, word segmentation
From: https://www.cnblogs.com/88888888b/p/17929225.html
