jieba: 西游记 (Journey to the West)

Date: 2023-12-26 13:24:06

import jieba

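with open(r'E:\西游记.txt', 'r', encoding='utf-8') as f:  # open the file; the raw string keeps the backslash literal
    txt = f.read()  # read the whole file into a string
    words = jieba.lcut(txt)  # segment the text with jieba's lcut (accurate mode), which returns a list
    counts = {}  # dictionary mapping each word to its number of occurrences
    for word in words:  # visit every token
        if len(word) == 1:  # skip single-character tokens, including lone punctuation such as 。，!~; only multi-character words are counted
            continue
        else:
            # accumulate the count stored under this word
            counts[word] = counts.get(word, 0) + 1  # get() returns 0 the first time a word is seen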
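items = list(counts.items())  # items() yields (key, value) pairs and list() collects them as tuples; the name `list` is avoided because it would shadow the built-in
# sort in descending order of count; x[1] is the value in each (word, count) tuple
items.sort(key=lambda x: x[1], reverse=True)
for i in range(min(20, len(items))):  # guard against texts with fewer than 20 distinct words
    print("Rank {} word in Journey to the West: {}, occurrences: {}".format(i+1, items[i][0], items[i][1]))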
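The counting and sorting steps above can also be written with `collections.Counter` from the standard library. The following is only a sketch of that alternative, not the method used in this post: the helper name `top_words` and the `sample_tokens` list are made up for illustration, and the sample stands in for the real output of `jieba.lcut(txt)` so the snippet runs without the novel's text file.

```python
from collections import Counter


def top_words(tokens, n=20):
    """Return the n most frequent tokens that are longer than one character."""
    # Same filter as the loop above: drop single characters and lone punctuation.
    counts = Counter(tok for tok in tokens if len(tok) > 1)
    # most_common() sorts by count in descending order and returns at most n pairs.
    return counts.most_common(n)


# Hypothetical sample standing in for the list returned by jieba.lcut(txt).
sample_tokens = ["行者", "，", "八戒", "行者", "道", "师父", "行者", "八戒", "。"]

for rank, (word, count) in enumerate(top_words(sample_tokens, n=3), start=1):
    print("Rank {} word: {}, occurrences: {}".format(rank, word, count))
```

Because `most_common(n)` returns at most `n` pairs, this version needs no separate length check for short texts.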

Tags: jieba, word, list, counts, txt, 西游记
From: https://www.cnblogs.com/lzk-95uu/p/17927924.html
