Python: a program that counts how many times each word appears in a txt file; counting how many times each Chinese character appears in a txt file

Posted: 2024-01-30 19:12:55

Tags: word python count rate file line txt

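The output is sorted from most to least frequent, and special symbols in the txt file have to be removed first. Note that this version is for English txt files.

I use it to build my own vocabulary lists, haha.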

import re
from collections import Counter

def count_words(file_path):
    # Read the whole text file
    with open(file_path, 'r', encoding='utf-8') as file:
        text = file.read()

    # Remove everything except ASCII letters and whitespace, then lowercase the text
    # (this also drops apostrophes and hyphens, so "don't" becomes "dont")
    cleaned_text = re.sub(r'[^a-zA-Z\s]', '', text).lower()

    # Count word occurrences with Counter
    word_counts = Counter(cleaned_text.split())

    # Sort by count, from most to least frequent
    sorted_word_counts = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)

    return sorted_word_counts

def main():
    file_path = 'your_text_file.txt'  # replace with the path to your file
    word_counts = count_words(file_path)

    # Print the results
    for word, count in word_counts:
        print(f'{word}: {count}')

if __name__ == "__main__":
    main()
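A more compact variant is sketched below under a few assumptions. `Counter.most_common()` called with no argument already returns every (word, count) pair ordered from most to least frequent, so the separate `sorted()` call is not needed. `re.findall` with the pattern below keeps words with inner straight apostrophes such as "don't" instead of merging them into "dont". The function name `count_words_most_common` and the word pattern are illustrative choices, not part of the original program, and curly apostrophes (’) are not handled.

```python
import re
from collections import Counter


def count_words_most_common(text):
    """Count English words in `text`, most frequent first.

    Assumption (hypothetical definition): a "word" is a run of ASCII letters,
    optionally joined by straight apostrophes (e.g. "don't"), case-insensitive.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # most_common() without an argument returns all (word, count) pairs,
    # highest count first; equal counts keep the order they were first seen
    return Counter(words).most_common()


if __name__ == "__main__":
    sample = "The cat's hat. The cat sat! Don't stop."
    for word, count in count_words_most_common(sample):
        print(f"{word}: {count}")
```

To count a file, pass its contents, e.g. `count_words_most_common(open('your_text_file.txt', encoding='utf-8').read())`.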

Counting occurrences of each Chinese character

Reference: "How do you read a txt file in Python and count how many times each word repeats?" - Zhihu (zhihu.com)

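# !!!! Count how many times each character appears !!!!
# Open the file
fr = open('txtBooks/hp词汇中英.txt', 'r', encoding='utf-8')

# If the file is not UTF-8 and decoding fails, change a line such as
# f=open('./dat文件/song.dat',encoding='utf-8') to
# f=open('./dat文件/song.dat',encoding='gbk')
# or f=open('./dat文件/song 1.dat',encoding='gb18030')

# Read all lines of the file
content = fr.readlines()
contentLines = ''

characters = []  # the distinct characters seen so far
rate = {}  # how many times each character occurs

# Iterate over every line
for line in content:
    # Strip leading/trailing whitespace, including the newline
    line = line.strip()
    # Skip empty lines
    if len(line) == 0:
        continue
    contentLines = contentLines + line
    # Count every character in the line
    # (note: punctuation, Latin letters and inner spaces are counted too, not only Chinese characters)
    for x in range(0, len(line)):
        # First time this character appears: add it to the list
        if line[x] not in characters:
            characters.append(line[x])
        # First time this character appears: start its count at 0
        # (starting at 1 and then adding 1 would count it twice)
        if line[x] not in rate:
            rate[line[x]] = 0
        # Add one occurrence
        rate[line[x]] += 1

# Sort the dictionary items by count, from high to low. Here e is one element
# of dict.items(), so e[1] sorts by value; changing e[1] to e[0] would sort by key.
# reverse=False is the default (ascending order) and can be omitted.
rate = sorted(rate.items(), key=lambda e: e[1], reverse=True)

print('The text has %d characters in total' % len(contentLines))
print('There are %d distinct characters' % len(characters))
print()
for i in rate:
    print("[", i[0], "] appears", i[1], "times")

fr.close()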
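The program above counts every character in the file, not only Chinese ones. If only Hanzi should be counted, a minimal sketch is to filter with a regular expression before counting. It assumes that "Chinese character" means the basic CJK Unified Ideographs block U+4E00 to U+9FFF (extension blocks such as U+3400 to U+4DBF are ignored), and it turns the encoding advice from the comments into a helper that tries utf-8, gbk and gb18030 in order. Both `read_text_with_fallback` and `count_hanzi` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

# Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF)
# counts as "Chinese characters"; punctuation such as "，" is excluded.
HANZI = re.compile("[\u4e00-\u9fff]")


def read_text_with_fallback(path, encodings=("utf-8", "gbk", "gb18030")):
    """Hypothetical helper: return the file's text using the first encoding that decodes it."""
    last_error = None
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read()
        except UnicodeDecodeError as err:
            last_error = err  # try the next encoding
    raise ValueError(f"could not decode {path} with any of {encodings}") from last_error


def count_hanzi(text):
    """Return (total Hanzi, distinct Hanzi, [(char, count), ...] from high to low)."""
    counts = Counter(HANZI.findall(text))
    return sum(counts.values()), len(counts), counts.most_common()
```

Usage, with the file path from the original program: `total, distinct, ranking = count_hanzi(read_text_with_fallback('txtBooks/hp词汇中英.txt'))`.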

From: https://www.cnblogs.com/jane656/p/17997768
