首页 > 编程语言 >【Python】72行实现代码行数统计,简单实用!

【Python】72行实现代码行数统计,简单实用!

时间:2024-09-08 21:17:01浏览次数:13  
标签:suffix Python self filename 实用 72 ._ print size

0. 前言

最近突然想知道自己总共写了多少行代码,于是做了这样一个小工具……

1. 准备工作

先考虑一下希望得到的效果:

Language(语言) Lines(代码行数) Size(代码文件总大小) Files(代码文件总数)
A 12345 300 KB 193
B 2345 165 KB 98
如上,程序输出一个表格,将代码行数作为关键字排序。
代码框架:
# -*- encoding: utf-8 -*-
import ...

# 代码行数计数类
class CodeLinesCounter(object):
    SIZES = [('B', 1), ('KB', 1024), ('MB', 1024**2), ('GB', 1024**3), ('TB', 1024**4)]
    
    def __init__(self, languages):
        self._languages = languages # 语言(dict,{文件后缀名:语言})
        self._codelines = {suffix: (0, 0, 0) for suffix in languages} # 统计结果,{后缀名:(行数,大小,文件数)}
        self._successful = self._error = 0 # 记录成功、失败文件个数
    
    # @param directory: 要扫描的目录
    # @param log: 是否打印日志
    def scan(self, directory, log=False):
        if log: print('Scanning', directory)
        pass
    
    def report(self): # 输出结果
        pass

counter = CodeLinesCounter(languages={'py': 'Python', 'c': 'C', 'cpp': 'C++', 'java': 'Java', 'js': 'JavaScript', 'html': 'HTML', 'css': 'CSS', 'txt': 'Plain text'}) # 创建CodeLinesCounter实例
counter.scan('E:/') # 扫描E盘(注意不能用'E:')
counter.report() # 输出结果

完成,下面正式进入主要部分

2. 统计

2.1 文件扫描

首先,我们需要获取根目录下的文件列表。这可以用os.walk实现:
os.walk(rootdir)返回一个游走器(可迭代),包含根目录下每个子目录的文件及目录列表。我们来看一个例子:
有一文件夹Folder如下:

Folder
|   file1
|   Folder1
|       file2
|       file3
|   Folder2
    |   file4
    |   Folder3

运行如下代码:

import os

for root, dirs, files in os.walk('Folder'):
    print(root, dirs, files)

则输出如下:

Folder					['Folder1', 'Folder2']	['file1']
Folder\Folder1			[]						['file2', 'file3']
Folder\Folder2			['Folder3']				['file4']
Folder\Folder2\Folder3	[]						[]

其中第一项是当前的根目录,第二项为目录下的目录列表,第三项则为当前的文件列表。
因此,我们可以编写如下代码:

# -*- encoding: utf-8 -*-
from os.path import join, getsize, abspath
from os import walk

class CodeLinesCounter(object):
    SIZES = [('B', 1), ('KB', 1024), ('MB', 1024**2), ('GB', 1024**3), ('TB', 1024**4)]
    
    def __init__(self, languages):
        self._languages = languages
        self._results = {suffix: (0, 0, 0) for suffix in languages}
        self._successful = self._error = 0
    
    def scan(self, directory, log=False):
        if log: print('Scanning', directory)
        try:
            for root, _, files in walk(abspath(directory)):
                for filename in files:
                    suffix = filename[filename.rfind('.') + 1:]
                    filename = join(root, filename)
                    if suffix in self._results:
                        lines, size, numFiles = self._results[suffix]
                        lines += 1 # 暂不统计,先按一行计算
                        numFiles += 1
                        size += getsize(filename) # getsize返回文件大小(字节)
                        self._results[suffix] = (lines, size, numFiles)
                    if log: print(filename)
        except KeyboardInterrupt:
            print('\nUser stopped operation')
        else:
            if log: print('Scan finished')
    
    def report(self):
        print('Language\tLines\tSize\tFiles')
        for suffix, (lines, size, files) in sorted(self._results.items(), key=lambda x: x[1], reverse=True):
            print(self._languages[suffix], lines, self.__format_size(size), files, sep='\t')

    # 单位转换
    def __format_size(self, bytes):
        for suffix, size in self.SIZES:
            if bytes < size * 1024:
                return '%.2f %s' % (bytes / size, suffix)
        return '%.2f %s' % (bytes / self.SIZES[-1][1], 2, self.SIZES[-1][0])

counter = CodeLinesCounter(languages={'py': 'Python', 'c': 'C', 'cpp': 'C++', 'java': 'Java', 'js': 'JavaScript', 'html': 'HTML', 'css': 'CSS', 'txt': 'Plain text'})
counter.scan('E:/')
counter.report()

运行结果应类似于下面这样(手动整理了一下):

Language        Lines   Size    		Files
C++     		667     671.51 KB       667
Python  		317     981.01 KB       317
HTML    		38      466.52 KB       38
Plain text    34      90.69 KB        34
JavaScript    19      1.43 MB			19
CSS     		9       341.04 KB       9
C       		2       20.45 KB        2
Java    		1       676.00 B        1

好,下面来到行数统计部分(表格输出后面会介绍)。

2.2 行数统计

众所周知,空行不应该算在代码行数中。因此,统计时需忽略空行。先写上如下代码(替换掉刚才的23行):

with open(filename, 'r', encoding='utf-8') as f: # utf-8编码打开文件
    for line in f:
        if line and not line.isspace(): # 去掉空行
            lines += 1

但是,正当我们兴致勃勃地运行时——

Traceback (most recent call last):
  ...
  File "...\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 355: invalid start byte

程序报错UnicodeDecodeError,分析后发现原因是部分文件使用了GBK编码,而utf-8编码无法正确打开,因此造成错误。
我们再次改进程序,使其尝试两种编码:

try:
    ln = 0
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            if line and not line.isspace():
                ln += 1
except UnicodeDecodeError: # 尝试使用GBK编码打开
    try:
        ln = 0
        with open(filename, 'r', encoding='gbk') as f:
            for line in f:
                if line and not line.isspace():
                    ln += 1
    except:
        print(filename, '[Error: unknown encoding]')
        self._error += 1
    else:
        lines += ln
except Exception as e:
    print(filename, '[Unknown error: %s]' % e)
    self._error += 1
    continue
lines += ln
if log: print(f'{filename} [{ln}]')
self._successful += 1

这次,我们得到了正确的结果:

Language        Lines   Size    		Files
C++     		35595   671.51 KB       667
JavaScript    24485   1.43 MB 		19
Python  		24130   982.16 KB       317
CSS     		8203    341.04 KB       9
HTML    		6138    466.52 KB       38
Plain text    741     90.69 KB        34
C       		557     20.45 KB        2
Java    		29      676.00 B        1

现在仅剩最后一步了——制表。

3. 制表

python输出表格可以使用PrettyTable库。具体用法如下:

# -*- encoding: utf-8 -*-
from os.path import join, getsize, abspath
from os import walk
from prettytable import PrettyTable

class CodeLinesCounter(object):
    SIZES = [('B', 1), ('KB', 1024), ('MB', 1024**2), ('GB', 1024**3), ('TB', 1024**4)]
    
    def __init__(self, languages):
        self._languages = languages
        self._results = {suffix: (0, 0, 0) for suffix in languages}
        self._successful = self._error = 0
    
    def scan(self, directory, log=False):
        if log: print('Scanning', directory)
        try:
            for root, _, files in walk(abspath(directory)):
                for filename in files:
                    suffix = filename[filename.rfind('.') + 1:]
                    filename = join(root, filename)
                    if suffix in self._results:
                        lines, size, numFiles = self._results[suffix]
                        numFiles += 1
                        size += getsize(filename)
                        try:
                            ln = 0
                            with open(filename, 'r', encoding='utf-8') as f:
                                for line in f:
                                    if line and not line.isspace():
                                        ln += 1
                        except UnicodeDecodeError: # Try 'gbk' encoding
                            try:
                                ln = 0
                                with open(filename, 'r', encoding='gbk') as f:
                                    for line in f:
                                        if line and not line.isspace():
                                            ln += 1
                            except:
                                print(filename, '[Error: unknown encoding]')
                                self._error += 1
                            else:
                                lines += ln
                        except Exception as e:
                            print(filename, '[Unknown error: %s]' % e)
                            self._error += 1
                            continue
                        lines += ln
                        if log: print(f'{filename} [{ln}]')
                        self._successful += 1
                        self._results[suffix] = (lines, size, numFiles)
                    elif log:
                        print(filename, '[None]')
        except KeyboardInterrupt:
            print('\nUser stopped operation')
        else:
            if log: print('Scan finished')
    
    def report(self):
        table = PrettyTable(['Language', 'Lines', 'Size', 'Files'], title=f'Scan result (OK {self._successful}, Error {self._error})') # 创建PrettyTable实例,添加标题
        for suffix, (lines, size, files) in sorted(self._results.items(), key=lambda x: x[1], reverse=True):
            table.add_row([self._languages[suffix], lines, self.__format_size(size), files]) # 添加行
        print(table) # 输出
    
    def __format_size(self, bytes):
        for suffix, size in self.SIZES:
            if bytes < size * 1024:
                return '%.2f %s' % (bytes / size, suffix)
        return '%.2f %s' % (bytes / self.SIZES[-1][1], 2, self.SIZES[-1][0])

counter = CodeLinesCounter(languages={'py': 'Python', 'c': 'C', 'cpp': 'C++', 'java': 'Java', 'js': 'JavaScript', 'html': 'HTML', 'css': 'CSS', 'txt': 'Plain text'})
counter.scan('E:/')
counter.report()

运行结果:

+----------------------------------------+
|     Scan result (OK 1087, Error 0)     |
+------------+-------+-----------+-------+
|  Language  | Lines |    Size   | Files |
+------------+-------+-----------+-------+
|    C++     | 35595 | 671.51 KB |  667  |
| JavaScript | 24485 |  1.43 MB  |   19  |
|   Python   | 24130 | 982.16 KB |  317  |
|    CSS     |  8203 | 341.04 KB |   9   |
|    HTML    |  6138 | 466.52 KB |   38  |
| Plain text |  741  |  90.69 KB |   34  |
|     C      |  557  |  20.45 KB |   2   |
|    Java    |   29  |  676.00 B |   1   |
+------------+-------+-----------+-------+

4. 总结

最终代码(无注释):

# -*- encoding: utf-8 -*-
from os.path import join, getsize, abspath
from os import walk
from prettytable import PrettyTable

class CodeLinesCounter(object):
    SIZES = [('B', 1), ('KB', 1024), ('MB', 1024**2), ('GB', 1024**3), ('TB', 1024**4)]
    
    def __init__(self, languages):
        self._languages = languages
        self._results = {suffix: (0, 0, 0) for suffix in languages}
        self._successful = self._error = 0
    
    def scan(self, directory, log=False):
        if log: print('Scanning', directory)
        try:
            for root, _, files in walk(abspath(directory)):
                for filename in files:
                    suffix = filename[filename.rfind('.') + 1:]
                    filename = join(root, filename)
                    if suffix in self._results:
                        lines, size, numFiles = self._results[suffix]
                        numFiles += 1
                        size += getsize(filename)
                        try:
                            ln = 0
                            with open(filename, 'r', encoding='utf-8') as f:
                                for line in f:
                                    if line and not line.isspace():
                                        ln += 1
                        except UnicodeDecodeError: # Try 'gbk' encoding
                            try:
                                ln = 0
                                with open(filename, 'r', encoding='gbk') as f:
                                    for line in f:
                                        if line and not line.isspace():
                                            ln += 1
                            except:
                                print(filename, '[Error: unknown encoding]')
                                self._error += 1
                            else:
                                lines += ln
                        except Exception as e:
                            print(filename, '[Unknown error: %s]' % e)
                            self._error += 1
                            continue
                        lines += ln
                        if log: print(f'{filename} [{ln}]')
                        self._successful += 1
                        self._results[suffix] = (lines, size, numFiles)
                    elif log:
                        print(filename, '[None]')
        except KeyboardInterrupt:
            print('\nUser stopped operation')
        else:
            if log: print('Scan finished')
    
    def report(self):
        table = PrettyTable(['Language', 'Lines', 'Size', 'Files'], title=f'Scan result (OK {self._successful}, Error {self._error})')
        for suffix, (lines, size, files) in sorted(self._results.items(), key=lambda x: x[1], reverse=True):
            table.add_row([self._languages[suffix], lines, self.__format_size(size), files])
        print(table)
    
    def __format_size(self, bytes):
        for suffix, size in self.SIZES:
            if bytes < size * 1024:
                return '%.2f %s' % (bytes / size, suffix)
        return '%.2f %s' % (bytes / self.SIZES[-1][1], 2, self.SIZES[-1][0])

counter = CodeLinesCounter(languages={'py': 'Python', 'c': 'C', 'cpp': 'C++', 'java': 'Java', 'js': 'JavaScript', 'html': 'HTML', 'css': 'CSS', 'txt': 'Plain text'})
counter.scan('E:/')
counter.report()

后期改进:

  • 增加正则表达式忽略文件
  • matplotlib绘图
  • PyQt5GUI
  • ……(欢迎提出宝贵的意见!)

标签:suffix,Python,self,filename,实用,72,._,print,size
From: https://www.cnblogs.com/stanleys/p/18403486/py_codelinescounter

相关文章

  • python装饰器\迭代器\生成器
    1.迭代器\生成器#斐波那契数列deffib(n):"""生成斐波那契数列的前n个数字。参数:n--要生成的斐波那契数列的长度返回:生成器,产出斐波那契数列的前n个数字。"""a,b=0,1#初始化斐波那契数列的前两个数字content=......
  • Python函数之lambda函数
    温馨提示:如果读者没有学过def定义函数,请先看这里定义形式<函数名>=lambda<参数列表>:<返回值>等同于:def<函数名>(<参数列表>): return<返回值>也可以定义为匿名函数(没有名字的函数):lambda<参数列表>:<返回值>可以确认lambda函数对象的类型与def定义的一样,都是fu......
  • 【开源推荐】MYScrcpy,不仅仅是python实现的Android投屏工具,更是开发测试新选择
    MYScrcpyV1.5.7python语言实现的一个Scrcpy客户端。包含完整的视频、音频、控制解析及展现,开发友好,引入即用!采用DearPyGui作为主要GUI。支持窗口位置记忆、右键手势控制、断线重连、虚拟摄像头投屏、中文输入,锁屏密码解锁等功能。高速模式使用pygame作为鼠标及键......
  • 【python爬虫】从腾讯API爬取美国疫情数据+制表
    最近(文章撰写时间为2020/6/118:40)疫情在中国情况好转,却在美国暴虐。本篇文章将爬取腾讯提供的美国疫情数据并制表。1.爬取数据调用API接口接口:https://api.inews.qq.com/newsqa/v1/automation/modules/list?modules=FAutoCountryMerge观察得到的数据:{ ..., "data":{ ......
  • Python和MATLAB(Java)及Arduino和Raspberry Pi(树莓派)点扩展函数导图
    ......
  • 【Python使用】嘿马python高级进阶全体系教程第9篇:HTTP 协议,1. HTTP 协议的介绍【附
    本教程的知识点为:操作系统1.常见的操作系统4.小结ls命令选项2.小结mkdir和rm命令选项1.mkdir命令选项压缩和解压缩命令1.压缩格式的介绍2.tar命令及选项的使用3.zip和unzip命令及选项的使用4.小结编辑器vim1.vim的介绍2.vim的工作模式3.vim的末行模......
  • Python毕业设计基于Django的川剧戏剧京剧戏曲科普平台 含选座功能
    文末获取资源,收藏关注不迷路文章目录一、项目介绍1管理员功能模块前台系统功能模块二、主要使用技术三、研究内容四、核心代码五、文章目录一、项目介绍随着我国经济的高速发展与人们生活水平的日益提高,人们对生活质量的追求也多种多样。尤其在人们生活节奏不断加......
  • 【Python】对象(包括类、函数)取名方法
    先上干货,通用的:字母:A-Za-z下划线:_数字:0-9(注意:数字不能在开头)理论上可以使用中文变量名,但强烈不建议使用。合法名字举例abcdef GoodCoder AD_fhrygfuigfrA_a_007 __NAME123 _P_T__123456 Cc_Dd _不合法名字举例666Code C++ 1+1=2 (5)4654ty54F 0.123 [email protected]......
  • Python函数之def定义函数
    链接想研究Python函数?看这里函数怎样取名?看这里有参数的函数还可以怎么传参?看这里一、无参数函数结构def<函数名>():#强制要求 <函数体>#可省略 return<返回值>#可省略程序举例用函数的Helloworld程序:#prints'HelloWorld\nDone'#Author:GoodCoder666d......
  • Python函数之*[参数名]和**[参数名]的用处
    一、*[参数名]调用合法调用普通调用*参数名一般写成*args,如:deffunc(*args): print(args)可以试着调用func:>>>func(1)(1,)>>>func()()>>>func(1,2,3)(1,2,3)>>>func(dict(),set(),str(),int())({},set(),'',0)所以,我们发现,这......