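| Name | Description | Time (s) | Total file-writing time (s) |
|---|---|---|---|
| Single-core write of the whole file | | 146 | 146 |
| Multi-core parallel write of 6 parts | | 101 | |
| cat merge | Linux cat command | 173 | 274 |
| python shutil | Copies the files with the copyfileobj function from Python's shutil package | 123 | 224 |
| python single-threaded read/write (plain) | Copies the files by reading them with Python's open function and writing the data back out (mode 'wb') | 122 | 223 |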
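The cat timing may not be representative. In practice cat should be slightly faster than merging in Python; the test ran on an HPC system whose NAS storage has fairly high latency, so invoking a system command such as cat incurs some extra delay.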
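One way to see how much of the cat figure is launch overhead rather than copying is to time the command on very small inputs. The sketch below is an illustration only, not the code used for the measurements above: `cat_merge` and the tiny temporary files are hypothetical, it runs cat directly instead of through a shell, and it assumes a `cat` binary is on the PATH.

```python
import os
import subprocess
import tempfile
import time


def cat_merge(parts, dest):
    """Concatenate `parts` into `dest` with cat; return elapsed seconds.

    Hypothetical helper: runs cat without a shell and redirects its
    stdout to the destination file.
    """
    t0 = time.perf_counter()
    with open(dest, 'wb') as out:
        subprocess.run(['cat', *parts], stdout=out, check=True)
    return time.perf_counter() - t0


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        parts = []
        for i in range(3):
            path = os.path.join(tmp, f'{i}.txt')
            with open(path, 'w') as f:
                f.write('q' * 10)
            parts.append(path)
        # With inputs this small, the elapsed time is almost entirely
        # process start-up cost rather than copying.
        elapsed = cat_merge(parts, os.path.join(tmp, 'out.txt'))
        print('cat on tiny files (s):', elapsed)
```

Comparing this number with the time of the full-size merge gives a rough idea of whether start-up latency matters at all for files of this size.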
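Summary
Writing the file as parts in multiple processes and then merging them is slower overall than writing it in a single process: the parallel write itself is faster (101 s versus 146 s), but every merge method adds more than 120 s.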
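The totals behind this conclusion can be recomputed from the table: each merge strategy's total is the parallel write time plus its merge time. A small sketch using the reported numbers (the variable names are illustrative):

```python
# Timings in seconds, copied from the results table.
single_write = 146      # one process writes the whole file
parallel_write = 101    # six processes each write one part
merge = {'cat': 173, 'shutil': 123, 'plain': 122}

# Total cost of "write in parallel, then merge" for each merge method.
totals = {name: parallel_write + t for name, t in merge.items()}

for name, total in totals.items():
    print(f'{name}: {total} s total, {total - single_write} s slower than a single write')
# cat: 274 s total, 128 s slower than a single write
# shutil: 224 s total, 78 s slower than a single write
# plain: 223 s total, 77 s slower than a single write
```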
Code
import os
import shutil
import time
from multiprocessing import Process


def gen_test_files(fn):
    # Write one part file of 3*10**8 characters.
    with open(fn, 'w') as f:
        f.write('q' * 3 * 10**8)


def gen_files():
    # Write the parts in parallel, one process per file.
    t0 = time.time()
    ps = []
    for i in range(num):
        ps.append(Process(target=gen_test_files, args=(fns[i],), daemon=True))
    for p in ps:
        p.start()
    for p in ps:
        p.join()
    tu1 = time.time() - t0
    print('Multi-process write time:', tu1)
    # Generate the complete file in a single process.
    ftotal = 'total.txt'
    t1 = time.time()
    with open(ftotal, 'w') as f:
        f.write('q' * 3 * 10**8 * num)
    tu2 = time.time() - t1
    print('Full-file write time:', tu2)
    return tu1, tu2


def test_cat(fns):
    # Merge the parts with the system cat command.
    fn = 'test_cat.txt'
    cmd = f'cat {" ".join(fns)} > {fn}'
    t1 = time.time()
    os.system(cmd)
    tu = time.time() - t1
    print('cat time:', tu)
    return tu


def test_shutil(fns):
    # Merge the parts with shutil.copyfileobj.
    fn = 'test_shutil.txt'
    t1 = time.time()
    with open(fn, 'wb') as f:
        for i in fns:
            with open(i, 'rb') as src:
                shutil.copyfileobj(src, f)
    tu = time.time() - t1
    print('shutil time:', tu)
    return tu


def test_plain(fns):
    # Merge the parts by reading each one fully and writing it out.
    fn = 'test_plain.txt'
    t1 = time.time()
    with open(fn, 'wb') as f:
        for i in fns:
            with open(i, 'rb') as src:
                f.write(src.read())
    tu = time.time() - t1
    print('plain time:', tu)
    return tu


def test():
    # Generate the files once; the original called gen_files() twice here.
    tu_multi, tu_total = gen_files()
    tms = {'multi': tu_multi,
           'total': tu_total,
           'cat': test_cat(fns),
           'shutil': test_shutil(fns),
           'plain': test_plain(fns)}
    return tms


def test20():
    # Sum the timings of 20 runs.
    ts = {}
    for _ in range(20):
        tms = test()
        for k in tms:
            ts[k] = ts.get(k, 0.0) + tms[k]
    print('Summary:')
    for k in ts:
        print(k + '\t' + str(ts[k]))


if __name__ == '__main__':
    num = 6
    fns = [(str(index) + '.txt') for index in range(num)]
    # print('test start')
    # print('test end')
    test20()
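The plain variant reads each part fully into memory before writing it (3×10^8 bytes per part here). A possible alternative, sketched below on the assumption that a fixed-size buffer is acceptable, copies the data in chunks. `merge_chunked` and the 1 MiB default are illustrative choices and were not part of the benchmark above.

```python
def merge_chunked(parts, dest, chunk_size=1024 * 1024):
    """Concatenate the files in `parts` into `dest`, one chunk at a time."""
    with open(dest, 'wb') as out:
        for path in parts:
            with open(path, 'rb') as src:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:   # empty bytes object means end of file
                        break
                    out.write(chunk)
```

shutil.copyfileobj copies in chunks in much the same way, so a hand-written loop like this is mainly useful when direct control over the buffer size is wanted.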
To run the test: python test_cat_files.py
(Make sure at least 9 GB of disk space is available.)
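The 9 GB figure follows from the script's own sizes: six parts of 3×10^8 bytes, one full file of the same combined size, and three merged outputs. A minimal pre-flight check, assuming the script is run from the directory where the files will be written (`required_bytes` and `has_enough_space` are illustrative names):

```python
import shutil

part_size = 3 * 10**8   # bytes written per part
num = 6                 # number of parts
# The parts, total.txt, and the three merged outputs (cat, shutil, plain)
# each take part_size * num bytes, so five copies in all.
required_bytes = part_size * num * 5


def has_enough_space(path='.', required=required_bytes):
    """Return True if the filesystem holding `path` has `required` bytes free."""
    return shutil.disk_usage(path).free >= required


print(f'required: {required_bytes / 10**9:.0f} GB, enough space: {has_enough_space()}')
```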
From: https://www.cnblogs.com/roundfish/p/17054840.html