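I use WechatDownload to save a lot of technical articles, but the formatting of the saved files is fairly messy.
Typical problems are lots of blank lines, garbled English text, and too much indentation on the left and right of the page. Today I'll fix them all in one go.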
Install python-docx (pip install python-docx).
from docx import Document
from docx.shared import Cm
import os

rootdir = r'E:\vxWEB\GIS'
for files in os.listdir(rootdir):
    if not files.lower().endswith('.docx'):  # skip anything that is not a .docx file
        continue
    filename = os.path.join(rootdir, files)
    print(filename)
    doc = Document(filename)
    for para in doc.paragraphs:
        para.paragraph_format.left_indent = Cm(0)  # left and right indent
        para.paragraph_format.right_indent = Cm(0)
        # para.paragraph_format.first_line_indent = Cm(1)  # first-line indent
        para.paragraph_format.line_spacing = 1.0  # line spacing
        if len(para.text) <= 1 and len(para.runs) < 1:  # delete blank paragraphs
            p = para._element
            p.getparent().remove(p)
            continue  # the paragraph is gone, nothing left to format
        for run in para.runs:  # set the font used for English (Latin) text
            run.font.name = 'Times New Roman'
    doc.save(filename)  # overwrite the original file in place
print('ok')
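Take care when deleting blank lines: if you only check that a paragraph has no text, paragraphs that hold images get deleted too, because an image-only paragraph also has empty text. The len(para.runs) < 1 check is what confirms the paragraph contains no image, since an inline picture is stored inside a run.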
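The blank-paragraph check described above can be pulled out into a small helper so it is easier to test on its own. This is only a minimal sketch: the names is_blank_paragraph and remove_blank_paragraphs are hypothetical and not part of python-docx, and the functions assume they receive the paragraph and document objects that python-docx returns (they rely only on .text, .runs, .paragraphs and the private ._element attribute already used in the script above).

```python
def is_blank_paragraph(para):
    """Return True when the paragraph has no runs and no visible text.

    Inline pictures live inside runs, so a paragraph that holds only an
    image has at least one run and is therefore not treated as blank.
    """
    return len(para.runs) == 0 and para.text.strip() == ""


def remove_blank_paragraphs(doc):
    """Remove blank paragraphs from doc and return how many were removed.

    Hypothetical helper: it detaches the underlying XML element from its
    parent, the same approach as the script above.
    """
    removed = 0
    # Copy the list first so removals cannot disturb the iteration.
    for para in list(doc.paragraphs):
        if is_blank_paragraph(para):
            element = para._element
            element.getparent().remove(element)
            removed += 1
    return removed
```

With a document loaded through Document(filename), calling remove_blank_paragraphs(doc) before doc.save(filename) should strip the empty paragraphs while leaving image-only paragraphs alone. A paragraph whose runs contain only spaces is kept by this sketch, which matches the conservative behaviour of the original check.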
From: https://www.cnblogs.com/yifeimiao/p/17500058.html