python-docx: Iterating over paragraphs, tables, and images while preserving document order

Date: 2023-09-06 15:24:28

Tags: paragraph, docx, parent, python, image, child, isinstance, block

from docx.document import Document as _Document
from docx.oxml.table import CT_Tbl
from docx.oxml.text.paragraph import CT_P
from docx.table import _Cell, Table
from docx.text.paragraph import Paragraph


def iter_block_items(parent):
    """
    Generate a reference to each paragraph and table child within *parent*,
    in document order. Each returned value is an instance of either Table or
    Paragraph. *parent* would most commonly be a reference to a main
    Document object, but also works for a _Cell object, which itself can
    contain paragraphs and tables.
    """
    # Pick the XML element whose direct children are the block-level items.
    if isinstance(parent, _Document):
        parent_elm = parent.element.body
    elif isinstance(parent, _Cell):
        parent_elm = parent._tc
    else:
        raise ValueError("something's not right")

    # Walk the children in order and wrap each one in its proxy class.
    for child in parent_elm.iterchildren():
        if isinstance(child, CT_P):
            yield Paragraph(child, parent)
        elif isinstance(child, CT_Tbl):
            yield Table(child, parent)
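The idea behind the generator can be tried without a .docx file at hand. The following is a minimal sketch that uses only the standard library: it walks the direct children of a `w:body` element in order and dispatches on the tag. The sample XML and the helper name `iter_block_tags` are illustrative assumptions and are not part of python-docx.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, as used in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, heavily simplified document body used only for illustration.
SAMPLE_BODY = f"""
<w:body xmlns:w="{W_NS}">
  <w:p><w:r><w:t>Intro</w:t></w:r></w:p>
  <w:tbl><w:tr><w:tc><w:p><w:r><w:t>Cell</w:t></w:r></w:p></w:tc></w:tr></w:tbl>
  <w:p><w:r><w:t>Outro</w:t></w:r></w:p>
  <w:sectPr/>
</w:body>
""".strip()


def iter_block_tags(body):
    """Yield ("paragraph" | "table", element) for direct children, in order."""
    for child in body:  # iterating an Element visits direct children only
        if child.tag == f"{{{W_NS}}}p":
            yield "paragraph", child
        elif child.tag == f"{{{W_NS}}}tbl":
            yield "table", child
        # Any other child (for example w:sectPr) is skipped.


body = ET.fromstring(SAMPLE_BODY)
kinds = [kind for kind, _ in iter_block_tags(body)]
print(kinds)
```

Because only direct children are visited, the paragraph nested inside the table cell is not reported at the top level; to reach it, the same walk would have to be repeated on the cell, which is what passing a _Cell to iter_block_items does.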

I can get an ordered list of the document's images:

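from docx.enum.shape import WD_INLINE_SHAPE

# dwo: the python-docx Document being read (assumed to be opened elsewhere).
pictures = []
for pic in dwo.inline_shapes:
    if pic.type == WD_INLINE_SHAPE.PICTURE:
        pictures.append(pic)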

I can insert a specific image at the end of a paragraph:

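from io import BytesIO

from docx.shared import Inches


def insert_picture(index, paragraph):
    # Look up the relationship id of the image embedded in the inline shape.
    inline = pictures[index]._inline
    rId = inline.xpath('./a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip/@r:embed')[0]
    # Resolve the id to the image part and wrap its raw bytes in a stream.
    image_part = dwo.part.related_parts[rId]
    image_stream = BytesIO(image_part.blob)
    # Append the image to the paragraph in a new run, 6.5 inches wide.
    paragraph.add_run().add_picture(image_stream, Inches(6.5))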
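The XPath lookup in insert_picture can be illustrated on its own. Below is a minimal standard-library sketch that finds the `r:embed` relationship id inside an inline-shape element. The sample XML, the id `rId7`, and the helper name `embedded_image_rid` are assumptions made for the example; real documents carry more attributes and child elements.

```python
import xml.etree.ElementTree as ET

# Namespaces used by inline pictures in WordprocessingML.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
PIC_NS = "http://schemas.openxmlformats.org/drawingml/2006/picture"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"

# Hypothetical, stripped-down inline shape used only for illustration.
SAMPLE_INLINE = f"""
<wp:inline xmlns:wp="{WP_NS}" xmlns:a="{A_NS}" xmlns:pic="{PIC_NS}" xmlns:r="{R_NS}">
  <a:graphic>
    <a:graphicData>
      <pic:pic>
        <pic:blipFill>
          <a:blip r:embed="rId7"/>
        </pic:blipFill>
      </pic:pic>
    </a:graphicData>
  </a:graphic>
</wp:inline>
""".strip()

NAMESPACES = {"a": A_NS, "pic": PIC_NS}


def embedded_image_rid(inline):
    """Return the r:embed id of the picture in *inline*, or None if absent."""
    blip = inline.find(
        "./a:graphic/a:graphicData/pic:pic/pic:blipFill/a:blip", NAMESPACES
    )
    if blip is None:
        return None
    # Attribute names are stored in {namespace}local form by ElementTree.
    return blip.get(f"{{{R_NS}}}embed")


inline = ET.fromstring(SAMPLE_INLINE)
rid = embedded_image_rid(inline)
print(rid)
```

Returning None when no blip is found avoids the IndexError that indexing an empty XPath result with `[0]` would raise for inline shapes that are not embedded pictures.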

I use the function iter_block_items() like this:

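# insert_paragraph_after, last_paragraph, paragraphs_with_table and
# tables_to_apppend are defined elsewhere in the script (not shown here).
start_copy = False
for block in iter_block_items(document):
    # Stop as soon as the end marker paragraph is reached.
    if isinstance(block, Paragraph):
        if block.text == "TEXT FROM WHERE WE STOP COPYING":
            break

    if start_copy:
        if isinstance(block, Paragraph):
            last_paragraph = insert_paragraph_after(last_paragraph, block.text)

        elif isinstance(block, Table):
            # Remember the table and the paragraph it should follow.
            paragraphs_with_table.append(last_paragraph)
            tables_to_apppend.append(block._tbl)

    # Start copying with the block that follows the start marker paragraph.
    if isinstance(block, Paragraph):
        if block.text == "TEXT FROM WHERE WE START COPYING":
            start_copy = True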
From: https://www.cnblogs.com/QQ-77Ly/p/17682352.html
