[1008] PyPDF2, Merge PDF files, Insert PDF files

Time: 2024-06-13 13:55:06 | Views: 26

Tags: files Insert merger file path PDF pdf pages

Ref: The PdfMerger Class: merges multiple PDFs into a single PDF.

merge(): Merge the pages from the given file into the output file at the specified page number.
append(): Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.
write(): Write all data that has been merged to the given output file.

Ref: The PdfReader Class

Ref: The PdfWriter Class

1. Merge PDF files

Use the PdfMerger class from PyPDF2 to merge the PDFs.
Here’s an example:

Python

import PyPDF2

# List of PDFs to merge
pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf']

merger = PyPDF2.PdfMerger()
for pdf in pdfs:
    merger.append(pdf)

# Write the merged PDF to a new file
merger.write("merged.pdf")
merger.close()
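
The `pdfs` list above is written by hand. When the files sit in one folder, the list could instead be built from the folder contents. The following is only a sketch using the standard library: `collect_pdfs` is a hypothetical helper name, and sorting by file name is an assumption about the order you want.

```python
from pathlib import Path


def collect_pdfs(directory):
    """Return the paths of the .pdf files in `directory`, sorted by file name.

    Hypothetical helper, not part of PyPDF2.
    """
    folder = Path(directory)
    # Compare the extension case-insensitively and skip sub-directories.
    return sorted(
        str(path)
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

The result can be used in place of the hard-coded list, for example `pdfs = collect_pdfs("reports")`, where `"reports"` is a placeholder folder name. Note that a plain name sort puts `file10.pdf` before `file2.pdf`.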

2. Insert PDF files

You can use the PdfMerger class to merge PDFs and insert them at specific positions.
Here’s an example:

Python

import PyPDF2

# Open the PDF files you want to merge
input1 = open("file1.pdf", "rb")
input2 = open("file2.pdf", "rb")

# Create a PdfMerger object
merger = PyPDF2.PdfMerger()

# Append the first three pages of input1 (indices 0, 1, 2; the stop index 3 is excluded)
merger.append(fileobj=input1, pages=(0, 3))

# Insert the first page of input2 at index 2, i.e. after the second page
merger.merge(position=2, fileobj=input2, pages=(0, 1))

# Write the merged PDF to an output file
with open("output.pdf", "wb") as output:
    merger.write(output)

# Close the merger and the input file descriptors
merger.close()
input1.close()
input2.close()
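
To make the index arithmetic easier to follow, here is a small model that uses plain Python lists instead of PDFs. It is only an illustration: `select_pages` and `insert_at` are hypothetical helpers that mimic how a `(start, stop)` page tuple and an insert position behave, and the page labels are made up.

```python
def select_pages(pages, page_range):
    """Pick pages the way a (start, stop) tuple does: stop is excluded."""
    start, stop = page_range
    return pages[start:stop]


def insert_at(output, position, new_pages):
    """Insert new_pages so that the first of them lands at index `position`."""
    return output[:position] + new_pages + output[position:]


# Made-up page labels standing in for the pages of file1.pdf and file2.pdf
file1 = ["f1-p0", "f1-p1", "f1-p2", "f1-p3"]
file2 = ["f2-p0", "f2-p1"]

# Models append(..., pages=(0, 3)): keeps indices 0, 1 and 2
output = select_pages(file1, (0, 3))
# Models merge at position 2 with pages=(0, 1): one page, placed at index 2
output = insert_at(output, 2, select_pages(file2, (0, 1)))

print(output)
# ['f1-p0', 'f1-p1', 'f2-p0', 'f1-p2']
```

Positions and page indices are both 0-based, so position 2 means "before the page that currently has index 2", which is the same as "after the second page".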

3. Get the number of pages

pdfReader = PyPDF2.PdfReader("file4.pdf")
print(len(pdfReader.pages))
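
The page count is mostly useful for working out where to insert other files, counted from the end of a document. A minimal sketch of that arithmetic follows. `insert_position` is a hypothetical helper, and the page counts are invented numbers standing in for `len(reader.pages)`.

```python
def insert_position(total_pages, pages_kept_after):
    """Return the 0-based index at which to insert new pages so that
    `pages_kept_after` pages of the current document come after them."""
    if not 0 <= pages_kept_after <= total_pages:
        raise ValueError("pages_kept_after must be between 0 and total_pages")
    return total_pages - pages_kept_after


report_pages = 20  # assumed page count of a report
map_pages = 5      # assumed page count of a file inserted first

# Insert the first file so that the last 3 report pages follow it
first = insert_position(report_pages, 3)               # 17
# The document has grown by map_pages; insert the next file before its last page
second = insert_position(report_pages + map_pages, 1)  # 24
```

The second call has to include the pages already inserted, because each insert shifts the indices of everything after it.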

The big example:

import os, PyPDF2

# find the root directory
# root_dir = __file__[:__file__.find("\\Working\\GIS\\Data\\Models")]

root_dir = r"S:\TRAINING\Bingnan\02_Test_Data\Data_DDR\LI-4155 DDR Merrylands  NSW"
# Split the folder name into words; they form the output file names
file_name = os.path.basename(root_dir).split()
# Remove the second word (the report type, ESR or DDR) and keep it for the final file name
esr_ddr = file_name.pop(1)
file_name_new = " ".join(file_name)

# Collect the PDF files to combine, grouped by kind
PDFs_dir = os.path.join(root_dir, "Delivery\\PDFs")

map_path_list = []   # report maps: names starting with "Map"
hist_path_list = []  # historic maps: names containing "Map B"
report_list = []     # report PDFs: names containing "LI-"

for file in os.listdir(PDFs_dir):
    if "Map B" in file:
        hist_path_list.append(file)
    elif file.startswith("Map"):
        map_path_list.append(file)
    elif "LI-" in file and ".pdf" in file:
        report_list.append(file)
        
# Combine the map PDFs into one file
pdf_merger = PyPDF2.PdfMerger()
for file in map_path_list:
    pdf_merger.append(os.path.join(PDFs_dir, file))

# Write the merged PDF to the output file, then release the input files
with open(os.path.join(root_dir, f"Delivery\\Final\\{file_name_new} - Report Maps.pdf"), "wb") as output:
    pdf_merger.write(output)
pdf_merger.close()
    
# Sort the historic map files by the number in their name (B1, B2, ...)
hist_path_list_new = []
for i in range(len(hist_path_list)):
    for file in hist_path_list:
        if f" B{i+1} " in file:
            hist_path_list_new.append(file) 
            
# Combine the historic map PDFs into one file
pdf_merger2 = PyPDF2.PdfMerger()
for file in hist_path_list_new:
    pdf_merger2.append(os.path.join(PDFs_dir, file))

# Write the merged PDF to the output file, then release the input files
with open(os.path.join(root_dir, f"Delivery\\Final\\{file_name_new} - Historic Imagery.pdf"), "wb") as output:
    pdf_merger2.write(output)
pdf_merger2.close()
    
# Get the number of pages in the PDF files
with open(os.path.join(PDFs_dir, report_list[0]), 'rb') as file:
    pdf_reader = PyPDF2.PdfReader(file)
    report_pages_num = len(pdf_reader.pages)
    
with open(os.path.join(root_dir, f"Delivery\\Final\\{file_name_new} - Report Maps.pdf"), 'rb') as file:
    pdf_reader = PyPDF2.PdfReader(file)
    map_pages_num = len(pdf_reader.pages)
    
merger = PyPDF2.PdfMerger()

# Add the whole report; pass pages=(start, stop) to add only part of it
merger.append(os.path.join(PDFs_dir, report_list[0]))

final_dir = os.path.join(root_dir, "Delivery\\Final")

# Insert the combined report maps after Appendix A. position is a 0-based index
# into the merged output: report_pages_num - 3 leaves the last three report pages after the maps
merger.merge(position=report_pages_num - 3, fileobj=os.path.join(final_dir, f"{file_name_new} - Report Maps.pdf"))

# Insert the combined historic imagery after Appendix B. The output now has
# report_pages_num + map_pages_num pages, so this position leaves one page after the imagery
merger.merge(position=report_pages_num + map_pages_num - 1, fileobj=os.path.join(final_dir, f"{file_name_new} - Historic Imagery.pdf"))

# Save the output PDF, then close the merger so the input files it opened are released
merger.write(os.path.join(final_dir, f"{file_name_new} - {esr_ddr} new.pdf"))
merger.close()
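
The nested loop that orders the historic maps scans the whole list once per file. As an alternative (not the approach used above), the number after `B` could be extracted with a regular expression and used as the sort key. This is a sketch under assumptions: `sort_by_b_number` is a hypothetical helper, and the file names in the usage line are invented to match the `" B<number> "` pattern the loop looks for.

```python
import re


def sort_by_b_number(file_names):
    """Sort names that contain ' B<number> ' by that number.

    Names without the pattern are dropped, as they are by the nested loop.
    """
    pattern = re.compile(r" B(\d+) ")
    numbered = []
    for name in file_names:
        match = pattern.search(name)
        if match:
            # Compare as integers so that B10 sorts after B2
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]
```

For example, `sort_by_b_number(["Map B10 2020.pdf", "Map B2 1965.pdf", "Map B1 1943.pdf"])` puts B1 first and B10 last. One difference from the nested loop: the loop only looks for B1 up to B*n* for *n* files, so it silently skips files when the numbering has gaps, whereas this version keeps every numbered file.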

From: https://www.cnblogs.com/alex-bn-lee/p/18245725
