[1008] PyPDF2, Merge PDF files, Insert PDF files

Date: 2024-06-13 13:55:06
Tags: files, Insert, merger, file, path, PDF, pdf, pages

Ref: The PdfMerger Class: merges multiple PDFs into a single PDF.

  • merge(): Merge the pages from the given file into the output file at the specified page number.

  • append(): Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.

  • write(): Write all data that has been merged to the given output file.

Ref: The PdfReader Class

Ref: The PdfWriter Class


1. Merge PDF files

  • Use the PdfMerger class from PyPDF2 to merge the PDFs.
  • Here’s an example:
Python
import PyPDF2

# List of PDFs to merge
pdfs = ['file1.pdf', 'file2.pdf', 'file3.pdf']

merger = PyPDF2.PdfMerger()
for pdf in pdfs:
    merger.append(pdf)

# Write the merged PDF to a new file
merger.write("merged.pdf")
merger.close()
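
When the list of PDFs comes from a folder rather than being typed by hand, the merge order depends on how the file names are sorted. The sketch below is one possible way to handle that, not part of the original example: `natural_key` and `merge_folder` are hypothetical helper names, and it assumes the files should be merged in "natural" order, so that file10.pdf comes after file2.pdf.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a file name into text and integer parts so that 'file10' sorts after 'file2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def merge_folder(folder, output_path):
    """Merge every PDF in `folder` into `output_path`, in natural file-name order."""
    import PyPDF2  # imported here so natural_key() can be used without PyPDF2 installed

    pdf_paths = sorted(Path(folder).glob("*.pdf"), key=lambda p: natural_key(p.name))
    merger = PyPDF2.PdfMerger()
    try:
        for path in pdf_paths:
            merger.append(str(path))
        merger.write(str(output_path))
    finally:
        # Release the input files even if one of the PDFs cannot be read
        merger.close()
    return [p.name for p in pdf_paths]


names = ["file10.pdf", "file2.pdf", "file1.pdf"]
print(sorted(names, key=natural_key))  # ['file1.pdf', 'file2.pdf', 'file10.pdf']
```

Write the output to a different folder than the inputs. Otherwise a previously merged file would be picked up by the `*.pdf` pattern on the next run.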

2. Insert PDF files

  • You can use the PdfMerger class to merge PDFs and insert them at specific positions.
  • Here’s an example:
Python
import PyPDF2

# Open the PDF files you want to merge
input1 = open("file1.pdf", "rb")
input2 = open("file2.pdf", "rb")

# Create a PdfMerger object
merger = PyPDF2.PdfMerger()

# Append the first three pages of input1; pages=(start, stop) excludes stop, so this takes pages 0 to 2
merger.append(fileobj=input1, pages=(0, 3))

# Insert the first page of input2 at index 2, i.e. after the second page
merger.merge(position=2, fileobj=input2, pages=(0, 1))

# Write the merged PDF to an output file
with open("output.pdf", "wb") as output:
    merger.write(output)

# Close the merger and the input file descriptors
merger.close()
input1.close()
input2.close()
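
To see which page order the example above produces, it can help to model the output as a plain Python list. The sketch below is only a model and does not call PyPDF2: `pick`, `model_append`, and `model_merge` are hypothetical names, and it assumes file1.pdf has five pages and file2.pdf has two.

```python
def pick(pages, spec=None):
    """Return the pages selected by a (start, stop[, step]) tuple, like range(); None selects all."""
    if spec is None:
        return list(pages)
    return [pages[i] for i in range(*spec)]


def model_append(output, pages, spec=None):
    """Model of append(): add the selected pages to the end of the output."""
    output.extend(pick(pages, spec))


def model_merge(output, position, pages, spec=None):
    """Model of merge(): insert the selected pages so that they start at index `position`."""
    output[position:position] = pick(pages, spec)


# Assumed page counts, for illustration only
file1 = [f"file1-p{i}" for i in range(5)]
file2 = [f"file2-p{i}" for i in range(2)]

output = []
model_append(output, file1, (0, 3))    # pages 0, 1, 2 of file1
model_merge(output, 2, file2, (0, 1))  # page 0 of file2, inserted at index 2
print(output)  # ['file1-p0', 'file1-p1', 'file2-p0', 'file1-p2']
```

The inserted page lands between the second and third pages of file1, and the page that was at index 2 moves one place down.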

3. Get the number of pages

pdfReader = PyPDF2.PdfReader("file4.pdf")
print(len(pdfReader.pages))

A larger example:

import os, PyPDF2

# Find the root directory
# root_dir = __file__[:__file__.find("\\Working\\GIS\\Data\\Models")]

root_dir = r"S:\TRAINING\Bingnan\02_Test_Data\Data_DDR\LI-4155 DDR Merrylands  NSW"
# Split the folder name into words; they are used to build the output file names
file_name = os.path.basename(root_dir).split()
# Remove the second word, the report type (ESR or DDR), and keep it for the final file name
esr_ddr = file_name.pop(1)
file_name_new = " ".join(file_name)

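# Collect the PDF file names by type
PDFs_dir = os.path.join(root_dir, "Delivery\\PDFs")

map_path_list = []    # report maps
hist_path_list = []   # historic imagery maps ("Map B...")
report_list = []      # the report itself ("LI-...pdf")

# os.listdir() returns names in arbitrary order, so sort them to keep the map order stable
for file in sorted(os.listdir(PDFs_dir)):
    if "Map B" in file:
        hist_path_list.append(file)
    elif file.startswith("Map"):
        map_path_list.append(file)
    elif "LI-" in file and ".pdf" in file:
        report_list.append(file)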
        
# Combine the map PDFs into one file
pdf_merger = PyPDF2.PdfMerger()
for file in map_path_list:
    pdf_merger.append(os.path.join(PDFs_dir, file))

# Write the merged PDF to the output file, then release the input files
with open(os.path.join(root_dir, f"Delivery\\Final\\{file_name_new} - Report Maps.pdf"), "wb") as output:
    pdf_merger.write(output)
pdf_merger.close()
    
# Sort the historic map files by their number (B1, B2, ...); names without a " B<n> " part are skipped
hist_path_list_new = []
for i in range(len(hist_path_list)):
    for file in hist_path_list:
        if f" B{i+1} " in file:
            hist_path_list_new.append(file) 
            
# Combine the historic map PDFs into one file
pdf_merger2 = PyPDF2.PdfMerger()
for file in hist_path_list_new:
    pdf_merger2.append(os.path.join(PDFs_dir, file))

# Write the merged PDF to the output file, then release the input files
with open(os.path.join(root_dir, f"Delivery\\Final\\{file_name_new} - Historic Imagery.pdf"), "wb") as output:
    pdf_merger2.write(output)
pdf_merger2.close()
    
# Get the number of pages in the PDF files
with open(os.path.join(PDFs_dir, report_list[0]), 'rb') as file:
    pdf_reader = PyPDF2.PdfReader(file)
    report_pages_num = len(pdf_reader.pages)
    
with open(os.path.join(root_dir, f"Delivery\\Final\\{file_name_new} - Report Maps.pdf"), 'rb') as file:
    pdf_reader = PyPDF2.PdfReader(file)
    map_pages_num = len(pdf_reader.pages)
    
merger = PyPDF2.PdfMerger()

# Add the whole report; pass pages=... to append only specific pages
merger.append(os.path.join(PDFs_dir, report_list[0]))

# Insert the combined report maps after Appendix A. position is the 0-based index at which
# the inserted pages start, so report_pages_num - 3 places them directly before the third page from the end
merger.merge(position=report_pages_num - 3, fileobj=os.path.join(root_dir, f"Delivery\\Final\\{file_name_new} - Report Maps.pdf"))

# Insert the combined historic imagery after Appendix B. The maps inserted above shifted the later
# pages by map_pages_num, so this position is directly before the last page
merger.merge(position=report_pages_num + map_pages_num - 1, fileobj=os.path.join(root_dir, f"Delivery\\Final\\{file_name_new} - Historic Imagery.pdf"))

# Save the output PDF, then close the merger to release the input files
merger.write(os.path.join(root_dir, f"Delivery\\Final\\{file_name_new} - {esr_ddr} new.pdf"))
merger.close()
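
The second merge position above has to account for the map pages that the first merge already inserted. The arithmetic can be sketched as a small helper, shown below. `shifted_positions` is a hypothetical name, and the page counts are made-up numbers for illustration, not values from the real report.

```python
def shifted_positions(inserts):
    """Convert insert points given in original-document page indices into merge() positions.

    `inserts` is a list of (original_index, page_count) pairs: each block of `page_count`
    pages should end up directly before the page that had index `original_index` in the
    original document. Inserts are applied front to back, so each position is shifted by
    the number of pages inserted before it.
    """
    positions = []
    added = 0
    for original_index, page_count in sorted(inserts):
        positions.append(original_index + added)
        added += page_count
    return positions


# Hypothetical numbers: a 20-page report, 6 pages of maps, 4 pages of historic imagery
report_pages_num, map_pages_num, hist_pages_num = 20, 6, 4
inserts = [(report_pages_num - 3, map_pages_num), (report_pages_num - 1, hist_pages_num)]
print(shifted_positions(inserts))  # [17, 25]
```

With these numbers the results equal `report_pages_num - 3` and `report_pages_num + map_pages_num - 1`, the two positions used in the script.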

 

From: https://www.cnblogs.com/alex-bn-lee/p/18245725
