Processing PDFs with Python

Date: 2022-12-03 22:46:57
Tags: output Python writer reader watermark processing pdf PDF page

Reference: How to Work With a PDF in Python
Reference: Adding a Watermark to a PDF

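The PDF library used in this article is PyPDF2. The examples use its legacy 1.x API (PdfFileReader, PdfFileWriter and camelCase methods such as getPage and addPage). PyPDF2 3.0 removed these names; its successor, pypdf, provides PdfReader, PdfWriter and snake_case methods such as add_page instead.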

Read info

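from PyPDF2 import PdfFileReader  # legacy PyPDF2 1.x API

# PDF sizes are in points (1/72 inch); 1 pt = 25.4 / 72 ≈ 0.3528 mm
PT_TO_MM = 25.4 / 72

def extract_information(pdf_path):
    with open(pdf_path, 'rb') as f:
        pdf = PdfFileReader(f)
        # getDocumentInfo() returns None if the file has no /Info dictionary,
        # so read each field with getattr() and fall back to None
        information = pdf.getDocumentInfo()
        fields = ['author', 'creator', 'producer', 'subject', 'title']
        meta = {name: getattr(information, name, None) for name in fields}
        number_of_pages = pdf.getNumPages()
        # size of the first page, converted from points to millimeters
        height = float(pdf.getPage(0).mediaBox.getHeight()) * PT_TO_MM
        width = float(pdf.getPage(0).mediaBox.getWidth()) * PT_TO_MM

    txt = f"""
    Information about {pdf_path}:

    Author: {meta['author']}
    Creator: {meta['creator']}
    Producer: {meta['producer']}
    Subject: {meta['subject']}
    Title: {meta['title']}
    Number of pages: {number_of_pages}
    Height: {height:.2f} mm
    Width: {width:.2f} mm
    """

    print(txt)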
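
PDF page sizes are stored in points, where 1 pt = 1/72 inch. Since 1 inch = 25.4 mm, the exact factor is 25.4 / 72 ≈ 0.35278 mm per point, and a rounded factor such as 0.352 reads slightly small. The sketch below (pt_to_mm is a hypothetical helper, not part of PyPDF2) checks this on an A4 page, which PDFs commonly store as a 595 × 842 pt MediaBox.

```python
# Exact conversion from PDF points to millimeters:
# 1 pt = 1/72 inch and 1 inch = 25.4 mm.
MM_PER_PT = 25.4 / 72  # ≈ 0.35278


def pt_to_mm(points: float) -> float:
    """Convert a length in PDF points to millimeters (hypothetical helper)."""
    return points * MM_PER_PT


if __name__ == "__main__":
    # A4 is commonly stored as a 595 x 842 pt MediaBox
    print(f"{pt_to_mm(595):.1f} x {pt_to_mm(842):.1f} mm")  # 209.9 x 297.0 mm
    # The rounded factor 0.352 comes out about 0.2% short
    print(f"{842 * 0.352:.1f} mm")  # 296.4 mm
```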

Rotate Page


from PyPDF2 import PdfFileReader, PdfFileWriter

def rotate_pages(pdf_path, output='rotate_pages.pdf'):
    # expects an input PDF with at least three pages
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(pdf_path)
    # Rotate page 1 by 90 degrees clockwise
    page_1 = pdf_reader.getPage(0).rotateClockwise(90)
    pdf_writer.addPage(page_1)
    # Rotate page 2 by 90 degrees counterclockwise
    page_2 = pdf_reader.getPage(1).rotateCounterClockwise(90)
    pdf_writer.addPage(page_2)
    # Add page 3 in its original orientation
    pdf_writer.addPage(pdf_reader.getPage(2))

    with open(output, 'wb') as fh:
        pdf_writer.write(fh)
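
rotateClockwise and rotateCounterClockwise do not redraw the page content: in PyPDF2 1.x they add or subtract the angle from the page's /Rotate entry, which viewers apply when displaying the page. The MediaBox is left unchanged, so getWidth() and getHeight() still report the unrotated size. The hypothetical helpers below (not part of PyPDF2) sketch the resulting arithmetic: angles must be multiples of 90, and a 90° or 270° rotation swaps the width and height a viewer shows.

```python
def effective_rotation(current: int, delta: int) -> int:
    """Page rotation after turning it by `delta` degrees.

    Positive delta means clockwise, negative means counterclockwise.
    The result is normalized to 0, 90, 180 or 270.
    """
    if delta % 90 != 0:
        raise ValueError("PDF pages can only be rotated in multiples of 90 degrees")
    return (current + delta) % 360


def displayed_size(width: float, height: float, rotation: int) -> tuple:
    """Width and height as a viewer shows the page after rotation."""
    if rotation % 180 == 90:
        # a quarter turn swaps the two sides
        return height, width
    return width, height


if __name__ == "__main__":
    rot = effective_rotation(0, -90)        # counterclockwise quarter turn
    print(rot, displayed_size(595, 842, rot))  # 270 (842, 595)
```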

Merge PDFs

from PyPDF2 import PdfFileReader, PdfFileWriter

def merge_pdfs(paths: list, output: str):
    pdf_writer = PdfFileWriter()

    # pages are appended in the order the files appear in `paths`
    for path in paths:
        pdf_reader = PdfFileReader(path)
        for page in range(pdf_reader.getNumPages()):
            # Add each page to the writer object
            pdf_writer.addPage(pdf_reader.getPage(page))

    # Write out the merged PDF
    with open(output, 'wb') as out:
        pdf_writer.write(out)
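
The nested loop copies every page of the first file, then every page of the second, and so on, so the order of paths decides the page order of the result. The sketch below reproduces that ordering with plain tuples instead of PDF objects; merge_plan and its (path, page count) input are hypothetical, used only to show which source page lands at each position of the merged file.

```python
def merge_plan(page_counts):
    """Return (path, source_page) pairs in the order merge_pdfs writes them.

    page_counts is a list of (path, number_of_pages) tuples in merge order.
    """
    plan = []
    for path, count in page_counts:
        # all pages of one file come before any page of the next file
        for page in range(count):
            plan.append((path, page))
    return plan


if __name__ == "__main__":
    print(merge_plan([("intro.pdf", 2), ("body.pdf", 1)]))
    # [('intro.pdf', 0), ('intro.pdf', 1), ('body.pdf', 0)]
```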

Split PDFs

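from PyPDF2 import PdfFileReader, PdfFileWriter

def split(path, name_of_split):
    pdf = PdfFileReader(path)
    for page in range(pdf.getNumPages()):
        # one writer per page, so every output file holds a single page
        pdf_writer = PdfFileWriter()
        pdf_writer.addPage(pdf.getPage(page))

        # page indices start at 0: <name_of_split>0.pdf, <name_of_split>1.pdf, ...
        output = f'{name_of_split}{page}.pdf'
        with open(output, 'wb') as output_pdf:
            pdf_writer.write(output_pdf)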
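
split() names its files with 0-based, unpadded indices, so a document with more than ten pages produces files such as name2.pdf and name10.pdf, and a plain alphabetical sort puts name10.pdf first. The hypothetical helpers below sketch two variations: grouping pages into chunks of a fixed size instead of one page per file, and 1-based, zero-padded file names. Each range returned by chunk_ranges would be copied into its own PdfFileWriter, the same way split() copies a single page.

```python
def chunk_ranges(num_pages: int, chunk_size: int) -> list:
    """Split page indices 0..num_pages-1 into consecutive chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [range(start, min(start + chunk_size, num_pages))
            for start in range(0, num_pages, chunk_size)]


def output_name(prefix: str, index: int, total: int) -> str:
    """1-based, zero-padded file name so the outputs sort in page order."""
    width = len(str(total))
    return f"{prefix}{index + 1:0{width}d}.pdf"


if __name__ == "__main__":
    chunks = chunk_ranges(5, 2)
    for i, pages in enumerate(chunks):
        print(output_name("part", i, len(chunks)), list(pages))
    # part1.pdf [0, 1]
    # part2.pdf [2, 3]
    # part3.pdf [4]
```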

Encrypt a PDF

from PyPDF2 import PdfFileReader, PdfFileWriter

def add_encryption(input_pdf, output_pdf, password):
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(input_pdf)

    for page in range(pdf_reader.getNumPages()):
        pdf_writer.addPage(pdf_reader.getPage(page))
    # or use pdf_writer.appendPagesFromReader(pdf_reader)

    # use_128bit=False would fall back to 40-bit encryption
    pdf_writer.encrypt(user_pwd=password, owner_pwd=None,
                       use_128bit=True)

    with open(output_pdf, 'wb') as fh:
        pdf_writer.write(fh)
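
In PyPDF2 1.x, passing owner_pwd=None makes encrypt() reuse the user password as the owner password. Anyone who can open the file then also holds the owner password, which grants full access regardless of the permission settings. A minimal sketch, assuming you want a separate owner password that only you keep: generate a random one with the standard library's secrets module and pass both values to encrypt(). The helper name make_passwords is hypothetical.

```python
import secrets


def make_passwords(user_pwd: str) -> tuple:
    """Return (user_pwd, owner_pwd) with a random, distinct owner password."""
    # token_urlsafe(16) draws 16 random bytes, about 22 URL-safe characters
    owner_pwd = secrets.token_urlsafe(16)
    return user_pwd, owner_pwd
```

Usage: user_pwd, owner_pwd = make_passwords('secret'), then pdf_writer.encrypt(user_pwd=user_pwd, owner_pwd=owner_pwd, use_128bit=True). Store owner_pwd somewhere safe; it is not recoverable from the file.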

Decrypt a PDF

from PyPDF2 import PdfFileReader, PdfFileWriter

def decrypt_pdf(input_pdf, output_pdf, password):
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(input_pdf)

    if pdf_reader.isEncrypted:
        # decrypt() does not raise on a wrong password; it returns 0
        # (1 = user password matched, 2 = owner password matched)
        if pdf_reader.decrypt(password) == 0:
            print("Wrong password")
            return
    else:
        print("File is not encrypted")

    for page in range(pdf_reader.getNumPages()):
        pdf_writer.addPage(pdf_reader.getPage(page))

    with open(output_pdf, 'wb') as fh:
        pdf_writer.write(fh)
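
PdfFileReader.decrypt() in PyPDF2 1.x does not raise an exception for a wrong password: it returns 0 when the password fails, 1 when it matches the user password and 2 when it matches the owner password, so a try/except around decrypt() will not catch a wrong password. The lookup table below is a small illustrative helper (describe_decrypt_result is a hypothetical name) for reporting that return value.

```python
# Return codes of PdfFileReader.decrypt() in PyPDF2 1.x
DECRYPT_RESULTS = {
    0: "wrong password",
    1: "matched the user password",
    2: "matched the owner password",
}


def describe_decrypt_result(code: int) -> str:
    """Translate a decrypt() return code into a readable message."""
    return DECRYPT_RESULTS.get(code, f"unexpected result {code!r}")
```

For example, print(describe_decrypt_result(pdf_reader.decrypt(password))) reports which password, if any, was accepted.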

Add watermark

Create the watermark page in Word: Design -> Watermark -> Custom Watermark, then export the document as PDF and save it as watermark.pdf.

from PyPDF2 import PdfFileReader, PdfFileWriter

# PDF sizes are in points (1/72 inch); 1 pt = 25.4 / 72 ≈ 0.3528 mm
PT_TO_MM = 25.4 / 72

def create_watermark(input_pdf, output, watermark):
    watermark_obj = PdfFileReader(watermark)
    watermark_page = watermark_obj.getPage(0)

    pdf_reader = PdfFileReader(input_pdf)
    pdf_writer = PdfFileWriter()
    # print the watermark size (mm) so it can be compared with the page size
    wm_box = watermark_page.mediaBox
    print(f"watermark height: {float(wm_box.getHeight()) * PT_TO_MM:.2f} mm, "
          f"watermark width: {float(wm_box.getWidth()) * PT_TO_MM:.2f} mm")
    # Watermark all the pages
    for page_num in range(pdf_reader.getNumPages()):
        page = pdf_reader.getPage(page_num)
        box = page.mediaBox
        print(f"page height: {float(box.getHeight()) * PT_TO_MM:.2f} mm, "
              f"page width: {float(box.getWidth()) * PT_TO_MM:.2f} mm")
        # draw the watermark page on top of the existing page content
        page.mergePage(watermark_page)
        pdf_writer.addPage(page)

    with open(output, 'wb') as out:
        pdf_writer.write(out)
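
mergePage() draws the watermark page on top of each page without scaling or moving it, so the watermark keeps its own size and is anchored at the lower-left corner (assuming both MediaBoxes start at (0, 0)). That is why the code prints both sizes: a watermark exported from Word on US Letter paper (612 × 792 pt) does not line up exactly with an A4 page (595 × 842 pt). The hypothetical helper fit_watermark below computes a uniform scale factor and an offset that center the watermark on the page; in PyPDF2 1.x these values could be passed to page.mergeScaledTranslatedPage(watermark_page, scale, tx, ty) instead of mergePage().

```python
def fit_watermark(page_w, page_h, wm_w, wm_h):
    """Return (scale, tx, ty) that fit the watermark inside the page.

    The watermark keeps its aspect ratio and is centered; all sizes are
    in PDF points. The scale is applied first, then the (tx, ty) offset.
    """
    scale = min(page_w / wm_w, page_h / wm_h)
    tx = (page_w - wm_w * scale) / 2
    ty = (page_h - wm_h * scale) / 2
    return scale, tx, ty


if __name__ == "__main__":
    # US Letter watermark onto an A4 page
    scale, tx, ty = fit_watermark(595, 842, 612, 792)
    print(f"scale={scale:.3f}, ty={ty:.1f}")  # scale=0.972, ty=36.0
```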

From: https://www.cnblogs.com/coco02/p/16948917.html

相关文章

  • 详解支持向量机-SVC真实数据案例:预测明天是否会下雨-处理困难特征:地点【菜菜的sklearn
    视频作者:菜菜TsaiTsai链接:【技术干货】菜菜的机器学习sklearn【全85集】Python进阶_哔哩哔哩_bilibili常识上来说,我们认为地点肯定是对明天是否会下雨存在影响的。比如......
  • 最大流,最小费最大流问题 python
    最大流,最小费最大流问题python徐少华算法设计与分析P145解题思路解题算法最小费用最大流:解法I步骤一:利用最大流算法,将网络的流量调整到最大流步骤二:构建......
  • Python 第11章 上机实验
    说明:导入pymysql包,关于使用mysql的代码,只能在我的电脑使用,同时我抹去了使用mysql的账号秘密importsqlite3#连接到SQLite数据库conn=sqlite3.connect('mrsoft.db')......
  • 【Python】笔记:协程
    协程用作协程的生成器的基本行为协程使用生成器函数定义:定义体中有yield关键字defsimple_coroutine():print('->coroutinestart')x=yield#因为......
  • 【Python】笔记:上下文管理器和else快
    上下文管理器和else快类似于then的elsefor...else...仅在for循环运行完毕后运行else,不能被breakwhile...else...仅在while条件为false而退出后运行......
  • 【Python】笔记:可迭代的对象、迭代器和生成器
    可迭代的对象、迭代器和生成器importreimportreprlibRE_WORD=re.compile('\w+')classSentence_v1:def__init__(self,text):self.text=text......
  • PCB表面处理工艺(转自中信华PCB)
    1.裸铜板优缺点很明显:优点:成本低、表面平整,焊接性良好(在没有被氧化的情况下)。缺点:容易受到酸及湿度影响,不能久放,拆封后需在2小时内用完,因为铜暴露在空气中......
  • 【Python】笔记:接口:从协议到抽象基类
    S11接口:从协议到抽象基类#random.shuffle就地打乱fromrandomimportshufflel=list(range(10))shuffle(l)print(l)shuffle(l)print(l)[0,6,3,2,4,8,......
  • 【Python】笔记:正确重载运算符
    正确重载运算符一元运算符-(__neg__)+(__pos__)最好返回self的副本~(__invert__)对整数位按位取反(~x==-(x+1))print(~2)-3中辍运算符+fromarray......
  • 回文链表-python
    问题:给你一个单链表的头节点 head ,请你判断该链表是否为回文链表。如果是,返回 true ;否则,返回 false 。思考:对称结构要想到stack方案一:双指针法将节点值赋值到数组......