1. Install the Python libraries
pip3 install flask PyPDF2 python-docx
2. Create a Flask application and write the code that handles file upload and conversion
vim pdf_to_docx.py
import os
from io import BytesIO

from docx import Document
from flask import Flask, render_template, request, send_file
from PyPDF2 import PdfReader

app = Flask(__name__)


# HTML page with the upload form
@app.route('/')
def index():
    return render_template('index.html')


# Handle the file upload and the conversion
@app.route('/convert', methods=['POST'])
def convert():
    if 'file' not in request.files:
        return "No file part", 400
    file = request.files['file']
    if file.filename == '':
        return "No selected file", 400

    pdf = PdfReader(file)
    doc = Document()
    # Add the extracted text of each PDF page as one paragraph
    for page in pdf.pages:
        doc.add_paragraph(page.extract_text())

    # Save the docx file to memory instead of to disk
    doc_buffer = BytesIO()
    doc.save(doc_buffer)
    doc_buffer.seek(0)

    # Reuse the uploaded file's base name, with a .docx extension
    download_basename = os.path.splitext(file.filename)[0]
    download_name = download_basename + '.docx'
    return send_file(doc_buffer, as_attachment=True, download_name=download_name)


if __name__ == '__main__':
    app.run(debug=True)
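The loop in convert() writes each page's extracted text as a single paragraph, so the line breaks of a whole page end up inside one DOCX paragraph. Below is a minimal sketch of one alternative, assuming that blank lines in the extracted text mark paragraph boundaries. That is only a heuristic, and split_paragraphs is a hypothetical helper name, not part of PyPDF2 or python-docx.

```python
import re


def split_paragraphs(page_text):
    """Split extracted page text into paragraphs on blank lines.

    Hypothetical helper: assumes blank lines separate paragraphs, which
    depends on how the PDF was produced and will not hold for every file.
    """
    if not page_text:
        # Pages without a text layer (e.g. scans) may yield no text.
        return []
    chunks = re.split(r"\n\s*\n", page_text)
    # Join the wrapped lines inside each paragraph and drop empty chunks.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    sample = "First line\nwrapped line\n\nSecond paragraph\n"
    for paragraph in split_paragraphs(sample):
        print(paragraph)
```

Inside convert(), the call doc.add_paragraph(page.extract_text()) could then be replaced by a loop that calls doc.add_paragraph(paragraph) for each item returned by split_paragraphs(page.extract_text()).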
3. Create the HTML page with the file upload form (submitting the form triggers the download)
mkdir templates && cd templates
vim index.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>PDF to DOCX Converter</title>
</head>
<body>
    <h1>PDF to DOCX Converter</h1>
    <form action="/convert" method="post" enctype="multipart/form-data">
        <input type="file" name="file" accept=".pdf">
        <button type="submit">Convert</button>
    </form>
</body>
</html>
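The accept=".pdf" attribute on the file input is only a hint to the browser, so the server can still receive other file types. Here is a minimal sketch of a server-side sanity check, assuming it is enough to look for the "%PDF-" signature at the very start of the upload. looks_like_pdf is a hypothetical helper name, and this is a cheap check, not full validation.

```python
from io import BytesIO


def looks_like_pdf(stream):
    """Return True if the stream starts with the PDF signature "%PDF-".

    Hypothetical helper: a strict check on the first five bytes only.
    """
    start = stream.tell()
    header = stream.read(5)
    stream.seek(start)  # rewind so a later reader sees the whole file
    return header == b"%PDF-"


if __name__ == "__main__":
    print(looks_like_pdf(BytesIO(b"%PDF-1.7\n...")))
    print(looks_like_pdf(BytesIO(b"<html></html>")))
```

In the Flask view this could be called on the uploaded file's stream before it is passed to PdfReader, returning an error response when the check fails.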
The full directory tree is:
.
├── pdf_to_docx.py
└── templates
    └── index.html
4. Run the code (from the project root; run cd .. first if you are still inside templates)
python3 pdf_to_docx.py
Open http://127.0.0.1:5000 in a local browser.
The upload page appears.
Choose a local PDF file,
then click Convert
to download the converted DOCX file.
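The name of the downloaded file comes from the uploaded file name, which the client controls. Below is a minimal sketch of how that name could be derived a little more defensively, mirroring the os.path.splitext logic in convert(). docx_download_name and the "converted" fallback are hypothetical choices for illustration.

```python
import os


def docx_download_name(uploaded_name):
    """Derive the .docx download name from an uploaded file name.

    Hypothetical helper: mirrors the os.path.splitext logic in convert()
    and additionally drops any directory part sent by the client.
    """
    # Normalize Windows separators, then keep only the final component.
    base = os.path.basename(uploaded_name.replace("\\", "/"))
    stem = os.path.splitext(base)[0]
    # Fall back to a fixed name if nothing usable is left.
    return (stem or "converted") + ".docx"


if __name__ == "__main__":
    for name in ["report.pdf", "notes.v2.pdf", "C:\\docs\\scan.PDF"]:
        print(docx_download_name(name))
```

Only the last extension is replaced, so a name such as notes.v2.pdf keeps its inner dot.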