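I am trying to pass a prompt and a PDF file to the Google Gemini API. I have followed all the documentation, but for some reason I am still running into a problem.
Here is the code: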
    def get_vision_response(pdf_file):
        try:
            genai.configure(api_key=GOOGLE_API_KEY)
            model = genai.GenerativeModel(
                model_name="gemini-1.5-pro-latest",
                system_instruction=[
                    "You are a helpful transcriber that can accurately transcribe text from images and PDFs.",
                    "Your mission is to transcribe text from the provided PDF file.",
                ],
            )
            pdf_part = Part.from_data(pdf_file, mime_type="application/pdf")
            prompt = "Please transcribe the text in this PDF document."
            full_content = [prompt, pdf_part]
            response = model.generate_content(full_content)
            return response.text
        except Exception as e:
            print("An error occurred during the model invocation")
            # Save the error message to a text file
            with open("error_log.txt", "w") as error_file:
                error_file.write(f"An error occurred: {str(e)}\n")
When I tried to print the error inside the exception handler, the output was full of garbage characters, so I had to save it to a file. Here is the beginning of the error:
An error occurred: Could not create `Blob`, expected `Blob`, `dict` or an `Image` type(`PIL.Image.Image` or `IPython.display.Image`). Got a: <class 'vertexai.generative_models._generative_models.Part'> Value: inline_data { mime_type: "application/pdf" data: "%PDF-1.7\n%\302\265\302\266\n\n1 0 obj\n<</Type/Catalog/Pages 2 0 R>>\nendobj\n\n2 0 obj\n<</Type/Pages/Count 1/Kids[669 0 R]>>\nendobj\n\n3 0 obj\n<</Filter/FlateDecode/Length 9963>>\nstream\nx\001\305\235\313\256t\307m\205\347\347)z\234\301Q\357[ \200 \003+\201\221\241a\001\031\307Jd \200\0238\377\373\003\371X\227E\262\252Z\323\3000$\365\256\033\271H\026k\025\367>\177\277\375\351\366\367 \333\235\377=\266\375\366|\357\267\377\375\317\333\277\335\37
Does anyone know how to fix this?
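The error message shows that the `google.generativeai` SDK could not turn the value it was given into a `Blob`. It accepts a `Blob`, a `dict`, or an image (`PIL.Image.Image` or `IPython.display.Image`), but it received a `Part` object from the `vertexai` SDK wrapping the raw PDF bytes. One way to satisfy it is to convert the PDF pages to images before sending them to the API.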
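The problem can be solved in either of the following two ways:
1. Convert the PDF to images locally with a Python library:
You can use a library such as `pdf2image` or `PyMuPDF` to render the PDF pages as images, and then pass those images to the Gemini API.
Here is an example using `pdf2image`: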
    from pdf2image import convert_from_bytes
    import google.generativeai as genai

    # GOOGLE_API_KEY is assumed to be defined elsewhere, as in the question
    def get_vision_response(pdf_file):
        try:
            # pdf_file must be the raw PDF bytes; each page is rendered to a PIL image
            images = convert_from_bytes(pdf_file)
            genai.configure(api_key=GOOGLE_API_KEY)
            model = genai.GenerativeModel(
                model_name="gemini-1.5-pro-latest",
                system_instruction=[
                    "You are a helpful transcriber that can accurately transcribe text from images and PDFs.",
                    "Your mission is to transcribe text from the provided PDF file.",
                ],
            )
            prompt = "Please transcribe the text in this PDF document."
            # PIL images are one of the types named in the error message, so they
            # go into the request directly. Wrapping them in a vertexai Part
            # would raise the same "Could not create Blob" error again.
            full_content = [prompt, *images]
            response = model.generate_content(full_content)
            return response.text
        except Exception as e:
            print("An error occurred during the model invocation")
            # Save the error message to a text file
            with open("error_log.txt", "w") as error_file:
                error_file.write(f"An error occurred: {str(e)}\n")
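The question mentions that the printed exception was unreadable, which is because the message embeds the raw PDF bytes. A small helper along these lines could keep the log short. This is only a sketch: `summarize_error` and its `max_chars` limit are hypothetical names chosen here, not part of either SDK.

```python
def summarize_error(exc: Exception, max_chars: int = 300) -> str:
    """Return the exception type plus a truncated message, suitable for logging."""
    message = str(exc)
    if len(message) > max_chars:
        omitted = len(message) - max_chars
        message = f"{message[:max_chars]}... [{omitted} more characters omitted]"
    return f"{type(exc).__name__}: {message}"


if __name__ == "__main__":
    # Simulate an error whose message carries a large binary payload
    try:
        raise TypeError("Could not create `Blob`. Value: " + "\\302\\265" * 5000)
    except TypeError as e:
        print(summarize_error(e))
```

Inside the `except` block, `print(summarize_error(e))` could stand in for writing the whole message to `error_log.txt`, while the full text can still be saved to the file when it is needed.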
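2. Use the Google Cloud Document AI API:
A more robust approach is to use the Google Cloud Document AI API. It can analyze your PDF and extract text, tables, and other structured data. You can then pass the extracted text to the Gemini API for a more accurate transcription.
Here is an example using the Document AI API: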
    import os
    import google.generativeai as genai
    from google.cloud import documentai_v1 as documentai

    # GOOGLE_API_KEY is assumed to be defined elsewhere, as in the question
    def get_vision_response(pdf_file, project_id, location, processor_id):
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "YOUR_SERVICE_ACCOUNT_KEY.json"
        client = documentai.DocumentProcessorServiceClient()
        # Pass the project, location, and processor ID of your own Document AI processor
        name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
        # pdf_file must be the raw PDF bytes
        raw_document = {"content": pdf_file, "mime_type": "application/pdf"}
        request = {"name": name, "raw_document": raw_document}
        # The client method is process_document; the client has no process_request method
        result = client.process_document(request=request)
        text = result.document.text
        genai.configure(api_key=GOOGLE_API_KEY)
        model = genai.GenerativeModel(
            model_name="gemini-1.5-pro-latest",
            system_instruction=[
                "You are a helpful transcriber that can accurately transcribe text.",
                "Your mission is to transcribe the following text.",
            ],
        )
        prompt = f"Please transcribe the following text: {text}"
        response = model.generate_content(prompt)
        return response.text
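Remember that you need to install the required libraries (`pdf2image`, which also needs Poppler installed on the system, and `google-cloud-documentai`) and set up a Google Cloud project for the Document AI API.
With either approach, you can pass the PDF content to the Gemini API and use it for transcription or other language tasks.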
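The error message itself lists `dict` among the accepted input types, so another thing worth trying is to pass the PDF as a plain dictionary with `mime_type` and `data` keys instead of a `vertexai` `Part`. The sketch below only builds the request payload. `build_pdf_request` is a hypothetical helper name, and whether the call succeeds assumes that the installed `google-generativeai` version and the chosen model accept `application/pdf` as inline data.

```python
def build_pdf_request(prompt: str, pdf_bytes: bytes) -> list:
    """Build a content list for generate_content() without any vertexai types."""
    if not isinstance(pdf_bytes, (bytes, bytearray)):
        raise TypeError("pdf_bytes must be the raw bytes of the PDF file")
    if not bytes(pdf_bytes[:5]).startswith(b"%PDF-"):
        raise ValueError("data does not look like a PDF (missing %PDF- header)")
    # A dict with mime_type and data is one of the shapes the error message
    # says the SDK can turn into a Blob
    pdf_blob = {"mime_type": "application/pdf", "data": bytes(pdf_bytes)}
    return [prompt, pdf_blob]


# Usage (assumption: `model` is the genai.GenerativeModel from the question):
#     with open("document.pdf", "rb") as f:
#         content = build_pdf_request(
#             "Please transcribe the text in this PDF document.", f.read()
#         )
#     response = model.generate_content(content)
```

If the model rejects the inline PDF, the two approaches above remain available as fallbacks.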
Tags: python, google-gemini From: 78685076