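"Could not create 'Blob'" when passing a prompt and a PDF to the Gemini API

Tags: python google-gemini

I am trying to pass a prompt and a PDF file to the Google Gemini API. I followed the documentation, but I still get an error.

Here is the code: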

def get_vision_response(pdf_file):
    try:
        genai.configure(api_key=GOOGLE_API_KEY)

        model = genai.GenerativeModel(
            model_name="gemini-1.5-pro-latest",
            system_instruction=[
                "You are a helpful transcriber that can accurately transcribe text from images and PDFs.",
                "Your mission is to transcribe text from the provided PDF file.",
            ],
        )

        pdf_part = Part.from_data(pdf_file, mime_type="application/pdf")

        prompt = "Please transcribe the text in this PDF document."

        full_content = [prompt, pdf_part]
        
        response = model.generate_content(full_content)
        
        return response.text
        
    except Exception as e:
        print("An error occurred during the model invocation")
        
        # Save the error message to a text file
        with open("error_log.txt", "w") as error_file:
            error_file.write(f"An error occurred: {str(e)}\n")

When I tried to print the error in the except block, the output was full of garbage characters, so I had to save it to a file. Here is the beginning of the error message:

An error occurred: Could not create Blob, expected Blob, dict or an Image type (PIL.Image.Image or IPython.display.Image). Got a: <class 'vertexai.generative_models._generative_models.Part'> Value: inline_data { mime_type: "application/pdf" data: "%PDF-1.7\n%\302\265\302\266\n\n1 0 obj\n<</Type/Catalog/Pages 2 0 R>>\nendobj\n\n2 0 obj\n<</Type/Pages/Count 1/Kids[669 0 R]>>\nendobj\n\n3 0 obj\n<</Filter/FlateDecode/Length 9963>>\nstream\nx\001\305\235\313\256t\307m\205\347\347)z\234\301Q\357[ \200 \003+\201\221\241a\001\031\307Jd \200\0238\377\373\003\371X\227E\262\252Z\323\3000$\365\256\033\271H\026k\025\367>\177\277\375\351\366\367 \333\235\377=\266\375\366|\357\267\377\375\317\333\277\335\37

Does anyone know how to fix this?

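The error message shows that the SDK could not turn the object it received into a Blob. It accepts a Blob, a dict, or an image (PIL.Image.Image or IPython.display.Image), but it was given a Part from the vertexai package, which belongs to a different SDK. Gemini can work with PDF content, but the input has to arrive in one of the accepted forms; rendering the PDF pages as images is one way to get there.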
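Since the error message lists dict among the accepted types, one possible direct route is to skip the Part wrapper and hand the SDK a plain dict with mime_type and data keys. The sketch below assumes that the model accepts inline application/pdf data and that the file is small enough to send inline; build_pdf_content and transcribe_pdf are illustrative names, not part of the SDK.

```python
def build_pdf_content(prompt, pdf_bytes):
    """Return a content list made only of types the error message lists (str and dict)."""
    if not isinstance(pdf_bytes, (bytes, bytearray)):
        raise TypeError("pdf_bytes must be raw bytes, e.g. open(path, 'rb').read()")
    # A plain dict with these two keys stands in for the Blob the SDK expects.
    pdf_blob = {"mime_type": "application/pdf", "data": bytes(pdf_bytes)}
    return [prompt, pdf_blob]


def transcribe_pdf(pdf_bytes, api_key, model_name="gemini-1.5-pro-latest"):
    """Send the prompt and the inline PDF to Gemini (assumes inline PDF input is supported)."""
    # Imported here so the helper above can be used without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name=model_name)
    content = build_pdf_content(
        "Please transcribe the text in this PDF document.", pdf_bytes
    )
    return model.generate_content(content).text
```

For larger documents, uploading the file first with genai.upload_file and passing the returned file object may be a better fit than inline data.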

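There are two ways to work around this:

1. Convert the PDF to images locally with a Python library:

You can use a library such as pdf2image or PyMuPDF to render the PDF pages as images, and then pass those images to the Gemini API.

Here is an example using pdf2image: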

import google.generativeai as genai
from pdf2image import convert_from_bytes

# GOOGLE_API_KEY is assumed to be defined elsewhere, as in the question.

def get_vision_response(pdf_file):
    try:
        # pdf_file holds the raw PDF bytes; each page becomes a PIL image.
        # pdf2image relies on the poppler utilities being installed.
        images = convert_from_bytes(pdf_file)

        genai.configure(api_key=GOOGLE_API_KEY)

        model = genai.GenerativeModel(
            model_name="gemini-1.5-pro-latest",
            system_instruction=[
                "You are a helpful transcriber that can accurately transcribe text from images and PDFs.",
                "Your mission is to transcribe text from the provided PDF file.",
            ],
        )

        prompt = "Please transcribe the text in this PDF document."

        # The error message lists PIL.Image.Image as an accepted type, so the
        # page images are passed directly instead of being wrapped in Part.
        full_content = [prompt, *images]

        response = model.generate_content(full_content)

        return response.text

    except Exception as e:
        print("An error occurred during the model invocation")

        # Save the error message to a text file
        with open("error_log.txt", "w", encoding="utf-8") as error_file:
            error_file.write(f"An error occurred: {str(e)}\n")
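PyMuPDF, mentioned above as an alternative, can do the same rendering without an external poppler install. This is a minimal sketch, assuming PyMuPDF (imported as fitz) and Pillow are installed; render_pdf_pages and dpi_to_zoom are illustrative helper names.

```python
import io


def dpi_to_zoom(dpi):
    """PDF pages are measured in points (72 per inch), so zoom = dpi / 72."""
    return dpi / 72.0


def render_pdf_pages(pdf_bytes, dpi=150):
    """Render each page of an in-memory PDF to a PIL image using PyMuPDF."""
    # Third-party imports are kept local so dpi_to_zoom works without them.
    import fitz  # PyMuPDF
    from PIL import Image

    zoom = dpi_to_zoom(dpi)
    images = []
    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        for page in doc:
            # Scale the page by the zoom factor, then encode the pixmap as PNG.
            pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
            png_bytes = pix.tobytes("png")
            images.append(Image.open(io.BytesIO(png_bytes)))
    return images
```

The returned list can be used in place of the pdf2image output when building the content list. A higher dpi gives sharper text at the cost of larger requests.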

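2. Use the Google Cloud Document AI API:

A more robust approach is to use the Google Cloud Document AI API. It can analyze a PDF and extract text, tables, and other structured data. You can then pass the extracted text to the Gemini API, which may give a more accurate result.

Here is an example using the Document AI API: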

import os

import google.generativeai as genai
from google.cloud import documentai_v1 as documentai

# Placeholders: replace with your own project, location, and processor.
# GOOGLE_API_KEY is assumed to be defined elsewhere, as in the question.
project_id = "YOUR_PROJECT_ID"
location = "YOUR_PROCESSOR_LOCATION"
processor_id = "YOUR_PROCESSOR_ID"

def get_vision_response(pdf_file):
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "YOUR_SERVICE_ACCOUNT_KEY.json"

    client = documentai.DocumentProcessorServiceClient()

    # Update this name to match your Document AI processor
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"

    # pdf_file holds the raw PDF bytes
    raw_document = {"content": pdf_file, "mime_type": "application/pdf"}
    request = {"name": name, "raw_document": raw_document}

    # process_document sends the PDF to the processor and returns the parsed document
    result = client.process_document(request=request)
    document = result.document

    text = document.text

    genai.configure(api_key=GOOGLE_API_KEY)

    model = genai.GenerativeModel(
        model_name="gemini-1.5-pro-latest",
        system_instruction=[
            "You are a helpful transcriber that can accurately transcribe text.",
            "Your mission is to transcribe the following text."
        ],
    )

    prompt = f"Please transcribe the following text: {text}"

    response = model.generate_content(prompt)

    return response.text
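Document AI processors are regional, and a processor outside the default location is usually reached through a location-specific endpoint. The sketch below shows one way to build the endpoint and the processor resource name. The helper names are hypothetical, and the endpoint pattern is an assumption to check against your own project settings.

```python
def documentai_endpoint(location):
    """Regional API endpoint, e.g. 'eu' -> 'eu-documentai.googleapis.com'."""
    return f"{location}-documentai.googleapis.com"


def processor_name(project_id, location, processor_id):
    """Full resource name expected in the 'name' field of the request."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def make_client(location):
    """Create a Document AI client bound to the regional endpoint."""
    # Imported lazily so the string helpers above have no dependencies.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai_v1 as documentai

    options = ClientOptions(api_endpoint=documentai_endpoint(location))
    return documentai.DocumentProcessorServiceClient(client_options=options)
```

Keeping the location in one place avoids a mismatch between the endpoint and the location segment of the processor name.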

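Remember to install the required libraries (pdf2image, google-cloud-documentai) and to set up a Google Cloud project for the Document AI API.

With either approach, you can pass the PDF content to the Gemini API and use it for transcription or other language tasks.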
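The unreadable output described in the question comes from the exception message itself, which embeds the raw PDF bytes. One option is to truncate the message before printing or saving it, sketched here with two hypothetical helpers, summarize_exception and log_exception.

```python
def summarize_exception(exc, max_chars=300):
    """Return the exception type plus a truncated message."""
    message = str(exc)
    if len(message) > max_chars:
        message = message[:max_chars] + "... [truncated]"
    return f"{type(exc).__name__}: {message}"


def log_exception(exc, path="error_log.txt", max_chars=300):
    """Append the shortened message to a UTF-8 log file and return it."""
    summary = summarize_exception(exc, max_chars)
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(summary + "\n")
    return summary
```

Calling print(log_exception(e)) inside the except block keeps the useful first part of the message (the expected and received types) and drops the binary payload.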

From: 78685076
