Multimodal Image Retrieval with LlamaIndex and GPT-4V

Published: 2024-08-08 09:26:22

Tags: wiki, img, 4V, image, LlamaIndex, images, import, GPT, path

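This article shows how to combine LlamaIndex with GPT-4V and CLIP to perform image-to-image retrieval. The workflow downloads images and text from Wikipedia, builds a multimodal index, retrieves the images most similar to a query image, asks GPT-4V to reason about how the retrieved images relate to the query, and displays the results.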

Steps

1. Install the required libraries

%pip install llama-index-multi-modal-llms-openai
%pip install llama-index-vector-stores-qdrant
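%pip install llama_index ftfy regex tqdm
# The download step imports the `wikipedia` package, so install it as well.
%pip install wikipedia
# Recent llama-index releases ship the CLIP embedding integration as a separate package.
%pip install llama-index-embeddings-clip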
%pip install git+https://github.com/openai/CLIP.git
%pip install torch torchvision
%pip install matplotlib scikit-image
%pip install -U qdrant_client

2. Download images and text from Wikipedia

import os
import wikipedia
import urllib.request
from pathlib import Path

image_path = Path("mixed_wiki")
image_uuid = 0
image_metadata_dict = {}
MAX_IMAGES_PER_WIKI = 30

wiki_titles = [
    "Vincent van Gogh",
    "San Francisco",
    "Batman",
    "iPhone",
    "Tesla Model S",
    "BTS band",
]

# Create the download directory if it does not exist yet.
image_path.mkdir(exist_ok=True)

for title in wiki_titles:
    images_per_wiki = 0
    print(title)
    try:
        page_py = wikipedia.page(title)
        list_img_urls = page_py.images
        for url in list_img_urls:
            if url.endswith(".jpg") or url.endswith(".png"):
                image_uuid += 1
                image_file_name = title + "_" + url.split("/")[-1]
                image_metadata_dict[image_uuid] = {
                    "filename": image_file_name,
                    "img_path": "./" + str(image_path / f"{image_uuid}.jpg"),
                }
                urllib.request.urlretrieve(
                    url, image_path / f"{image_uuid}.jpg"
                )
                images_per_wiki += 1
                # Stop once the per-page limit is reached (">" would allow one extra image).
                if images_per_wiki >= MAX_IMAGES_PER_WIKI:
                    break
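    except Exception as e:
        # Report the actual error: it may be a page lookup or network failure, not only missing images.
        print(f"Exception while fetching images for Wikipedia page {title}: {e}")
        continue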
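
Image downloads over the network fail intermittently, and some servers reject requests that carry no User-Agent header. As a minimal sketch that is not part of the original tutorial, the helper below wraps any download callable in a retry loop with exponential backoff. The names `with_retries` and `fetch_image`, the default delays, and the User-Agent string are all hypothetical and chosen only for illustration.

```python
import time
import urllib.request


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries are used up.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries
    and re-raises the last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


def fetch_image(url, dest, user_agent="wiki-image-demo/0.1"):
    """Download url to dest, sending an explicit User-Agent header.

    The User-Agent value is a placeholder; replace it with one that
    identifies your own application.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        data = response.read()
    with open(dest, "wb") as out_file:
        out_file.write(data)
```

In the download loop above, the `urllib.request.urlretrieve(...)` call could then be swapped for `with_retries(lambda: fetch_image(url, image_path / f"{image_uuid}.jpg"))`.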

3. Display the downloaded images

from PIL import Image
import matplotlib.pyplot as plt
import os

image_paths = []
for img_path in os.listdir("./mixed_wiki"):
    image_paths.append(str(os.path.join("./mixed_wiki", img_path)))

def plot_images(image_paths):
    images_shown = 0
    plt.figure(figsize=(16, 9))
    for img_path in image_paths:
        if os.path.isfile(img_path):
            image = Image.open(img_path)
            plt.subplot(3, 3, images_shown + 1)
            plt.imshow(image)
            plt.xticks([])
            plt.yticks([])
            images_shown += 1
            if images_shown >= 9:
                break

plot_images(image_paths)

4. Build the multimodal index and vector stores

from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, StorageContext
import qdrant_client

client = qdrant_client.QdrantClient(path="qdrant_img_db")
text_store = QdrantVectorStore(client=client, collection_name="text_collection")
image_store = QdrantVectorStore(client=client, collection_name="image_collection")
storage_context = StorageContext.from_defaults(vector_store=text_store, image_store=image_store)
documents = SimpleDirectoryReader("./mixed_wiki/").load_data()
index = MultiModalVectorStoreIndex.from_documents(documents, storage_context=storage_context)

5. Retrieve related images and reason over them

retriever_engine = index.as_retriever(image_similarity_top_k=4)
retrieval_results = retriever_engine.image_to_image_retrieve("./mixed_wiki/2.jpg")
retrieved_images = [res.node.metadata["file_path"] for res in retrieval_results]

plot_images(retrieved_images[1:])
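
Image-to-image retrieval of this kind generally works by embedding every image as a vector (here with CLIP) and ranking the stored vectors by their similarity to the query vector. The toy sketch below illustrates that idea with hand-made 3-dimensional vectors and cosine similarity; it is not how LlamaIndex or Qdrant is implemented, the vectors are invented, and the helper names are hypothetical. Because the query image is itself stored in the index, it normally comes back as its own nearest neighbor, which is why the first result is skipped with `retrieved_images[1:]`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k):
    """Return the k (name, score) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented embeddings standing in for CLIP image vectors.
toy_embeddings = {
    "2.jpg": [1.0, 0.0, 0.0],
    "5.jpg": [0.9, 0.1, 0.0],
    "9.jpg": [0.0, 1.0, 0.0],
    "12.jpg": [0.7, 0.7, 0.0],
}

results = top_k(toy_embeddings["2.jpg"], toy_embeddings, k=3)
# Drop the first hit: it is the query image matching itself.
neighbors = [name for name, _ in results][1:]
print(neighbors)  # ['5.jpg', '12.jpg']
```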

from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.schema import ImageDocument

image_documents = [ImageDocument(image_path="./mixed_wiki/2.jpg")]
image_documents.extend([ImageDocument(image_path=res_img) for res_img in retrieved_images[1:]])

openai_mm_llm = OpenAIMultiModal(model="gpt-4-vision-preview", api_key="your_openai_api_key", max_new_tokens=1500)
response = openai_mm_llm.complete(
    prompt="Given the first image as the base image, what do the other images correspond to?",
    image_documents=image_documents,
)

print(response)
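
Hard-coding an API key in a notebook makes it easy to leak when the notebook is shared. A common alternative, sketched here as an assumption rather than the original author's setup, is to read the key from an environment variable and fail early with a clear message if it is missing. `load_api_key` is a hypothetical helper, and `OPENAI_API_KEY` is the variable name that OpenAI client libraries conventionally look for.

```python
import os


def load_api_key(env=None, var_name="OPENAI_API_KEY"):
    """Return the API key stored in `var_name`, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before running the notebook."
        )
    return key
```

The result could then be passed as `api_key=load_api_key()` when constructing `OpenAIMultiModal`.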

Possible errors

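Network issues: downloading Wikipedia images can fail because of connectivity problems. Retry the download, or route it through a proxy or VPN.
API key issues: requests to the OpenAI API fail if the key is wrong or has expired. Make sure the key is correct and still valid.
Out of memory: processing a large number of images can exhaust memory. Reduce how many images are handled at once, or add hardware resources.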
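
One way to act on the memory advice is to work through the image paths in fixed-size batches rather than handling everything at once. The sketch below is an illustration only: `batched` is a hypothetical helper, the batch size is arbitrary, and whether batching helps depends on where the memory is actually being consumed.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Example paths following the naming scheme used by the download step.
paths = [f"./mixed_wiki/{i}.jpg" for i in range(1, 8)]
batches = list(batched(paths, batch_size=3))
print([len(batch) for batch in batches])  # [3, 3, 1]
```

Each batch could then be loaded and processed separately, so that only a few images are held in memory at a time.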



Source: https://blog.csdn.net/qq_29929123/article/details/140969517
