In this article, we demonstrate how to use LlamaIndex together with GPT-4V and CLIP to implement image-to-image retrieval. The workflow downloads images from Wikipedia pages, builds a multi-modal index, uses GPT-4V to reason about how the retrieved images relate to the query image, and displays the retrieval results.
Steps
1. Install the required libraries
%pip install llama-index-multi-modal-llms-openai
%pip install llama-index-vector-stores-qdrant
%pip install llama_index ftfy regex tqdm
%pip install git+https://github.com/openai/CLIP.git
%pip install torch torchvision
%pip install matplotlib scikit-image
%pip install -U qdrant_client
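After installing, you can run a quick sanity check to confirm that PyTorch and CLIP import correctly. This is only a verification sketch and not part of the retrieval pipeline; "ViT-B/32" is the standard CLIP checkpoint used here for the check.

# Optional sanity check: load a CLIP checkpoint and report the compute device.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
print(f"CLIP ViT-B/32 loaded on {device}")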
2. Download images from Wikipedia
import os
import wikipedia
import urllib.request
from pathlib import Path
image_path = Path("mixed_wiki")
image_uuid = 0
image_metadata_dict = {}
MAX_IMAGES_PER_WIKI = 30
wiki_titles = [
    "Vincent van Gogh",
    "San Francisco",
    "Batman",
    "iPhone",
    "Tesla Model S",
    "BTS band",
]

if not image_path.exists():
    Path.mkdir(image_path)

# Download up to MAX_IMAGES_PER_WIKI .jpg/.png images per Wikipedia page,
# saving each one as <uuid>.jpg and recording its metadata.
for title in wiki_titles:
    images_per_wiki = 0
    print(title)
    try:
        page_py = wikipedia.page(title)
        list_img_urls = page_py.images
        for url in list_img_urls:
            if url.endswith(".jpg") or url.endswith(".png"):
                image_uuid += 1
                image_file_name = title + "_" + url.split("/")[-1]
                image_metadata_dict[image_uuid] = {
                    "filename": image_file_name,
                    "img_path": "./" + str(image_path / f"{image_uuid}.jpg"),
                }
                urllib.request.urlretrieve(
                    url, image_path / f"{image_uuid}.jpg"
                )
                images_per_wiki += 1
                if images_per_wiki > MAX_IMAGES_PER_WIKI:
                    break
    except Exception as e:
        print(f"Exception: No images found for Wikipedia page: {title}")
        continue
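Once the loop finishes, it can help to confirm what was actually downloaded. A small optional sketch that summarizes the image_metadata_dict built above:

# Optional: print a short summary of the downloaded images.
print(f"Downloaded {len(image_metadata_dict)} images in total")
for uuid, meta in image_metadata_dict.items():
    print(uuid, meta["filename"], "->", meta["img_path"])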
3. Display the downloaded images
from PIL import Image
import matplotlib.pyplot as plt
import os
image_paths = []
for img_path in os.listdir("./mixed_wiki"):
    image_paths.append(str(os.path.join("./mixed_wiki", img_path)))
def plot_images(image_paths):
    # Display up to 9 of the downloaded images in a 3x3 grid.
    images_shown = 0
    plt.figure(figsize=(16, 9))
    for img_path in image_paths:
        if os.path.isfile(img_path):
            image = Image.open(img_path)
            plt.subplot(3, 3, images_shown + 1)
            plt.imshow(image)
            plt.xticks([])
            plt.yticks([])
            images_shown += 1
            if images_shown >= 9:
                break

plot_images(image_paths)
4. Build the multi-modal index and vector stores
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, StorageContext
import qdrant_client
# Create a local Qdrant client plus separate collections for text and image embeddings.
client = qdrant_client.QdrantClient(path="qdrant_img_db")
text_store = QdrantVectorStore(client=client, collection_name="text_collection")
image_store = QdrantVectorStore(client=client, collection_name="image_collection")
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

# Load the downloaded Wikipedia images and build the multi-modal index.
documents = SimpleDirectoryReader("./mixed_wiki/").load_data()
index = MultiModalVectorStoreIndex.from_documents(
    documents, storage_context=storage_context
)
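Before running the indexing code above, make sure an OpenAI API key is available: unless a different embedding model is configured, MultiModalVectorStoreIndex embeds image nodes with CLIP and text nodes with OpenAI's embedding model. A minimal sketch (the key value is a placeholder):

import os

# Set the key before building the index so the default OpenAI text embeddings can be computed.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder: use your own key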
5. Retrieve related images and reason over them
# Retrieve the 4 most similar images for a query image.
retriever_engine = index.as_retriever(image_similarity_top_k=4)
retrieval_results = retriever_engine.image_to_image_retrieve("./mixed_wiki/2.jpg")
retrieved_images = [res.node.metadata["file_path"] for res in retrieval_results]
# The first result is the query image itself, so display only the remaining matches.
plot_images(retrieved_images[1:])
from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index.core.schema import ImageDocument

# Feed the query image plus the retrieved images to GPT-4V and ask it to
# explain how the retrieved images relate to the query image.
image_documents = [ImageDocument(image_path="./mixed_wiki/2.jpg")]
image_documents.extend(
    [ImageDocument(image_path=res_img) for res_img in retrieved_images[1:]]
)

openai_mm_llm = OpenAIMultiModal(
    model="gpt-4-vision-preview", api_key="your_openai_api_key", max_new_tokens=1500
)
response = openai_mm_llm.complete(
    prompt="Given the first image as the base image, what the other images correspond to?",
    image_documents=image_documents,
)
print(response)
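The same retriever can also be used for text-to-image retrieval over this index, which is a handy way to cross-check the results. A minimal sketch (the query string is just an example):

# Optional: retrieve images for a natural-language query using the same index.
text_results = retriever_engine.text_to_image_retrieve(
    "paintings by Vincent van Gogh"
)
plot_images([res.node.metadata["file_path"] for res in text_results])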
Possible errors
- Network problems: downloading the Wikipedia images can fail because of connection issues; retry the failed downloads (see the retry sketch after this list) or use a proxy/VPN.
- API key problems: if the OpenAI API key is wrong or has expired, the requests will fail. Make sure the key is correct and valid.
- Out of memory: processing a large number of images can exhaust memory; reduce the number of downloaded images, scale them down, or add hardware resources.
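For the network problems mentioned above, wrapping the download call in a small retry loop is usually enough. A minimal sketch (the retry count and delay are arbitrary choices):

import time
import urllib.request

def download_with_retry(url, dest, retries=3, delay=2):
    # Retry transient network failures a few times before giving up.
    for attempt in range(retries):
        try:
            urllib.request.urlretrieve(url, dest)
            return True
        except Exception as e:
            print(f"Attempt {attempt + 1} failed for {url}: {e}")
            time.sleep(delay)
    return False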