Qdrant Official Quick Start and Tutorials (Simplified)

Date: 2024-08-28 22:38:53

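Notes:

About

This is a simplified record of a small part of the official Qdrant documentation (originally written in Chinese). See the official documentation for more.

Deploying Qdrant locally with Docker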

docker pull qdrant/qdrant
docker run -d -p 6333:6333 -p 6334:6334 \
    -v $(pwd)/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant

With the default configuration, all data is stored in ./qdrant_storage.

Quick start

Install the qdrant-client package (Python):

pip install qdrant-client

Initialize the client:

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

All vector data is stored in Qdrant collections. Create a collection named test_collection that uses the dot product as the metric for comparing vectors.

from qdrant_client.models import Distance, VectorParams

client.create_collection(
    collection_name="test_collection",
    vectors_config=VectorParams(size=4, distance=Distance.DOT),
)

Add vectors with payloads. A payload is the data associated with a vector.

from qdrant_client.models import PointStruct

operation_info = client.upsert(
    collection_name="test_collection",
    wait=True,
    points=[
        PointStruct(id=1, vector=[0.05, 0.61, 0.76, 0.74], payload={"city": "Berlin"}),
        PointStruct(id=2, vector=[0.19, 0.81, 0.75, 0.11], payload={"city": "London"}),
        PointStruct(id=3, vector=[0.36, 0.55, 0.47, 0.94], payload={"city": "Moscow"}),
        PointStruct(id=4, vector=[0.18, 0.01, 0.85, 0.80], payload={"city": "New York"}),
        PointStruct(id=5, vector=[0.24, 0.18, 0.22, 0.44], payload={"city": "Beijing"}),
        PointStruct(id=6, vector=[0.35, 0.08, 0.11, 0.44], payload={"city": "Mumbai"}),
    ]
)

print(operation_info)
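
As a rough mental model (not Qdrant's actual storage engine), upsert behaves like writing into a dictionary keyed by point id: a new id is inserted, and an existing id is overwritten. The `store` dictionary, the `upsert` helper, and the "Potsdam" payload below are hypothetical and exist only for illustration.

```python
# Toy model of upsert semantics: insert a new id, or overwrite an existing one.
store = {}  # hypothetical in-memory stand-in for a collection


def upsert(points):
    """Insert each point, replacing any stored point that has the same id."""
    for point in points:
        store[point["id"]] = {
            "vector": point["vector"],
            "payload": point["payload"],
        }


upsert([
    {"id": 1, "vector": [0.05, 0.61, 0.76, 0.74], "payload": {"city": "Berlin"}},
    {"id": 2, "vector": [0.19, 0.81, 0.75, 0.11], "payload": {"city": "London"}},
])

# Upserting id 1 again replaces the stored point instead of adding a duplicate.
upsert([
    {"id": 1, "vector": [0.05, 0.61, 0.76, 0.74], "payload": {"city": "Potsdam"}},
])

print(len(store), store[1]["payload"])
```

The store still holds two points after the second call, and point 1 carries the new payload.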

Run a query:

search_result = client.query_points(
    collection_name="test_collection", query=[0.2, 0.1, 0.9, 0.7], limit=3
).points

print(search_result)

Output:

[
  {
    "id": 4,
    "version": 0,
    "score": 1.362,
    "payload": null,
    "vector": null
  },
  {
    "id": 1,
    "version": 0,
    "score": 1.273,
    "payload": null,
    "vector": null
  },
  {
    "id": 3,
    "version": 0,
    "score": 1.208,
    "payload": null,
    "vector": null
  }
]
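
The scores in this output can be checked by hand, because the collection uses the dot product. The sketch below recomputes them in plain Python from the six vectors upserted earlier. It is only a brute-force illustration and says nothing about how Qdrant searches internally.

```python
# Recompute the dot-product scores of the quick-start points by brute force.
points = {
    1: [0.05, 0.61, 0.76, 0.74],
    2: [0.19, 0.81, 0.75, 0.11],
    3: [0.36, 0.55, 0.47, 0.94],
    4: [0.18, 0.01, 0.85, 0.80],
    5: [0.24, 0.18, 0.22, 0.44],
    6: [0.35, 0.08, 0.11, 0.44],
}
query = [0.2, 0.1, 0.9, 0.7]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


scores = {point_id: dot(vector, query) for point_id, vector in points.items()}

# Higher dot product means more similar, so sort in descending order.
top3 = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:3]
for point_id, score in top3:
    print(point_id, round(score, 3))
```

The three best ids are 4, 1 and 3, with scores that round to 1.362, 1.273 and 1.208, matching the query result.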

Add a filter:

from qdrant_client.models import Filter, FieldCondition, MatchValue

search_result = client.query_points(
    collection_name="test_collection",
    query=[0.2, 0.1, 0.9, 0.7],
    query_filter=Filter(
        must=[FieldCondition(key="city", match=MatchValue(value="London"))]
    ),
    with_payload=True,
    limit=3,
).points

print(search_result)

Output:

[
    {
        "id": 2,
        "version": 0,
        "score": 0.871,
        "payload": {
            "city": "London"
        },
        "vector": null
    }
]
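
Conceptually, a `must` filter restricts the candidates to points whose payload satisfies every condition, and only those candidates are ranked. The following plain-Python sketch mimics that behaviour for the quick-start data. The helpers `matches` and `filtered_search` are hypothetical names, and this is not how Qdrant implements filtering.

```python
# Brute-force illustration of "filter first, then rank by dot product".
points = [
    {"id": 1, "vector": [0.05, 0.61, 0.76, 0.74], "payload": {"city": "Berlin"}},
    {"id": 2, "vector": [0.19, 0.81, 0.75, 0.11], "payload": {"city": "London"}},
    {"id": 3, "vector": [0.36, 0.55, 0.47, 0.94], "payload": {"city": "Moscow"}},
    {"id": 4, "vector": [0.18, 0.01, 0.85, 0.80], "payload": {"city": "New York"}},
    {"id": 5, "vector": [0.24, 0.18, 0.22, 0.44], "payload": {"city": "Beijing"}},
    {"id": 6, "vector": [0.35, 0.08, 0.11, 0.44], "payload": {"city": "Mumbai"}},
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matches(payload, must):
    """True when every (key, value) condition equals the payload value."""
    return all(payload.get(key) == value for key, value in must)


def filtered_search(query, must, limit):
    candidates = [p for p in points if matches(p["payload"], must)]
    scored = [(p["id"], dot(p["vector"], query), p["payload"]) for p in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]


hits = filtered_search([0.2, 0.1, 0.9, 0.7], must=[("city", "London")], limit=3)
for point_id, score, payload in hits:
    print(point_id, round(score, 3), payload)
```

Only point 2 survives the filter, which is why the result has a single hit even though the limit is 3.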

Tutorials

Getting started with semantic search

Install the dependencies:

pip install sentence-transformers

Import the modules:

from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

Use the all-MiniLM-L6-v2 encoder as the embedding model. An embedding model converts raw data into embeddings.

encoder = SentenceTransformer("all-MiniLM-L6-v2")

Add the dataset:

documents = [
    {
        "name": "The Time Machine",
        "description": "A man travels through time and witnesses the evolution of humanity.",
        "author": "H.G. Wells",
        "year": 1895,
    },
    {
        "name": "Ender's Game",
        "description": "A young boy is trained to become a military leader in a war against an alien race.",
        "author": "Orson Scott Card",
        "year": 1985,
    },
    {
        "name": "Brave New World",
        "description": "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy.",
        "author": "Aldous Huxley",
        "year": 1932,
    },
    {
        "name": "The Hitchhiker's Guide to the Galaxy",
        "description": "A comedic science fiction series following the misadventures of an unwitting human and his alien friend.",
        "author": "Douglas Adams",
        "year": 1979,
    },
    {
        "name": "Dune",
        "description": "A desert planet is the site of political intrigue and power struggles.",
        "author": "Frank Herbert",
        "year": 1965,
    },
    {
        "name": "Foundation",
        "description": "A mathematician develops a science to predict the future of humanity and works to save civilization from collapse.",
        "author": "Isaac Asimov",
        "year": 1951,
    },
    {
        "name": "Snow Crash",
        "description": "A futuristic world where the internet has evolved into a virtual reality metaverse.",
        "author": "Neal Stephenson",
        "year": 1992,
    },
    {
        "name": "Neuromancer",
        "description": "A hacker is hired to pull off a near-impossible hack and gets pulled into a web of intrigue.",
        "author": "William Gibson",
        "year": 1984,
    },
    {
        "name": "The War of the Worlds",
        "description": "A Martian invasion of Earth throws humanity into chaos.",
        "author": "H.G. Wells",
        "year": 1898,
    },
    {
        "name": "The Hunger Games",
        "description": "A dystopian society where teenagers are forced to fight to the death in a televised spectacle.",
        "author": "Suzanne Collins",
        "year": 2008,
    },
    {
        "name": "The Andromeda Strain",
        "description": "A deadly virus from outer space threatens to wipe out humanity.",
        "author": "Michael Crichton",
        "year": 1969,
    },
    {
        "name": "The Left Hand of Darkness",
        "description": "A human ambassador is sent to a planet where the inhabitants are genderless and can change gender at will.",
        "author": "Ursula K. Le Guin",
        "year": 1969,
    },
    {
        "name": "The Three-Body Problem",
        "description": "Humans encounter an alien civilization that lives in a dying system.",
        "author": "Liu Cixin",
        "year": 2008,
    },
]

Keep the embedding data in memory by creating an in-memory Qdrant instance:

client = QdrantClient(":memory:")

Create a collection:

client.create_collection(
    collection_name="my_books",
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(),  # Vector size is defined by used model
        distance=models.Distance.COSINE,
    ),
)
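
This collection compares vectors by cosine similarity rather than by dot product. As a small sketch of what that metric measures, the plain-Python function below divides the dot product by the product of the two vector lengths, so only the direction of the vectors matters. The example vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]       # same direction as `a`, twice as long
orthogonal = [3.0, 0.0, -1.0]  # perpendicular to `a`

print(cosine_similarity(a, scaled))      # close to 1: same direction
print(cosine_similarity(a, orthogonal))  # 0: unrelated direction
```

Scaling a vector leaves its cosine similarity unchanged, whereas it would double a dot-product score.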

Upload the data:

client.upload_points(
    collection_name="my_books",
    points=[
        models.PointStruct(
            id=idx, vector=encoder.encode(doc["description"]).tolist(), payload=doc
        )
        for idx, doc in enumerate(documents)
    ],
)

Ask a question:

hits = client.query_points(
    collection_name="my_books",
    query=encoder.encode("alien invasion").tolist(),
    limit=3,
).points

for hit in hits:
    print(hit.payload, "score:", hit.score)

Output:

{'name': 'The War of the Worlds', 'description': 'A Martian invasion of Earth throws humanity into chaos.', 'author': 'H.G. Wells', 'year': 1898} score: 0.570093257022374
{'name': "The Hitchhiker's Guide to the Galaxy", 'description': 'A comedic science fiction series following the misadventures of an unwitting human and his alien friend.', 'author': 'Douglas Adams', 'year': 1979} score: 0.5040468703143637
{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216

Narrow the query with a filter:

hits = client.query_points(
    collection_name="my_books",
    query=encoder.encode("alien invasion").tolist(),
    query_filter=models.Filter(
        must=[models.FieldCondition(key="year", range=models.Range(gte=2000))]
    ),
    limit=1,
).points

for hit in hits:
    print(hit.payload, "score:", hit.score)

Output:

{'name': 'The Three-Body Problem', 'description': 'Humans encounter an alien civilization that lives in a dying system.', 'author': 'Liu Cixin', 'year': 2008} score: 0.45902943411768216
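
To see which books the range condition `gte=2000` lets through, the sketch below applies the same comparison to the `year` field in plain Python. The `in_range` helper is a hypothetical stand-in that covers only the `gte` and `lte` bounds, and the list repeats just the names and years from the dataset.

```python
# (name, year) pairs taken from the `documents` list above.
books = [
    ("The Time Machine", 1895),
    ("Ender's Game", 1985),
    ("Brave New World", 1932),
    ("The Hitchhiker's Guide to the Galaxy", 1979),
    ("Dune", 1965),
    ("Foundation", 1951),
    ("Snow Crash", 1992),
    ("Neuromancer", 1984),
    ("The War of the Worlds", 1898),
    ("The Hunger Games", 2008),
    ("The Andromeda Strain", 1969),
    ("The Left Hand of Darkness", 1969),
    ("The Three-Body Problem", 2008),
]


def in_range(value, gte=None, lte=None):
    """True when value satisfies every bound that was given."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True


recent = [name for name, year in books if in_range(year, gte=2000)]
print(recent)
```

Two books pass the filter, and of those, The Three-Body Problem is the closer match to "alien invasion", so it is the single hit returned with `limit=1`.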

Simple neural search

Download the sample dataset:

wget https://storage.googleapis.com/generall-shared-data/startups_demo.json

Install SentenceTransformer and the other dependencies:

pip install sentence-transformers numpy pandas tqdm

Import the modules:

from sentence_transformers import SentenceTransformer
import numpy as np
import json
import pandas as pd
from tqdm.notebook import tqdm

Create the sentence encoder:

model = SentenceTransformer(
    "all-MiniLM-L6-v2", device="cuda"
)  # or device="cpu" if you don't have a GPU

Read the data:

df = pd.read_json("./startups_demo.json", lines=True)

Create an embedding vector for each description. Internally, encode splits the input into batches to speed up processing.

vectors = model.encode(
    [row.alt + ". " + row.description for row in df.itertuples()],
    show_progress_bar=True,
)
vectors.shape
# > (40474, 384)
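
The batching that encode performs can be pictured as slicing the input list into fixed-size chunks. The sketch below is a plain-Python illustration with a hypothetical `batched` helper and placeholder texts. The batch size of 32 is an assumption chosen to mirror the encoder's default, and 40474 is the row count shown above.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder inputs standing in for the 40474 startup descriptions.
texts = [f"text {i}" for i in range(40474)]

batches = list(batched(texts, batch_size=32))
print(len(batches), len(batches[-1]))
```

With these numbers there are 1265 batches, and the last one holds the remaining 26 texts.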

Save the vectors as an npy file:

np.save("startup_vectors.npy", vectors, allow_pickle=False)

Start the Docker service:

docker pull qdrant/qdrant
docker run -p 6333:6333 \
    -v $(pwd)/qdrant_storage:/qdrant/storage \
    qdrant/qdrant

Create the Qdrant client:

# Import client library
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance

client = QdrantClient("http://localhost:6333")

Create the collection. 384 is the output dimension of the embedding model (all-MiniLM-L6-v2).

if not client.collection_exists("startups"):
    client.create_collection(
        collection_name="startups",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

Load the data:

fd = open("./startups_demo.json")

# payload is now an iterator over startup data
payload = map(json.loads, fd)

# Load all vectors into memory, numpy array works as iterable for itself.
# Other option would be to use Mmap, if you don't want to load all data into RAM
vectors = np.load("./startup_vectors.npy")
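
The comment above mentions memory mapping as an alternative to loading everything into RAM. A minimal sketch of that option, assuming NumPy's `mmap_mode` argument of `np.load` is what is meant, could look like this. The file name `demo_vectors.npy` and the zero-filled array are placeholders so that the example runs without the real dataset.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    # Placeholder file standing in for startup_vectors.npy.
    path = os.path.join(tmp_dir, "demo_vectors.npy")
    np.save(path, np.zeros((10, 384), dtype=np.float32), allow_pickle=False)

    # mmap_mode="r" maps the file read-only; rows are read from disk on access.
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)
    shape = mapped.shape

    # Copy only the rows that are needed into regular memory.
    first_rows = np.array(mapped[:2])

    # Release the mapping before the temporary directory is removed.
    del mapped

print(is_memmap, shape, first_rows.shape)
```

For the real file, the same call would be `np.load("./startup_vectors.npy", mmap_mode="r")`.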

Upload the data to Qdrant:

client.upload_collection(
    collection_name="startups",
    vectors=vectors,
    payload=payload,
    ids=None,  # Vector ids will be assigned automatically
    batch_size=256,  # How many vectors will be uploaded in a single request?
)

Create the neural_searcher.py file:

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer


class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
        # Initialize encoder model
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
        # Initialize Qdrant client
        self.qdrant_client = QdrantClient("http://localhost:6333")

    def search(self, text: str):
        # Convert text query into vector
        vector = self.model.encode(text).tolist()

        # Use `vector` to search for the closest vectors in the collection
        # (query_points is used here, as in the rest of this document)
        search_result = self.qdrant_client.query_points(
            collection_name=self.collection_name,
            query=vector,
            query_filter=None,  # No filters for now
            limit=5,  # The 5 closest results are enough
        ).points
        # `search_result` contains found vector ids with similarity scores along with stored payload
        # In this function you are interested in payload only
        payloads = [hit.payload for hit in search_result]
        return payloads

Deploy with FastAPI:

pip install fastapi uvicorn

Extend the NeuralSearcher class with a search method that filters by city:

from qdrant_client import QdrantClient
from qdrant_client.models import Filter
from sentence_transformers import SentenceTransformer


class NeuralSearcher:
    def __init__(self, collection_name):
        self.collection_name = collection_name
        # Initialize encoder model
        self.model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
        # Initialize Qdrant client
        self.qdrant_client = QdrantClient("http://localhost:6333")

    def search(self, text: str):
        # Convert text query into vector
        vector = self.model.encode(text).tolist()

        # Use `vector` to search for the closest vectors in the collection
        # (query_points is used here, as in search_in_berlin below)
        search_result = self.qdrant_client.query_points(
            collection_name=self.collection_name,
            query=vector,
            query_filter=None,  # No filters for now
            limit=5,  # The 5 closest results are enough
        ).points
        # `search_result` contains found vector ids with similarity scores along with stored payload
        # In this function you are interested in payload only
        payloads = [hit.payload for hit in search_result]
        return payloads
    
    def search_in_berlin(self, text:str):
        # Convert text query into vector
        vector = self.model.encode(text).tolist()
        
        city_of_interest = "Berlin"
        
        # Define a filter for cities
        city_filter = Filter(**{
            "must": [{
                "key": "city", # Store city information in a field of the same name 
                "match": { # This condition checks if payload field has the requested value
                    "value": city_of_interest
                }
            }]
        })
        
        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.query_points(
            collection_name=self.collection_name,
            query=vector,
            query_filter=city_filter,
            limit=5,
        ).points
        # `search_result` contains found vector ids with similarity scores along with stored payload
        # In this function you are interested in payload only
        payloads = [hit.payload for hit in search_result]
        return payloads
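
Create the FastAPI service:

from fastapi import FastAPI

# Assumes the class above is saved as neural_searcher.py;
# skip this import if the class is defined in the same file or notebook.
from neural_searcher import NeuralSearcher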

app = FastAPI()

# Create a neural searcher instance
neural_searcher = NeuralSearcher(collection_name="startups")


@app.get("/api/search")
def search_startup(q: str):
    return {"result": neural_searcher.search(text=q)}

@app.get("/api/search_in_berlin")
def search_startup_filter(q: str):
    return {"result": neural_searcher.search_in_berlin(text=q)}

if __name__ == "__main__":
    import uvicorn
    
    uvicorn.run(app, host="0.0.0.0", port=8001)
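
Once the service is running, the two endpoints can be called from any HTTP client. The sketch below uses only the standard library. It assumes the server listens on localhost port 8001 as configured above, and the helpers `build_search_url` and `search` are hypothetical names. Only the URL builder is executed here, because `search` needs a live server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8001"  # assumed address of the FastAPI service


def build_search_url(query, endpoint="/api/search"):
    """Build the GET URL, URL-encoding the query text into the `q` parameter."""
    return f"{BASE_URL}{endpoint}?{urlencode({'q': query})}"


def search(query, endpoint="/api/search"):
    """Call the running service and return the list stored under "result"."""
    with urlopen(build_search_url(query, endpoint)) as response:
        return json.load(response)["result"]


print(build_search_url("artificial intelligence"))
print(build_search_url("food delivery", endpoint="/api/search_in_berlin"))
```

With the server up, `search("artificial intelligence")` would return the payloads of the closest startups.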

If you run this in a Jupyter notebook, you also need to add:

import nest_asyncio
nest_asyncio.apply()

Install nest_asyncio:

pip install nest_asyncio

Using Qdrant asynchronously

The Qdrant client natively supports async:

from qdrant_client import models

import qdrant_client
import asyncio


async def main():
    client = qdrant_client.AsyncQdrantClient("localhost")

    # Create a collection
    await client.create_collection(
        collection_name="my_collection",
        vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE),
    )

    # Insert a vector
    await client.upsert(
        collection_name="my_collection",
        points=[
            models.PointStruct(
                id="5c56c793-69f3-4fbf-87e6-c4bf54c28c26",
                payload={
                    "color": "red",
                },
                vector=[0.9, 0.1, 0.1, 0.5],
            ),
        ],
    )

    # Search for nearest neighbors
    # Await the call first, then read .points from the response
    response = await client.query_points(
        collection_name="my_collection",
        query=[0.9, 0.1, 0.1, 0.5],
        limit=2,
    )
    points = response.points

    # Your async code using AsyncQdrantClient might be put here
    # ...


asyncio.run(main())
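
One reason to use the async client is that several requests can be in flight at once. The sketch below shows that pattern with `asyncio.gather`. The `fake_query` coroutine is a hypothetical stand-in for an awaited client call, so the example runs without a Qdrant server.

```python
import asyncio


async def fake_query(query_id: int, delay: float = 0.01) -> dict:
    """Hypothetical stand-in for an awaited client call such as query_points."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return {"query_id": query_id, "points": []}


async def run_queries(count: int) -> list:
    # gather schedules all coroutines concurrently and returns
    # their results in the same order as the inputs.
    return await asyncio.gather(*(fake_query(i) for i in range(count)))


results = asyncio.run(run_queries(3))
print([result["query_id"] for result in results])
```

In real code, each `fake_query(i)` would be replaced by a call on an `AsyncQdrantClient` instance.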

From: https://www.cnblogs.com/shizidushu/p/18385637
