AI - A Look at Query Analysis in RAG


Hi everyone! Today let's talk about an important part of RAG (Retrieval-Augmented Generation): query analysis.

What Is Query Analysis

Query analysis, simply put, is figuring out what the user is asking. In a RAG system, the user submits a query, and our job is to use various techniques to work out the query's real intent. The process can involve many steps: tokenization, entity recognition, sentiment analysis, context understanding, and so on. Imagine walking into a library and telling the librarian you are looking for a book on ancient history; the librarian uses the information you give to recommend relevant books. Query analysis is the part where this "librarian" understands what the user needs.

Why Do Query Analysis

The purpose of query analysis is very concrete: to improve retrieval quality. If we can accurately understand the intent behind a query, we can find the content the user needs with far more precision. In a RAG system in particular, query analysis can greatly improve the efficiency and accuracy of the retrieval stage. For example, if someone asks for "Li Bai's poems", effective query analysis tells us the user wants the poems of the Tang dynasty poet Li Bai, not the work of some modern person who happens to share that name.

The Relationship Between Query Analysis and Retrieval

Query analysis and retrieval are closely related, practically inseparable, and the result of the query analysis directly shapes how well the subsequent retrieval performs. Returning to the library analogy: query analysis is you telling the librarian what kind of book you want, and retrieval is the librarian going off to find it based on your description. The clearer and more accurate the description, the more efficiently the librarian finds the book, and vice versa.

In a RAG system, we first use query analysis to pin down what the user needs, and only then run the retrieval step. Query analysis is thus a precondition for retrieval: without good query analysis, retrieval can hardly be precise.

Code Example

Next, let's write some query-analysis code step by step and dig into the concepts behind it. Note: the code snippets in this article are largely taken from the LangChain official documentation, which interested readers can consult directly.

Importing the Required Libraries

import os
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

from typing import Literal
from typing_extensions import Annotated

We start by importing quite a few required libraries, covering everything from the large language model (LLM), vector store, web page loading, and text splitting to state graph construction, which lays a solid foundation for the query analysis functionality that follows.

Configuring Environment Variables and Initializing the Models

os.environ["OPENAI_API_KEY"] = 'api-key'

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = InMemoryVectorStore(embeddings)

llm = ChatOpenAI(model="gpt-4o-mini")

Here we configure the OpenAI API key and initialize a few key objects:

  • OpenAIEmbeddings creates an embedding model used to represent documents as vectors.
  • InMemoryVectorStore creates an in-memory vector store for storing and retrieving the vectorized documents.
  • ChatOpenAI creates a small LLM (gpt-4o-mini) used for the subsequent query analysis and answer generation.
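
As a quick sanity check (a hypothetical snippet I'm adding here, not part of the original flow), you can embed a single string and inspect the resulting vector:

# text-embedding-3-large returns 3072-dimensional vectors by default.
sample_vector = embeddings.embed_query("What is task decomposition?")
print(len(sample_vector))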

Loading and Splitting the Blog Content

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

total_documents = len(all_splits)
third = total_documents // 3

for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"

all_splits[0].metadata

This part of the code loads content from the given URL and splits it into small chunks. The concrete steps:

  1. Use WebBaseLoader to load the web page, with bs4.SoupStrainer extracting only the specified HTML elements.
  2. Use RecursiveCharacterTextSplitter to split the documents into chunks of 1,000 characters with a 200-character overlap.
  3. Based on the total number of chunks, label the documents as "beginning", "middle", and "end", so that later queries can be restricted to a section of the post.
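
For reference, the final line above, all_splits[0].metadata, evaluates to the metadata of the first chunk. It should look roughly like the following (illustrative; the exact keys depend on what WebBaseLoader extracts from the page):

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'beginning'}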

Adding the Documents to the Vector Store

_ = vector_store.add_documents(documents=all_splits)

This line adds the split and labeled documents to the vector store, enabling efficient similarity search over their vector representations.
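
Before wiring up the full pipeline, a direct search can confirm the store behaves as expected. This probe is a hypothetical addition, not part of the original walkthrough:

# Hypothetical probe: fetch the two chunks most similar to a test query.
results = vector_store.similarity_search("What is task decomposition?", k=2)
for doc in results:
    print(doc.metadata["section"], doc.page_content[:80])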

Defining the Query and the Application State

prompt = hub.pull("rlm/rag-prompt")

class Search(TypedDict):
    """Search query."""
    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]

class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str

Here we define the types for the query and the application state:

  • Search describes a query: the query string (query) plus the section to search (section).
  • State describes the state of the whole application: the user's question, the structured query, the context documents, and the generated answer.
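
Since Search is a plain TypedDict, a concrete value is just a dictionary. The instance below is illustrative, mirroring the structured output the LLM produces later in this article:

example_query: Search = {"query": "Task Decomposition", "section": "end"}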

The Query Analysis Function

def analyze_query(state: State):
    structured_llm = llm.with_structured_output(Search)
    query = structured_llm.invoke(state["question"])
    return {"query": query}

The analyze_query function has the LLM analyze the user's question and produce a structured Search object as output, i.e. the query string together with the section information.
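
You can also call the function directly, outside the graph, to see what it returns. The call below is a hypothetical check, and the printed result is illustrative (it matches the tool call captured later in this article):

state = {"question": "What does the end of the post say about Task Decomposition?"}
print(analyze_query(state))
# e.g. {'query': {'query': 'Task Decomposition', 'section': 'end'}}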

The Retrieval Function

def retrieve(state: State):
    query = state["query"]
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=lambda doc: doc.metadata.get("section") == query["section"],
    )
    return {"context": retrieved_docs}

The retrieve function searches the vector store for documents similar to the query string, filtering the candidates by their section metadata: the filter callable is applied to each candidate Document and keeps only those whose section matches.
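
This function, too, can be exercised on its own with a hand-built state. A hypothetical check:

state = {"query": {"query": "Task Decomposition", "section": "end"}}
docs = retrieve(state)["context"]
print(len(docs), [d.metadata["section"] for d in docs])  # every entry should be 'end'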

Generating the Answer

def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

The generate function combines the retrieved document content with the user's question and uses the LLM to generate the answer.
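
The rlm/rag-prompt template pulled earlier takes two inputs, question and context. A hypothetical standalone use, substituting a hand-written context string for retrieved documents:

docs_content = "Task decomposition splits a complex task into smaller, manageable steps."
messages = prompt.invoke({"question": "What is task decomposition?", "context": docs_content})
print(llm.invoke(messages).content)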

Building the State Graph and Running the Steps

graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()

for step in graph.stream(
    {"question": "What does the end of the post say about Task Decomposition?"},
    stream_mode="updates",
):
    print(f"{step}\n\n----------------\n")

Finally, this code builds a state graph chaining the analyze_query, retrieve, and generate functions in sequence, then runs the entire query analysis and generation process.
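
If you don't need per-step updates, the compiled graph can also be run in a single call. A hypothetical alternative to the streaming loop above:

result = graph.invoke({"question": "What does the end of the post say about Task Decomposition?"})
print(result["query"])   # the structured Search produced by analyze_query
print(result["answer"])  # the final answer produced by generate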

Capturing the LLM Messages

Throughout the process above, we have been talking to the LLM via the LangChain API, with no visibility into the requests actually sent under the hood. In some scenarios it is still worth examining these details to get a complete picture. Below we look at the concrete payloads; the code above involves two LLM interactions.
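
The captures below come from instrumenting the calls. One simple way to produce similar output yourself (an assumption about tooling on my part; the original does not say how its captures were made) is LangChain's built-in debug switch:

from langchain.globals import set_debug

# Print the serialized request and response for every LLM call that follows.
set_debug(True)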

Interaction 1

LLM Request

{
  "messages": [
    [
      {
        "lc": 1,
        "type": "constructor",
        "id": [
          "langchain",
          "schema",
          "messages",
          "HumanMessage"
        ],
        "kwargs": {
          "content": "What does the end of the post say about Task Decomposition?",
          "type": "human"
        }
      }
    ]
  ]
}

LLM Response

[
  {
    "name": "Search",
    "args": {
      "query": "Task Decomposition",
      "section": "end"
    },
    "id": "call_5KSfkBGRge6G95IKL84n0mck",
    "type": "tool_call"
  }
]

Interaction 2

LLM Request

{
  "messages": [
    [
      {
        "lc": 1,
        "type": "constructor",
        "id": [
          "langchain",
          "schema",
          "messages",
          "HumanMessage"
        ],
        "kwargs": {
          "content": "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: What does the end of the post say about Task Decomposition? \nContext: You will get instructions for code to write.\nYou will write a very long answer. Make sure that every detail of the architecture is, in the end, implemented as code.\nMake sure that every detail of the architecture is, in the end, implemented as code.\nThink step by step and reason yourself to the right decisions to make sure we get it right.\nYou will first lay out the names of the core classes, functions, methods that will be necessary, as well as a quick comment on their purpose.\nThen you will output the content of each file including ALL code.\nEach file must strictly follow a markdown code block format, where the following tokens must be replaced such that\nFILENAME is the lowercase file name including the file extension,\nLANG is the markup code block language for the code’s language, and CODE is the code:\nFILENAME\nCODE\nYou will start with the “entrypoint” file, then go to the ones that are imported by that file, and so on.\n\n\"content\": \"Please now remember the steps:\\n\\nThink step by step and reason yourself to the right decisions to make sure we get it right.\\nFirst lay out the names of the core classes, functions, methods that will be necessary, As well as a quick comment on their purpose.\\n\\nThen you will output the content of each file including ALL code.\\nEach file must strictly follow a markdown code block format, where the following tokens must be replaced such that\\nFILENAME is the lowercase file name including the file extension,\\nLANG is the markup code block language for the code's language, and CODE is the code:\\n\\nFILENAME\\n```LANG\\nCODE\\n```\\n\\nPlease note that the code should be fully functional. No placeholders.\\n\\nYou will start with the \\\"entrypoint\\\" file, then go to the ones that are imported by that file, and so on.\\nFollow a language and framework appropriate best practice file naming convention.\\nMake sure that files contain all imports, types etc. The code should be fully\n\n\"content\": \"You will get instructions for code to write.\\nYou will write a very long answer. Make sure that every detail of the architecture is, in the end, implemented as code.\\nMake sure that every detail of the architecture is, in the end, implemented as code.\\n\\nThink step by step and reason yourself to the right decisions to make sure we get it right.\\nYou will first lay out the names of the core classes, functions, methods that will be necessary, as well as a quick comment on their purpose.\\n\\nThen you will output the content of each file including ALL code.\\nEach file must strictly follow a markdown code block format, where the following tokens must be replaced such that\\nFILENAME is the lowercase file name including the file extension,\\nLANG is the markup code block language for the code's language, and CODE is the code:\\n\\nFILENAME\\n```LANG\\nCODE\\n```\\n\\nYou will start with the \\\"entrypoint\\\" file, then go to the ones that are imported by that file, and so on.\\nPlease\n\nFILENAME\nCODE\nYou will start with the “entrypoint” file, then go to the ones that are imported by that file, and so on.\nPlease note that the code should be fully functional. 
No placeholders.\nFollow a language and framework appropriate best practice file naming convention.\nMake sure that files contain all imports, types etc. Make sure that code in different files are compatible with each other.\nEnsure to implement all code, if you are unsure, write a plausible implementation.\nInclude module dependency or package manager dependency definition file.\nBefore you finish, double check that all parts of the architecture is present in the files.\nUseful to know:\nYou almost always put different classes in different files.\nFor Python, you always create an appropriate requirements.txt file.\nFor NodeJS, you always create an appropriate package.json file.\nYou always add a comment briefly describing the purpose of the function definition.\nYou try to add comments explaining very complex bits of logic. \nAnswer:",
          "type": "human"
        }
      }
    ]
  ]
}

LLM Response

{
  "generations": [
    [
      {
        "text": "The end of the post emphasizes the importance of task decomposition in programming by outlining a structured approach to coding. It suggests that one should think step by step, starting with the identification of core classes and functions before implementing the code. Additionally, it highlights the necessity of ensuring that all parts of the architecture are present and that the code is fully functional with no placeholders.",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "The end of the post emphasizes the importance of task decomposition in programming by outlining a structured approach to coding. It suggests that one should think step by step, starting with the identification of core classes and functions before implementing the code. Additionally, it highlights the necessity of ensuring that all parts of the architecture are present and that the code is fully functional with no placeholders.",
            "additional_kwargs": {
              "refusal": null
            },
            "response_metadata": {
              "token_usage": {
                "completion_tokens": 72,
                "prompt_tokens": 903,
                "total_tokens": 975,
                "completion_tokens_details": {
                  "accepted_prediction_tokens": 0,
                  "audio_tokens": 0,
                  "reasoning_tokens": 0,
                  "rejected_prediction_tokens": 0
                },
                "prompt_tokens_details": {
                  "audio_tokens": 0,
                  "cached_tokens": 0
                }
              },
              "model_name": "gpt-4o-mini-2024-07-18",
              "system_fingerprint": "fp_3de1288069",
              "finish_reason": "stop",
              "logprobs": null
            },
            "type": "ai",
            "id": "run-7594f119-4edc-460c-870a-c0882946256a-0",
            "usage_metadata": {
              "input_tokens": 903,
              "output_tokens": 72,
              "total_tokens": 975,
              "input_token_details": {
                "audio": 0,
                "cache_read": 0
              },
              "output_token_details": {
                "audio": 0,
                "reasoning": 0
              }
            },
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "completion_tokens": 72,
      "prompt_tokens": 903,
      "total_tokens": 975,
      "completion_tokens_details": {
        "accepted_prediction_tokens": 0,
        "audio_tokens": 0,
        "reasoning_tokens": 0,
        "rejected_prediction_tokens": 0
      },
      "prompt_tokens_details": {
        "audio_tokens": 0,
        "cached_tokens": 0
      }
    },
    "model_name": "gpt-4o-mini-2024-07-18",
    "system_fingerprint": "fp_3de1288069"
  },
  "run": null,
  "type": "LLMResult"
}

Summary

This article walked through implementing a RAG flow with query analysis, which consists of three main parts:

  1. Query analysis: understand the user's question and turn it into a structured query.
  2. Retrieval: fetch similar documents from the vector store and filter them by section.
  3. Generation: produce the answer with the LLM.

With this approach, we can understand user queries efficiently, retrieve the relevant information, and give precise answers. I hope these annotations and explanations help you better understand the code and the principles behind it.

    AntDesignX:卓越的AI界面解决方案​​AntDesignX是AntDesign的全新AGI组件库,旨在帮助开发者更轻松地研发AI产品用户界面。AntDesignX将在AntDesign的基础之上进一步拓展AI产品的设计规范,为开发者提供更强大的工具和资源。期待与你一起推动AI技术的发展!官......