AI - A Look at Query Analysis in RAG
Hi everyone. Today we'll look at an important step in RAG (Retrieval-Augmented Generation): query analysis.
What Is Query Analysis
Query analysis, put simply, is understanding what the user is asking. In a RAG system, the user enters a query, and our job is to use various techniques to figure out that query's real intent. The process can involve many steps, such as tokenization, entity recognition, sentiment analysis, and context understanding. Think of walking into a library and telling the librarian you are looking for a book on ancient history: the librarian uses what you tell them to recommend relevant books. Query analysis is the part where that "librarian" works out what you actually need.
Why Do Query Analysis
As for why we do query analysis, the goal is very clear: better retrieval. If we can accurately understand the intent behind a query, we can find what the user needs far more precisely. In a RAG system in particular, query analysis can substantially improve both the efficiency and the accuracy of the retrieval stage. For example, if someone asks for "Li Bai's poems", effective query analysis tells us the user wants poems by the Tang-dynasty poet Li Bai, not works by some modern person who happens to share the name.
How Query Analysis Relates to Retrieval
Query analysis and retrieval are tightly coupled, two inseparable parts of the same pipeline. The result of query analysis directly shapes the quality of the retrieval that follows. To stay with the library analogy: query analysis is you telling the librarian what kind of book you want, and retrieval is the librarian going to find it. The clearer and more accurate the description, the faster the librarian finds the book, and vice versa.
In a RAG system, we first pin down the user's need through query analysis, and only then run the retrieval step. Query analysis is a precondition for retrieval: without good query analysis, retrieval can hardly be precise.
Code Example
Next, let's build a piece of query-analysis code step by step and dig into the concepts behind it. Note: the code snippets in this article largely come from the LangChain website; interested readers can find the originals there.
Importing the Required Libraries
import os
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
from typing import Literal
from typing_extensions import Annotated
We start by importing the necessary libraries, covering the LLM, embeddings and vector storage, web page loading, text splitting, and state-graph construction. They form the foundation for the query-analysis functionality that follows.
Configuring Environment Variables and Initializing Models
os.environ["OPENAI_API_KEY"] = 'api-key'
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
vector_store = InMemoryVectorStore(embeddings)
llm = ChatOpenAI(model="gpt-4o-mini")
Here we set the OpenAI API key and initialize a few key objects:
- OpenAIEmbeddings: creates an embedding model that represents documents as vectors (a quick sanity check follows below).
- InMemoryVectorStore: creates an in-memory vector store for storing and retrieving those vector representations.
- ChatOpenAI: creates a small LLM used for the query analysis and answer generation later on.
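Before moving on, a quick sanity check of this setup can be helpful. The snippet below is my addition rather than part of the original tutorial; it embeds a sample string and inspects the vector length:

# Optional sanity check (not part of the original tutorial code):
# embed a sample string and inspect the resulting vector's dimensionality.
sample_vector = embeddings.embed_query("What is query analysis?")
print(len(sample_vector))  # text-embedding-3-large defaults to 3072 dimensions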
Loading and Splitting the Blog Content
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)
total_documents = len(all_splits)
third = total_documents // 3
for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"
all_splits[0].metadata
This part of the code loads the content from the given URL and splits it into chunks. The steps:
- Use WebBaseLoader to load the page, with bs4.SoupStrainer extracting only the relevant HTML elements.
- Use RecursiveCharacterTextSplitter to split the documents, with a chunk size of 1000 characters and an overlap of 200 characters.
- Based on each chunk's position in the document list, tag it as "beginning", "middle", or "end", so that later queries can retrieve by section (a peek at the resulting metadata follows below).
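The all_splits[0].metadata expression at the end of the code above lets you inspect what a chunk carries. The exact keys besides section depend on what WebBaseLoader recorded; a source key with the page URL is typical, so the output should look roughly like this:

# Illustrative output of all_splits[0].metadata; keys other than
# "section" depend on the loader:
# {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
#  'section': 'beginning'}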
Adding Documents to the Vector Store
_ = vector_store.add_documents(documents=all_splits)
This single line adds the split and tagged documents to the vector store, so that we can later run efficient similarity search over their vector representations.
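To confirm that indexing worked, you can run a raw similarity search before any section filtering is involved. This check is my addition, not part of the original tutorial:

# Optional check: search the freshly built index without a section filter.
results = vector_store.similarity_search("What is task decomposition?", k=2)
for doc in results:
    print(doc.metadata["section"], doc.page_content[:80])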
Defining the Query and Application State
prompt = hub.pull("rlm/rag-prompt")
class Search(TypedDict):
    """Search query."""

    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]


class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str
The hub.pull call first fetches the standard RAG prompt ("rlm/rag-prompt") that the generation step will use later. Then two types are defined:
- Search describes a structured query: the query string (query) and the section to search (section). An illustrative instance follows below.
- State describes the state of the whole application: the user's question, the structured query, the retrieved context documents, and the generated answer.
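Since Search is a TypedDict, a filled-in query is just a plain dict. For intuition, here is the shape we expect the LLM to produce for the question used later in this article (it matches the tool call shown under Interaction 1 below):

# An illustrative Search instance; in the pipeline it is produced by the LLM.
example_query: Search = {
    "query": "Task Decomposition",
    "section": "end",
}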
The Query Analysis Function
def analyze_query(state: State):
    structured_llm = llm.with_structured_output(Search)
    query = structured_llm.invoke(state["question"])
    return {"query": query}
The analyze_query function has the LLM analyze the user's question and emit a structured Search object, i.e. the query string plus the section to query.
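You can also exercise this step in isolation, outside the graph. Because the schema is a TypedDict, with_structured_output returns a plain dict shaped like Search:

# Running the query-analysis step on its own (illustrative output):
structured_llm = llm.with_structured_output(Search)
result = structured_llm.invoke("What does the end of the post say about Task Decomposition?")
# result should be Search-shaped, e.g. {'query': 'Task Decomposition', 'section': 'end'}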
The Retrieval Function
def retrieve(state: State):
    query = state["query"]
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=lambda doc: doc.metadata.get("section") == query["section"],
    )
    return {"context": retrieved_docs}
The retrieve function searches the vector store for documents similar to the query string and filters them by the section tag.
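Note that InMemoryVectorStore takes a callable over Document as its filter: only documents for which the callable returns True are considered during the similarity search. The lambda above is equivalent to this standalone form (with the section hard-coded for illustration):

# Equivalent standalone filter, hard-coded to the "end" section:
def section_filter(doc: Document) -> bool:
    # Keep only chunks tagged as belonging to the requested section.
    return doc.metadata.get("section") == "end"

docs_in_end = vector_store.similarity_search("Task Decomposition", filter=section_filter)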
Generating the Answer
def generate(state: State):
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
The generate function combines the retrieved document content with the user's question and has the LLM produce the answer.
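If you are curious what the rlm/rag-prompt pulled earlier looks like before it reaches the model, you can render it with placeholder values. It expects the two variables question and context and produces a single human message; its full rendered text also appears under Interaction 2 below:

# Rendering the prompt template with placeholder values:
example_messages = prompt.invoke(
    {"question": "(question goes here)", "context": "(context goes here)"}
).to_messages()
print(example_messages[0].content)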
Building and Running the State Graph
graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()

for step in graph.stream(
    {"question": "What does the end of the post say about Task Decomposition?"},
    stream_mode="updates",
):
    print(f"{step}\n\n----------------\n")
Finally, this code builds a state graph that chains the analyze_query, retrieve, and generate functions in sequence, then runs the whole query-analysis-and-generation pipeline.
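Streaming with stream_mode="updates" prints each node's output as it finishes, which is handy for inspecting the intermediate structured query. If you only need the final answer, graph.invoke returns the final state in a single call:

# Alternative to streaming: get the final state directly.
final_state = graph.invoke(
    {"question": "What does the end of the post say about Task Decomposition?"}
)
print(final_state["answer"])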
Capturing the LLM Messages
Throughout the process above we interacted with the LLM via the LangChain API, with no visibility into the underlying requests being sent. In some scenarios it is worth digging into those details to get a complete picture. Below is the actual content sent over the wire; the code above involves two LLM interactions.
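The article does not show how these payloads were captured; one straightforward option (among others, such as LangSmith tracing or a custom callback handler) is LangChain's global debug switch, which logs the serialized messages of every LLM call:

# One way to surface the raw request/response payloads:
from langchain.globals import set_debug

set_debug(True)  # subsequent LLM calls print their serialized messages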
Interaction 1
Request to the LLM
{
"messages": [
[
{
"lc": 1,
"type": "constructor",
"id": [
"langchain",
"schema",
"messages",
"HumanMessage"
],
"kwargs": {
"content": "What does the end of the post say about Task Decomposition?",
"type": "human"
}
}
]
]
}
LLM response
[
{
"name": "Search",
"args": {
"query": "Task Decomposition",
"section": "end"
},
"id": "call_5KSfkBGRge6G95IKL84n0mck",
"type": "tool_call"
}
]
Interaction 2
Request to the LLM
{
"messages": [
[
{
"lc": 1,
"type": "constructor",
"id": [
"langchain",
"schema",
"messages",
"HumanMessage"
],
"kwargs": {
"content": "You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: What does the end of the post say about Task Decomposition? \nContext: You will get instructions for code to write.\nYou will write a very long answer. Make sure that every detail of the architecture is, in the end, implemented as code.\nMake sure that every detail of the architecture is, in the end, implemented as code.\nThink step by step and reason yourself to the right decisions to make sure we get it right.\nYou will first lay out the names of the core classes, functions, methods that will be necessary, as well as a quick comment on their purpose.\nThen you will output the content of each file including ALL code.\nEach file must strictly follow a markdown code block format, where the following tokens must be replaced such that\nFILENAME is the lowercase file name including the file extension,\nLANG is the markup code block language for the code’s language, and CODE is the code:\nFILENAME\nCODE\nYou will start with the “entrypoint” file, then go to the ones that are imported by that file, and so on.\n\n\"content\": \"Please now remember the steps:\\n\\nThink step by step and reason yourself to the right decisions to make sure we get it right.\\nFirst lay out the names of the core classes, functions, methods that will be necessary, As well as a quick comment on their purpose.\\n\\nThen you will output the content of each file including ALL code.\\nEach file must strictly follow a markdown code block format, where the following tokens must be replaced such that\\nFILENAME is the lowercase file name including the file extension,\\nLANG is the markup code block language for the code's language, and CODE is the code:\\n\\nFILENAME\\n```LANG\\nCODE\\n```\\n\\nPlease note that the code should be fully functional. No placeholders.\\n\\nYou will start with the \\\"entrypoint\\\" file, then go to the ones that are imported by that file, and so on.\\nFollow a language and framework appropriate best practice file naming convention.\\nMake sure that files contain all imports, types etc. The code should be fully\n\n\"content\": \"You will get instructions for code to write.\\nYou will write a very long answer. Make sure that every detail of the architecture is, in the end, implemented as code.\\nMake sure that every detail of the architecture is, in the end, implemented as code.\\n\\nThink step by step and reason yourself to the right decisions to make sure we get it right.\\nYou will first lay out the names of the core classes, functions, methods that will be necessary, as well as a quick comment on their purpose.\\n\\nThen you will output the content of each file including ALL code.\\nEach file must strictly follow a markdown code block format, where the following tokens must be replaced such that\\nFILENAME is the lowercase file name including the file extension,\\nLANG is the markup code block language for the code's language, and CODE is the code:\\n\\nFILENAME\\n```LANG\\nCODE\\n```\\n\\nYou will start with the \\\"entrypoint\\\" file, then go to the ones that are imported by that file, and so on.\\nPlease\n\nFILENAME\nCODE\nYou will start with the “entrypoint” file, then go to the ones that are imported by that file, and so on.\nPlease note that the code should be fully functional. 
No placeholders.\nFollow a language and framework appropriate best practice file naming convention.\nMake sure that files contain all imports, types etc. Make sure that code in different files are compatible with each other.\nEnsure to implement all code, if you are unsure, write a plausible implementation.\nInclude module dependency or package manager dependency definition file.\nBefore you finish, double check that all parts of the architecture is present in the files.\nUseful to know:\nYou almost always put different classes in different files.\nFor Python, you always create an appropriate requirements.txt file.\nFor NodeJS, you always create an appropriate package.json file.\nYou always add a comment briefly describing the purpose of the function definition.\nYou try to add comments explaining very complex bits of logic. \nAnswer:",
"type": "human"
}
}
]
]
}
LLM response
{
"generations": [
[
{
"text": "The end of the post emphasizes the importance of task decomposition in programming by outlining a structured approach to coding. It suggests that one should think step by step, starting with the identification of core classes and functions before implementing the code. Additionally, it highlights the necessity of ensuring that all parts of the architecture are present and that the code is fully functional with no placeholders.",
"generation_info": {
"finish_reason": "stop",
"logprobs": null
},
"type": "ChatGeneration",
"message": {
"lc": 1,
"type": "constructor",
"id": [
"langchain",
"schema",
"messages",
"AIMessage"
],
"kwargs": {
"content": "The end of the post emphasizes the importance of task decomposition in programming by outlining a structured approach to coding. It suggests that one should think step by step, starting with the identification of core classes and functions before implementing the code. Additionally, it highlights the necessity of ensuring that all parts of the architecture are present and that the code is fully functional with no placeholders.",
"additional_kwargs": {
"refusal": null
},
"response_metadata": {
"token_usage": {
"completion_tokens": 72,
"prompt_tokens": 903,
"total_tokens": 975,
"completion_tokens_details": {
"accepted_prediction_tokens": 0,
"audio_tokens": 0,
"reasoning_tokens": 0,
"rejected_prediction_tokens": 0
},
"prompt_tokens_details": {
"audio_tokens": 0,
"cached_tokens": 0
}
},
"model_name": "gpt-4o-mini-2024-07-18",
"system_fingerprint": "fp_3de1288069",
"finish_reason": "stop",
"logprobs": null
},
"type": "ai",
"id": "run-7594f119-4edc-460c-870a-c0882946256a-0",
"usage_metadata": {
"input_tokens": 903,
"output_tokens": 72,
"total_tokens": 975,
"input_token_details": {
"audio": 0,
"cache_read": 0
},
"output_token_details": {
"audio": 0,
"reasoning": 0
}
},
"tool_calls": [],
"invalid_tool_calls": []
}
}
}
]
],
"llm_output": {
"token_usage": {
"completion_tokens": 72,
"prompt_tokens": 903,
"total_tokens": 975,
"completion_tokens_details": {
"accepted_prediction_tokens": 0,
"audio_tokens": 0,
"reasoning_tokens": 0,
"rejected_prediction_tokens": 0
},
"prompt_tokens_details": {
"audio_tokens": 0,
"cached_tokens": 0
}
},
"model_name": "gpt-4o-mini-2024-07-18",
"system_fingerprint": "fp_3de1288069"
},
"run": null,
"type": "LLMResult"
}
Summary
This article walked through building a RAG-based query-analysis pipeline, which has three main parts:
- Query analysis: understand the user's question and turn it into a structured query.
- Retrieval: fetch similar documents from the vector store, filtered by section.
- Generation: produce the answer with the LLM from the retrieved context.
With this approach, we can understand user queries efficiently, retrieve the relevant information, and return precise answers. I hope the annotations and walkthrough help you better understand this code and the principles behind it.