I've built a RAG system like this:
```python
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

response_schemas = [
    ResponseSchema(name="price", description="Price", type="float"),
    ResponseSchema(name="unit", description="Unit", type="int"),
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

rag_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

rag_chain = (
    {"context": compression_retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | output_parser
)

query = "What is the price? How many units?"
response = rag_chain.invoke(query, config={"configurable": {"session_id": "abc123"}})
```
But my response is a JSON object where price and unit are the only keys. I'd like to have a "context" variable that stores the passages from the documents the algorithm used to answer the question.
Any idea how I can do this?
To return the context used to answer the query in LangChain, you need to restructure the chain so the retrieved documents are carried through to the final output. There is no `include_source_documents` flag you can `bind()` on the LLM; in an LCEL chain the documented pattern is `RunnableParallel` together with `.assign()`:

1. **Build the answer chain from already-retrieved documents:** move `format_docs` inside the chain so it receives the raw `Document` list:

```python
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

rag_chain_from_docs = (
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | rag_prompt
    | llm
    | output_parser
)
```

2. **Keep the documents in the output:** run the retriever and the question passthrough in parallel, then `.assign()` the answer alongside the raw context:

```python
rag_chain = RunnableParallel(
    {"context": compression_retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)
```

3. **Access the context from the response:** the output is now a dict holding both the parsed answer and the source documents:

```python
response = rag_chain.invoke(query, config={"configurable": {"session_id": "abc123"}})
print(response["answer"])   # structured output, e.g. {"price": ..., "unit": ...}
print(response["context"])  # list of Documents used to generate the answer
```

Full updated code:

```python
from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.prompts import PromptTemplate
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

# ... [your document loading and retriever setup] ...

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

response_schemas = [
    ResponseSchema(name="price", description="Price", type="float"),
    ResponseSchema(name="unit", description="Unit", type="int"),
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

rag_prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template,  # your template, defined elsewhere
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

# Answer chain that starts from already-retrieved documents
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=lambda x: format_docs(x["context"]))
    | rag_prompt
    | llm
    | output_parser
)

# Retrieve and pass the question through in parallel, then add the answer
rag_chain = RunnableParallel(
    {"context": compression_retriever, "question": RunnablePassthrough()}
).assign(answer=rag_chain_from_docs)

query = "What is the price? How many units?"
response = rag_chain.invoke(query, config={"configurable": {"session_id": "abc123"}})
print(response["answer"])
print(response["context"])
```

You can now read `response["context"]` to get the documents that were used to answer the query.
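Independent of the LangChain APIs, the underlying data flow, fanning the input out to several named steps and then merging in a computed answer, can be sketched in plain Python. The `parallel` and `assign` helpers and `fake_retriever` below are toy stand-ins written for this sketch, not LangChain functions:

```python
# Toy sketch of the "run steps in parallel, then merge in a computed answer"
# data flow. These helpers mimic the shape of the pattern, nothing more.

def parallel(steps):
    """Apply each named step to the same input, collecting results in a dict."""
    return lambda x: {name: fn(x) for name, fn in steps.items()}

def assign(base, **extra):
    """Run `base`, then add extra keys computed from its output dict."""
    def run(x):
        out = base(x)
        out.update({name: fn(out) for name, fn in extra.items()})
        return out
    return run

# Hypothetical retriever that always returns two passages.
fake_retriever = lambda q: ["passage about price", "passage about units"]

chain = assign(
    parallel({"context": fake_retriever, "question": lambda q: q}),
    answer=lambda d: f"answered {d['question']!r} using {len(d['context'])} passages",
)

result = chain("What is the price?")
print(result["context"])  # the retrieved passages survive into the output
print(result["answer"])   # the computed answer sits alongside them
```

The key point is that the parallel step's output dict is never discarded: the answer is added to it rather than replacing it, which is exactly why the retrieved context remains accessible.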
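If you only need the raw text passages rather than full `Document` objects, you can pull out `page_content`. A minimal sketch: the `Document` dataclass here is a stand-in for LangChain's, and the response dict is made up for illustration:

```python
from dataclasses import dataclass, field

# Stand-in for langchain_core.documents.Document, so this runs standalone.
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

# Hypothetical chain output, shaped like {"answer": ..., "context": [...]}.
response = {
    "answer": {"price": 9.99, "unit": 3},
    "context": [
        Document("The price is $9.99.", {"source": "catalog.txt"}),
        Document("Sold in packs of 3 units.", {"source": "catalog.txt"}),
    ],
}

# Extract just the passages, and deduplicate the source files they came from.
passages = [doc.page_content for doc in response["context"]]
sources = sorted({doc.metadata.get("source") for doc in response["context"]})
print(passages)
print(sources)
```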