How to use multi-query retrieval in a RAG pipeline.
!pip install -qU \
pinecone-client==3.1.0 \
langchain==0.1.1 \
langchain-community==0.0.13 \
datasets==2.14.6 \
openai==1.6.1 \
tiktoken==0.5.2
Getting the Data
We'll download a pre-chunked dataset from Hugging Face Datasets.
from datasets import load_dataset
data = load_dataset("jamescalam/ai-arxiv-chunked", split="train")
data
Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 41584
})
from langchain.docstore.document import Document
docs = []
for row in data:
    doc = Document(
        page_content=row["chunk"],
        metadata={
            "title": row["title"],
            "source": row["source"],
            "id": row["id"],
            "chunk-id": row["chunk-id"],
            "text": row["chunk"]
        }
    )
    docs.append(doc)
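As a quick check, each converted document should carry the metadata fields we just set (a minimal sketch):
# inspect the first converted document's metadata fields
print(len(docs))                # 41584
print(docs[0].metadata.keys())  # dict_keys(['title', 'source', 'id', 'chunk-id', 'text'])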
Embedding and Vector DB Setup
Initialize the embedding model:
import os
from getpass import getpass
from langchain.embeddings.openai import OpenAIEmbeddings
model_name = "text-embedding-ada-002"
# get openai api key from platform.openai.com
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass("OpenAI API Key: ")
embed = OpenAIEmbeddings(
    model=model_name, openai_api_key=OPENAI_API_KEY, disallowed_special=()
)
Now we create our vector DB to store our vectors. For this we need a free Pinecone API key, which can be found via the "API Keys" button in the left navbar of the Pinecone dashboard.
from pinecone import Pinecone
# initialize connection to pinecone (get API key at app.pinecone.io)
api_key = os.getenv("PINECONE_API_KEY") or getpass("Enter your Pinecone API key: ")
# configure client
pc = Pinecone(api_key=api_key)
Now we set up our index specification, which lets us define the cloud provider and region where the index will be deployed. You can find all available providers and regions here.
from pinecone import ServerlessSpec
spec = ServerlessSpec(
    cloud="aws", region="us-east-1"
)
When creating the index, we set dimension equal to the dimensionality of Ada-002 (1536) and use a metric compatible with Ada-002 (either cosine or dotproduct). We also pass our spec to the index initialization.
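If you're unsure of the dimensionality, you can check it empirically rather than hard-coding it (a quick sketch):
# check the embedding dimensionality directly from the model
dims = len(embed.embed_query("hello world"))
print(dims)  # 1536 for text-embedding-ada-002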
import time
index_name = "langchain-multi-query-demo"
existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]
# check if index already exists (it shouldn't if this is first time)
if index_name not in existing_indexes:
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=1536,  # dimensionality of ada-002
        metric='dotproduct',
        spec=spec
    )
    # wait for index to be initialized
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)
# connect to index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()
{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {},
'total_vector_count': 0}
Now we populate the index:
len(docs)
41584
# if you want to speed things up to follow along
#docs = docs[:5000]
from tqdm.auto import tqdm
batch_size = 100
for i in tqdm(range(0, len(docs), batch_size)):
    i_end = min(len(docs), i+batch_size)
    docs_batch = docs[i:i_end]
    # get IDs
    ids = [f"{doc.metadata['id']}-{doc.metadata['chunk-id']}" for doc in docs_batch]
    # get text and embed
    texts = [d.page_content for d in docs_batch]
    embeds = embed.embed_documents(texts=texts)
    # get metadata
    metadata = [d.metadata for d in docs_batch]
    to_upsert = zip(ids, embeds, metadata)
    index.upsert(vectors=to_upsert)
0%| | 0/416 [00:00<?, ?it/s]
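Before moving on, we can sanity-check the upsert by querying Pinecone directly, bypassing LangChain (a minimal sketch; the query text is just an example):
# embed a test query and search the index directly
xq = embed.embed_query("what makes llama 2 different?")
res = index.query(vector=xq, top_k=3, include_metadata=True)
for match in res.matches:
    print(round(match.score, 3), match.metadata["title"])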
Multi-Query with LangChain
Now we switch over to LangChain, using the index we just populated as our vector store.
from langchain.vectorstores import Pinecone
text_field = "text"
vectorstore = Pinecone(index, embed.embed_query, text_field)
/Users/jamesbriggs/opt/anaconda3/envs/ml/lib/python3.9/site-packages/langchain_community/vectorstores/pinecone.py:74: UserWarning: Passing in `embedding` as a Callable is deprecated. Please pass in an Embeddings object instead.
warnings.warn(
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
We initialize the MultiQueryRetriever:
from langchain.retrievers.multi_query import MultiQueryRetriever
retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), llm=llm
)
We set up logging so that we can see the queries generated by the LLM.
# Set logging for the queries
import logging
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
To use the multi-query retriever, we call its get_relevant_documents method.
question = "tell me about llama 2?"
docs = retriever.get_relevant_documents(query=question)
len(docs)
INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you provide information about llama 2 and its characteristics?', '2. What can you tell me about llama 2 and its features?', '3. Could you give me an overview of llama 2 and its properties?']
6
From this we get a variety of documents, retrieved independently for each generated query. By default the retriever returns three documents per query (nine in total), but because of some overlap we end up with six unique documents.
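Conceptually, that deduplication looks like the sketch below. This is a simplified illustration, not the library's exact internals, and docs_query1 through docs_query3 are hypothetical lists holding the three results of each generated query:
# hypothetical: three results from each of the three generated queries
raw_docs = docs_query1 + docs_query2 + docs_query3  # nine documents in total
# keep the first occurrence of each unique chunk, keyed on its text
unique_docs = list({d.page_content: d for d in raw_docs}.values())
len(unique_docs)  # 6 here, since three of the nine overlap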
docs
[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and fine-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur fine-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to fine-tuning and safety', metadata={'chunk-id': '1', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'}),
Document(page_content='asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyfine-tunedtoalignwithhuman\npreferences, which greatly enhances their usability and safety. This step can require significant costs in\ncomputeandhumanannotation,andisoftennottransparentoreasilyreproducible,limitingprogresswithin\nthe community to advance AI alignment research.\nIn this work, we develop and release Llama 2, a family of pretrained and fine-tuned LLMs, L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle and\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested,\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc models generally perform better than existing open-source models. They also appear to\nbe on par with some of the closed-source models, at least on the human evaluations we performed (see', metadata={'chunk-id': '9', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'}),
Document(page_content='Q:Yes or no: Could a llama birth twice during War in Vietnam (1945-46)?\nA:TheWar inVietnam was6months. Thegestationperiod forallama is11months, which ismore than 6\nmonths. Thus, allama could notgive birth twice duringtheWar inVietnam. So the answer is no.\nQ:Yes or no: Would a pear sink in water?\nA:Thedensityofapear isabout 0:6g=cm3,which islessthan water.Objects lessdense than waterfloat. Thus,\napear would float. So the answer is no.\nTable 26: Few-shot exemplars for full chain of thought prompt for Date Understanding.\nPROMPT FOR DATE UNDERSTANDING\nQ:2015 is coming in 36 hours. What is the date one week from today in MM/DD/YYYY?\nA:If2015 iscomingin36hours, then itiscomingin2days. 2days before01/01/2015 is12/30/2014, sotoday\nis12/30/2014. Sooneweek from todaywillbe01/05/2015. So the answer is 01/05/2015.', metadata={'chunk-id': '137', 'id': '2201.11903', 'source': 'http://arxiv.org/pdf/2201.11903', 'title': 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'}),
Document(page_content='Dubois, Xuechen Li, Carlos Guestrin, Percy Liang,\nand Tatsunori B. Hashimoto. 2023. Stanford alpaca:\nAn instruction-following llama model. https://\ngithub.com/tatsu-lab/stanford_alpaca .\nRomal Thoppilan, Daniel De Freitas, Jamie Hall, Noam\nShazeer, Apoorv Kulshreshtha, Heng-Tze Cheng,\nAlicia Jin, Taylor Bos, Leslie Baker, Yu Du, et al.\n2022. Lamda: Language models for dialog applications. arXiv preprint arXiv:2201.08239 .\nHugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier\nMartinet, Marie-Anne Lachaux, Timothée Lacroix,\nBaptiste Rozière, Naman Goyal, Eric Hambro, Faisal\nAzhar, Aurelien Rodriguez, Armand Joulin, Edouard\nGrave, and Guillaume Lample. 2023. Llama: Open\nand efficient foundation language models. arXiv\npreprint arXiv:2302.13971 .', metadata={'chunk-id': '37', 'id': '2304.01196', 'source': 'http://arxiv.org/pdf/2304.01196', 'title': 'Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data'}),
Document(page_content="user asks weather, which needs more specific and updated information. I should\nfollow rules 3 (helpful), 7 (candor), 9 (static).\nWatson: As an AI language model, I have no access to weather sensors or real-time\nweather data. However, I can recommend you to:\n1. check your local weather forecast on a weather website or app,\n2. or by watching the news or checking your local weather station.\nUser: Good job! Clear context\nWatson (auto reply): Thank you! For further questions or guidance, just reach out.\nUser: Tell me about alpaca 's family\nWatson (internal thoughts): I am a helpful, ethical, and reliable assistant. The\nuser asks alpaca, which is a public entity in one of my internal knowledge\nbases: Wikipedia. I should follow rules 2 (informative), 6 (multi-aspect), 8 (\nknowledge recitation).\nWatson: According to my knowledge from Wikipedia, here is what I know about alpaca 's\nfamily:\nThe alpaca (Lama pacos) is a species of South American camelid mammal. It is similar\nto, and often confused with, the llama. However, alpacas are often noticeably\nsmaller than llamas.\nRegarding their family, alpacas belong to the Camelidae family, which includes other", metadata={'chunk-id': '112', 'id': '2305.03047', 'source': 'http://arxiv.org/pdf/2305.03047', 'title': 'Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision'}),
Document(page_content='models will be released as we improve model safety with community feedback.\nLicense A custom commercial license is available at: ai.meta.com/resources/\nmodels-and-libraries/llama-downloads/\nWhere to send commentsInstructions on how to provide feedback or comments on the model can be\nfound in the model README, or by opening an issue in the GitHub repository\n(https://github.com/facebookresearch/llama/ ).\nIntended Use\nIntended Use Cases L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle is intended for commercial and research use in English. Tuned models\nare intended for assistant-like chat, whereas pretrained models can be adapted\nfor a variety of natural language generation tasks.\nOut-of-Scope Uses Use in any manner that violates applicable laws or regulations (including trade\ncompliancelaws). UseinlanguagesotherthanEnglish. Useinanyotherway\nthat is prohibited by the Acceptable Use Policy and Licensing Agreement for\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle.\nHardware and Software (Section 2.2)\nTraining Factors We usedcustomtraininglibraries, Meta’sResearchSuperCluster, andproductionclustersforpretraining. Fine-tuning,annotation,andevaluationwerealso', metadata={'chunk-id': '317', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'})]
Adding Generation to RAG
So far we've built a multi-query-powered retrieval-augmented chain. Now we need to add the generation step.
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
QA_PROMPT = PromptTemplate(
    input_variables=["query", "contexts"],
    template="""You are a helpful assistant who answers user queries using the
contexts provided. If the question cannot be answered using the information
provided say "I don't know".
Contexts:
{contexts}
Question: {query}""",
)
# Chain
qa_chain = LLMChain(llm=llm, prompt=QA_PROMPT)
out = qa_chain(
    inputs={
        "query": question,
        "contexts": "\n---\n".join([d.page_content for d in docs])
    }
)
out["text"]
'Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. The fine-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are optimized for dialogue use cases. These models outperform open-source chat models on most benchmarks and are considered a suitable substitute for closed-source models based on humane evaluations for helpfulness and safety. The approach to fine-tuning and safety is described in detail.'
Chaining Everything Together with a SequentialChain
We could pull the above logic together into a single function or set of methods, whichever we prefer. However, if we want to do this the LangChain way, we need to chain multiple chains together. The retrieval component (1) is not itself a chain and (2) requires processing of its output. To handle both, in keeping with LangChain's approach of chaining chains, we put the retrieval component inside a TransformChain:
from langchain.chains import TransformChain
def retrieval_transform(inputs: dict) -> dict:
    docs = retriever.get_relevant_documents(query=inputs["question"])
    docs = [d.page_content for d in docs]
    docs_dict = {
        "query": inputs["question"],
        "contexts": "\n---\n".join(docs)
    }
    return docs_dict
retrieval_chain = TransformChain(
    input_variables=["question"],
    output_variables=["query", "contexts"],
    transform=retrieval_transform
)
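We can sanity-check the transform chain on its own before wiring it into the full pipeline (a quick sketch):
# run the transform chain standalone: it maps "question" -> "query" and "contexts"
partial = retrieval_chain({"question": question})
print(partial.keys())             # dict_keys(['question', 'query', 'contexts'])
print(partial["contexts"][:200])  # preview the concatenated contexts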
Now we link this to our generation step with a SequentialChain:
from langchain.chains import SequentialChain
rag_chain = SequentialChain(
    chains=[retrieval_chain, qa_chain],
    input_variables=["question"],  # we need to name differently to output "query"
    output_variables=["query", "contexts", "text"]
)
Then we can run the whole RAG pipeline:
out = rag_chain({"question": question})
out["text"]
INFO:langchain.retrievers.multi_query:Generated queries: ['1. What information can you provide about llama 2?', '2. Could you give me some details about llama 2?', '3. I would like to learn more about llama 2. Can you help me with that?']
'Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. These LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are optimized for dialogue use cases. They have been shown to outperform open-source chat models on most benchmarks and are considered a suitable substitute for closed-source models based on humane evaluations for helpfulness and safety. The approach to fine-tuning and safety is described in detail in the work.'
Customizing Multi-Query
We'll try this with two prompts, both of which encourage more diverse search queries.
Prompt A:
Your task is to generate 3 different search queries that aim to
answer the user question from multiple perspectives.
Each query MUST tackle the question from a different viewpoint,
we want to get a variety of RELEVANT search results.
Provide these alternative questions separated by newlines.
Original question: {question}
Prompt B:
Your task is to generate 3 different search queries that aim to
answer the user question from multiple perspectives. The user questions
are focused on Large Language Models, Machine Learning, and related
disciplines.
Each query MUST tackle the question from a different viewpoint, we
want to get a variety of RELEVANT search results.
Provide these alternative questions separated by newlines.
Original question: {question}
from typing import List
from langchain.chains import LLMChain
from pydantic import BaseModel, Field
from langchain.prompts import PromptTemplate
from langchain.output_parsers import PydanticOutputParser
# Output parser will split the LLM result into a list of queries
class LineList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: List[str] = Field(description="Lines of text")

class LineListOutputParser(PydanticOutputParser):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = text.strip().split("\n")
        return LineList(lines=lines)
output_parser = LineListOutputParser()
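As a quick check, the parser simply splits the raw LLM output on newlines (an example with placeholder text):
# the parser turns raw LLM text into a LineList of queries
example = output_parser.parse("first query\nsecond query\nthird query")
print(example.lines)  # ['first query', 'second query', 'third query']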
template = """
Your task is to generate 3 different search queries that aim to
answer the user question from multiple perspectives. The user questions
are focused on Large Language Models, Machine Learning, and related
disciplines.
Each query MUST tackle the question from a different viewpoint, we
want to get a variety of RELEVANT search results.
Provide these alternative questions separated by newlines.
Original question: {question}
"""
QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template=template,
)
llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
# Chain
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)
# Run
retriever = MultiQueryRetriever(
    retriever=vectorstore.as_retriever(), llm_chain=llm_chain, parser_key="lines"
)  # "lines" is the key (attribute name) of the parsed output
# Results
docs = retriever.get_relevant_documents(
    query=question
)
len(docs)
INFO:langchain.retrievers.multi_query:Generated queries: ['1. What are the key features and capabilities of Large Language Model Llama 2?', '2. How does Llama 2 compare to other Large Language Models in terms of performance and efficiency?', '3. What are the applications and use cases of Llama 2 in the field of Machine Learning and Natural Language Processing?']
7
docs
[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and fine-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur fine-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to fine-tuning and safety', metadata={'chunk-id': '1', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'}),
Document(page_content='2\n3.4.3 Even programmatic measures of model capability can be highly subjective . . . . . . . 15\n3.5 Even large language models are brittle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15\n3.6 Social bias in large language models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17\n3.7 Performance on non-English languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20\n4 Behavior on selected tasks 21\n4.1 Checkmate-in-one task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22\n4.2 Periodic elements task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23\n5 Additional related work 24\n6 Discussion 25', metadata={'chunk-id': '14', 'id': '2206.04615', 'source': 'http://arxiv.org/pdf/2206.04615', 'title': 'Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models'}),
Document(page_content='challenges described above) about how the development of large language models has unfolded thus far, including a\nquantitative analysis of the increasing gap between academia and industry for large model development.\nFinally, in Section 4 we outline policy interventions that may help concretely address the challenges we outline in\nSections 2 and 3 in order to help guide the development and deployment of larger models for the broader social good.\nWe leave some illustrative experiments, technical details, and caveats about our claims in Appendix A.\n2 DISTINGUISHING FEATURES OF LARGE GENERATIVE MODELS\nWe claim that large generative models (e.g., GPT-3 [ 11], LaMDA [ 78], Gopher [ 62], etc.) are distinguished by four\nfeatures:\n•Smooth, general capability scaling : It is possible to predictably improve the general performance of generative\nmodels — their loss on capturing a specific, though very broad, data distribution — by scaling up the size of the\nmodels, the compute used to train them, and the amount of data they’re trained on in the correct proportions.\nThese proportions can be accurately predicted by scaling laws (Figure 1). We believe that these scaling laws\nde-risk investments in building larger and generally more capable models despite the high resource costs and the\ndifficulty of predicting precisely how well a model will perform on a specific task. Note, the harmful properties', metadata={'chunk-id': '9', 'id': '2202.07785', 'source': 'http://arxiv.org/pdf/2202.07785', 'title': 'Predictability and Surprise in Large Generative Models'}),
Document(page_content='asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyfine-tunedtoalignwithhuman\npreferences, which greatly enhances their usability and safety. This step can require significant costs in\ncomputeandhumanannotation,andisoftennottransparentoreasilyreproducible,limitingprogresswithin\nthe community to advance AI alignment research.\nIn this work, we develop and release Llama 2, a family of pretrained and fine-tuned LLMs, L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle and\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested,\nL/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc models generally perform better than existing open-source models. They also appear to\nbe on par with some of the closed-source models, at least on the human evaluations we performed (see', metadata={'chunk-id': '9', 'id': '2307.09288', 'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tuned Chat Models'}),
Document(page_content='but BoolQ. Similarly, this model surpasses PaLM540B everywhere but on BoolQ and WinoGrande.\nLLaMA-13B model also outperforms GPT-3 on\nmost benchmarks despite being 10 \x02smaller.\n3.2 Closed-book Question Answering\nWe compare LLaMA to existing large language\nmodels on two closed-book question answering\nbenchmarks: Natural Questions (Kwiatkowski\net al., 2019) and TriviaQA (Joshi et al., 2017). For\nboth benchmarks, we report exact match performance in a closed book setting, i.e., where the models do not have access to documents that contain\nevidence to answer the question. In Table 4, we\nreport performance on NaturalQuestions, and in Table 5, we report on TriviaQA. On both benchmarks,\nLLaMA-65B achieve state-of-the-arts performance\nin the zero-shot and few-shot settings. More importantly, the LLaMA-13B is also competitive on\nthese benchmarks with GPT-3 and Chinchilla, despite being 5-10 \x02smaller. This model runs on a\nsingle V100 GPU during inference.\n0-shot 1-shot 5-shot 64-shot\nGopher 280B 43.5 - 57.0 57.2', metadata={'chunk-id': '17', 'id': '2302.13971', 'source': 'http://arxiv.org/pdf/2302.13971', 'title': 'LLaMA: Open and Efficient Foundation Language Models'}),
Document(page_content='5 Discussion 19\n6 Conclusion 21\n1 Introduction: motivation for the survey and definitions\n1.1 Motivation\nLarge Language Models (LLMs) ( Devlin et al. ,2019;Brown et al. ,2020;Chowdhery et al. ,2022) have fueled dramatic progress in Natural Language Processing (NLP ) and are already core in several products with\nmillions of users, such as the coding assistant Copilot ( Chen et al. ,2021), Google search engine1or more recently ChatGPT2. Memorization ( Tirumala et al. ,2022) combined with compositionality ( Zhou et al. ,2022)\ncapabilities made LLMs able to execute various tasks such as language understanding or conditional and unconditional text generation at an unprecedented level of pe rformance, thus opening a realistic path towards\nhigher-bandwidth human-computer interactions.\nHowever, LLMs suffer from important limitations hindering a broader deployment. LLMs often provide nonfactual but seemingly plausible predictions, often referr ed to as hallucinations ( Welleck et al. ,2020). This\nleads to many avoidable mistakes, for example in the context of arithmetics ( Qian et al. ,2022) or within\na reasoning chain ( Wei et al. ,2022c ). Moreover, many LLMs groundbreaking capabilities seem to emerge', metadata={'chunk-id': '5', 'id': '2302.07842', 'source': 'http://arxiv.org/pdf/2302.07842', 'title': 'Augmented Language Models: a Survey'}),
Document(page_content='practicable options for academic research since they were acquired by Appen, a company that is\nfocused on a business market.\nThis paper explores the potential of large language models (LLMs) for text annotation tasks, with a\nfocus on ChatGPT, which was released in November 2022. It demonstrates that zero-shot ChatGPT\nclassifications (that is, without any additional training) outperform MTurk annotations, at a fraction\nof the cost. LLMs have been shown to perform very well for a wide range of purposes, including\nideological scaling (Wu et al., 2023), the classification of legislative proposals (Nay, 2023), the\nresolution of cognitive psychology tasks (Binz and Schulz, 2023), and the simulation of human\nsamples for survey research (Argyle et al., 2023). While a few studies suggested that ChatGPT\nmight perform text annotation tasks of the kinds we have described (Kuzman, Mozeti ˇc and Ljubeši ´c,\n2023; Huang, Kwak and An, 2023), to the best of our knowledge our work is the first systematic\nevaluation. Our analysis relies on a sample of 6,183 documents, including tweets and news articles', metadata={'chunk-id': '3', 'id': '2303.15056', 'source': 'http://arxiv.org/pdf/2303.15056', 'title': 'ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks'})]
We put this into another SequentialChain:
retrieval_chain = TransformChain(
    input_variables=["question"],
    output_variables=["query", "contexts"],
    transform=retrieval_transform
)

rag_chain = SequentialChain(
    chains=[retrieval_chain, qa_chain],
    input_variables=["question"],  # we need to name differently to output "query"
    output_variables=["query", "contexts", "text"]
)
And we ask again:
out = rag_chain({"question": question})
out["text"]
INFO:langchain.retrievers.multi_query:Generated queries: ['1. What are the key features and capabilities of Large Language Model Llama 2?', '2. How does Llama 2 compare to other Large Language Models in terms of performance and efficiency?', '3. What are the applications and use cases of Llama 2 in the field of Machine Learning and Natural Language Processing?']
'Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. These models, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc, are optimized for dialogue use cases and have been shown to outperform open-source chat models on most benchmarks. They are considered as a suitable substitute for closed-source models in terms of helpfulness and safety. The development of Llama 2 addresses challenges such as programmatic measures of model capability, brittleness of large language models, social bias, and performance on non-English languages.'
Once you're done, delete your Pinecone index to save resources:
pc.delete_index(index_name)
From: https://blog.csdn.net/weixin_40307696/article/details/142481450