10.5 Exploring the OpenAI API and LangChain
Next, we use the OpenAI API together with LangChain to summarize the parsed filings and extract the valuable information they contain. This helps us better understand the content of each document, including the business overview, risk factors, and the analysis of financial condition, while producing a more concise summary of that information.
10.5.1 The OpenAI Interface
We write the file openai_interface.py, which implements a number of functions related to the OpenAI interface: building messages, counting tokens, calling the OpenAI models, and creating summaries. It also provides a check on the number of input tokens. With these functions we can extract and summarize the key sections of a filing, making the document easier to understand and analyze.
import os
import re
from configparser import ConfigParser

import openai
import tiktoken
from langchain.chat_models import ChatOpenAI
from langchain.chains.summarize import load_summarize_chain
from langchain.callbacks import get_openai_callback

# UnstructuredStringLoader and split_doc_in_chunks are helper utilities defined
# elsewhere in this project. The code below uses the legacy (pre-1.0) openai API.

# Read the OpenAI API key from the local credentials file
parser = ConfigParser()
_ = parser.read(os.path.join("credentials.cfg"))
openai.api_key = parser.get("open_ai", "api_key")
INITIAL_CONTEXT_MESSAGE = {
    "role": "system",
    "content": "Act as an assistant for security analysis. Your goal is to help make sense of "
               "financial information available for US public companies on EDGAR."
}

# Context window size (in tokens) of each supported model
MODEL_MAX_TOKENS = {
    "gpt-3.5-turbo": 4097,
    "gpt-3.5-turbo-16k": 16384,
}
def get_completion(messages, model="gpt-3.5-turbo"):
    return openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # this is the degree of randomness of the model's output
    )
def num_tokens_from_messages(messages, model="gpt-3.5-turbo"):
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in ["gpt-3.5-turbo", "gpt-3.5-turbo-16k"]:
        num_tokens = 0
        for message in messages:
            num_tokens += 4  # every message follows <im_start>{role/name}\n{content}<im_end>\n
            for key, value in message.items():
                num_tokens += len(encoding.encode(value))
                if key == "name":  # if there's a name, the role is omitted
                    num_tokens += -1  # role is always required and always 1 token
        num_tokens += 2  # every reply is primed with <im_start>assistant
        return num_tokens
def compute_cost(tokens, model="gpt-3.5-turbo"):
    # Approximate cost in USD, using a per-1K-token rate for each model
    if model == "gpt-3.5-turbo":
        return round(tokens / 1000 * 0.002, 4)
    if model == "gpt-3.5-turbo-16k":
        return round(tokens / 1000 * 0.004, 4)
def get_text_tokens(value, model="gpt-3.5-turbo"):
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(value))
def get_messages(company_name, ticker, exchange, form, filing_date, section_title, section_text):
    prompt = f"I will give you some information about the company, the form I am analysing and " \
             f"then a text section of that form. All of this delimited by ^^^. " \
             f"Summarize the section keeping it as short as possible, without leaving out " \
             f"any information that could be relevant to an investor in the company. " \
             f"If there is any reference to debt issuance write the interest rate, if present. " \
             f"Organize the output in a list of short information points (around 20 words each). " \
             f"Remove all the points that contain duplicate information. " \
             f"Do not refer to exhibits. " \
             f"Format the output as a json with a single key 'data' and value as a list of the information points. " \
             f"^^^ " \
             f"Company Name: {company_name} " \
             f"Ticker: {ticker} " \
             f"Exchange: {exchange} " \
             f"Form: {form} " \
             f"Filing date: {filing_date} " \
             f"Section title: {section_title} " \
             f"Section text: {section_text} " \
             f"^^^"
    messages = [
        INITIAL_CONTEXT_MESSAGE,
        {"role": "user", "content": prompt},
    ]
    return messages
def create_summary(section_text, model, chain_type="map_reduce", verbose=False):
    llm = ChatOpenAI(model_name=model, openai_api_key=parser.get("open_ai", "api_key"))
    # Wrap the raw section text in a document loader, then split it into chunks that
    # fit the model's context window (both helpers are defined elsewhere in the project)
    string_loader = UnstructuredStringLoader(section_text)
    docs = split_doc_in_chunks(string_loader.load())
    chain = load_summarize_chain(llm, chain_type=chain_type, verbose=verbose)
    # The callback records how many tokens the summarization chain consumed
    with get_openai_callback() as cb:
        res = chain.run(docs)
    return res, cb.total_tokens
def summarize_section(section_text, model="gpt-3.5-turbo", chain_type="map_reduce", verbose=False):
    summary, tokens = create_summary(section_text, model, chain_type, verbose)
    # Split the summary into bullet points at sentence boundaries,
    # ignoring periods that belong to "inc." / "Inc."
    bullets = [x.strip() for x in re.split(r'(?<!inc)(?<!Inc)\. ', summary)]
    cost = compute_cost(tokens, model=model)
    return bullets, cost
def check_input_tokens(input_tokens, model):
    # True if the input would leave less than ~500 tokens of headroom for the model's reply
    return input_tokens > MODEL_MAX_TOKENS[model] - 500
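As a minimal sketch of how these functions fit together (the company metadata and section text below are placeholder values, not output from the earlier parsing steps), one might first check the token budget of the prompt and fall back to the LangChain map-reduce summary only when a section is too long for a single call:

# Hypothetical usage example; all metadata values are placeholders.
section_text = "..."  # text of one parsed filing section
messages = get_messages("Example Corp", "EXMP", "NASDAQ", "10-K",
                        "2023-06-30", "Risk Factors", section_text)

model = "gpt-3.5-turbo"
if not check_input_tokens(num_tokens_from_messages(messages, model=model), model):
    # The section fits in the context window: one direct ChatCompletion call
    response = get_completion(messages, model=model)
    print(response["choices"][0]["message"]["content"])
    print("cost:", compute_cost(response["usage"]["total_tokens"], model=model))
else:
    # Too long for a single call: fall back to the LangChain map-reduce summary
    bullets, cost = summarize_section(section_text, model=model)
    print(bullets, "cost:", cost)

In both branches the token usage is converted into a dollar estimate with compute_cost, so the cost of processing each filing can be tracked.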