Instruct-GPT


Date: 2023-07-08 14:55:55





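  • Ranking model outputs: candidates are given a set of prompts, each with outputs from several models, and rank the outputs by overall quality. Their rankings are then compared with the rankings produced by the researchers.
  • Sensitive speech: a dataset of (prompt, completion) pairs is created in which some prompts or completions are sensitive (able to provoke strong negative feelings, e.g. toxic, sexual, violent, judgmental, or political content). The InstructGPT team labeled this data themselves and then compared the candidate labelers' labels against their own.
  • Self-assessed ability to identify sensitive speech aimed at different groups: the team wanted to hire labelers who could recognize a broad range of sensitive content, but for legal reasons could not hire on the basis of demographic criteria. Candidates were therefore asked questions such as "For which topics or cultural groups are you comfortable identifying sensitive speech?", and their answers were used as part of the screening.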
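The screening steps above compare a candidate's labels with the researchers' labels, but the post does not say how the comparison is scored. The following is a minimal sketch under that gap: it assumes a simple match rate for sensitive-speech flags and a pairwise-order agreement rate for rankings. Both function names and both metrics are illustrative assumptions, not the procedure the InstructGPT team used.

```python
from itertools import combinations


def label_agreement(candidate_labels, reference_labels):
    """Fraction of items where the candidate's label equals the reference label."""
    if len(candidate_labels) != len(reference_labels):
        raise ValueError("label lists must have the same length")
    if not reference_labels:
        raise ValueError("need at least one labeled item")
    matches = sum(c == r for c, r in zip(candidate_labels, reference_labels))
    return matches / len(reference_labels)


def ranking_agreement(candidate_ranking, reference_ranking):
    """Fraction of output pairs that both rankings place in the same order.

    Both arguments list the same outputs from best to worst.
    """
    position = {item: i for i, item in enumerate(candidate_ranking)}
    pairs = list(combinations(reference_ranking, 2))
    same_order = sum(position[a] < position[b] for a, b in pairs)
    return same_order / len(pairs)


# Hypothetical data: True means "flagged as sensitive".
candidate_flags = [True, False, True, True]
researcher_flags = [True, False, False, True]
print(label_agreement(candidate_flags, researcher_flags))

# Hypothetical rankings of three model outputs for one prompt.
print(ranking_agreement(["a", "c", "b"], ["a", "b", "c"]))
```

A candidate could then be accepted when both scores exceed some threshold; the threshold itself would be another assumption.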




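  A voluntary, anonymous survey was sent to the labelers to learn about their demographics. This suggests that the InstructGPT team took bias seriously and tried to reduce bias in the data labels at the annotation stage.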


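  • Rate the model output for a given prompt/instruct on a scale of 1 to 7.
  • Label individual aspects of the output: whether it follows the instruction correctly; whether it is appropriate for a customer assistant; whether it contains sexual content; whether it contains violent content; whether it encourages or fails to discourage violence, abuse, terrorism, or self-harm; whether it denigrates a protected class; whether it gives harmful advice; whether it expresses moral judgment.
  • Rank the outputs of different models for the same prompt/instruct by quality.
Figure 1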
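To make the labeling tasks above concrete, here is a minimal sketch of how one labeled output could be stored and validated. The class name `OutputLabel` and the flag names in `ASPECT_FLAGS` are hypothetical paraphrases of the aspects listed above, not the field names of the actual labeling interface.

```python
from dataclasses import dataclass, field

# Hypothetical flag names paraphrasing the aspects listed in the post.
ASPECT_FLAGS = (
    "follows_instruction",
    "appropriate_for_customer_assistant",
    "sexual_content",
    "violent_content",
    "encourages_violence_abuse_terrorism_self_harm",
    "denigrates_protected_class",
    "harmful_advice",
    "moral_judgment",
)


@dataclass
class OutputLabel:
    """One labeler's judgment of a single model output."""

    prompt: str
    output: str
    rating: int                      # overall quality, 1 (worst) to 7 (best)
    flags: dict = field(default_factory=dict)  # aspect name -> bool

    def __post_init__(self):
        # Reject ratings outside the 1-7 scale described in the post.
        if not 1 <= self.rating <= 7:
            raise ValueError("rating must be between 1 and 7")
        # Reject aspect names that are not in the agreed list.
        unknown = set(self.flags) - set(ASPECT_FLAGS)
        if unknown:
            raise ValueError(f"unknown flags: {sorted(unknown)}")


label = OutputLabel(
    prompt="Translate this sentence to Spanish:",
    output="xxx",
    rating=5,
    flags={"follows_instruction": True, "harmful_advice": False},
)
print(label.rating, label.flags)
```

The ranking task is kept separate here on purpose: the post describes it as its own annotation step, so a ranking would be stored as an ordered list of outputs rather than derived from the ratings.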

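About the instruct dataset: each sample has three parts, (instruction, input, output), written here as (instruct, input, output). The input may be empty when the instruct already contains everything the task needs, so the two samples below express the same task.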

  • instruct: Write an essay of at least 800 characters on the following themes
  • input: helping others, standing up bravely for what is right
  • output: xxx


  • instruct: Write an essay of at least 800 characters on the themes of helping others and standing up bravely for what is right
  • input: ""
  • output: xxx
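The two samples above can be handled by one small routine. The sketch below assumes each sample is a plain dictionary with the keys `instruct`, `input`, and `output`, and that the prompt is formed by appending a non-empty input to the instruct after a blank line. The helper name `build_prompt` and the joining rule are illustrative assumptions, not a format the post prescribes.

```python
def build_prompt(sample):
    """Combine instruct and input into the text shown to the model."""
    instruct = sample["instruct"].strip()
    extra_input = sample.get("input", "").strip()
    if extra_input:
        # Non-empty input: place it after the instruct, separated by a blank line.
        return f"{instruct}\n\n{extra_input}"
    # Empty input: the instruct alone is the whole prompt.
    return instruct


with_input = {
    "instruct": "Write an essay of at least 800 characters on the following themes",
    "input": "helping others, standing up bravely for what is right",
    "output": "xxx",
}
without_input = {
    "instruct": "Write an essay of at least 800 characters on the themes of "
                "helping others and standing up bravely for what is right",
    "input": "",
    "output": "xxx",
}

for sample in (with_input, without_input):
    print(build_prompt(sample))
    print("---")
```

For supervised fine-tuning, the string returned by `build_prompt` would be the model input and `sample["output"]` the target.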

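How is the instruct dataset obtained?
instruct: some are submitted by users to the API and some are written by labelers; both kinds are human-generated. They can also be generated by a model, as in the method described in self-instruct.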





InstructGPT overview


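  • Step 1: Collect instruction data. The instruct samples come from prompts users submitted to the API and from prompts written by labelers; the corresponding input-output demonstrations are written by labelers. GPT-3 is then fine-tuned on this dataset.
  • Step 2: For a large set of API instructs, several different outputs are generated for each instruct by different models, and labelers rank them. A reward model (RM) is trained on this ranking dataset.
  • Step 3: Use the RM above to score the outputs of the fine-tuned GPT-3 model, and keep optimizing the model with reinforcement learning so that its outputs receive higher RM scores.
Figure 2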
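Step 2 turns human rankings into a training signal for the reward model. The post does not spell out the loss; the InstructGPT paper describes a pairwise ranking loss, in which every pair of outputs drawn from one ranking contributes -log(sigmoid(r_preferred - r_rejected)). The sketch below illustrates that idea in plain Python. The function names and the toy reward values are assumptions for illustration, and a real reward model would be a neural network trained by gradient descent rather than a lookup table.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations keeps the input order, so the first element of each
    # pair is always the one ranked higher.
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_w - r_l)); small when the preferred output scores higher."""
    diff = reward_preferred - reward_rejected
    return math.log1p(math.exp(-diff))


def ranking_loss(ranked_outputs, reward_fn):
    """Average pairwise loss over all pairs from one human ranking."""
    pairs = ranking_to_pairs(ranked_outputs)
    total = sum(pairwise_loss(reward_fn(w), reward_fn(l)) for w, l in pairs)
    return total / len(pairs)


# Hypothetical example: labelers ranked three outputs as a > b > c.
ranked = ["a", "b", "c"]
agreeing_rewards = {"a": 2.0, "b": 1.0, "c": 0.0}     # matches the ranking
disagreeing_rewards = {"a": 0.0, "b": 1.0, "c": 2.0}  # reverses the ranking

print(ranking_loss(ranked, agreeing_rewards.get))
print(ranking_loss(ranked, disagreeing_rewards.get))
```

A reward function that agrees with the human ranking yields the lower loss, which is the direction training pushes the RM. In Step 3 the trained RM then supplies the scalar score that reinforcement learning tries to increase.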


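  • Simple: labelers are only asked to write arbitrary instruct examples, with the requirement that the examples be sufficiently diverse.
  • One-to-many: labelers write an instruct example together with multiple (input, output) samples that correspond to that instruct.
  • API-based: starting from the instruct examples users submitted to the OpenAI API, labelers write, for each of them, examples that are similar to it or have the same meaning.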

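Figure 3 shows the breakdown by category of the instruct samples submitted to the API. Most of them are generative tasks rather than classification or question-answering tasks.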

Figure 3

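Table 1 lists example user-submitted prompts from the InstructGPT distribution. I compared them with the example user-submitted prompts from the GPT-3 distribution but could not tell the two sets apart.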

Table 1
Use Case Example
brainstorming List five ideas for how to regain enthusiasm for my career
brainstorming What are some key points I should know when studying Ancient Greece?
brainstorming What are 4 questions a user might have after reading the instruction manual for a trash compactor?

{user manual}

brainstorming What are 10 science fiction books I should read next?
classification Take the following text and rate, on a scale from 1-10, how sarcastic the person is being (1 = not at all, 10 = extremely sarcastic). Also give an explanation

classification This is a list of tweets and the sentiment categories they fall into.
Tweet: {tweet_content1}
Sentiment: {sentiment1}
Tweet: {tweet_content2}
classification {java code}
What language is the code above written in?
classification You are a very serious professor, and you check papers to see if they contain missing citations. Given the text, say whether it is missing an important citation (YES/NO) and which sentence(s) require citing.
extract Extract all course titles from the table below:
extract Extract all place names from the article below:
extract Given the following list of movie titles, write down any names of cities in the titles.
generation Write a creative ad for the following product to run on Facebook aimed at parents:
generation Write a short story where a brown bear to the beach, makes friends with a seal, and then return home.
generation Here’s a message to me:


Here are some bullet points for a reply:


Write a detailed reply
generation This is an article about how to write a cover letter when applying for jobs:

It’s important to spend some time
generation write rap lyrics on the topics mentioned in this news article:
rewrite This is the summary of a Broadway play:
This is the outline of the commercial for that play:
rewrite Translate this sentence to Spanish:
rewrite Create turn-by-turn navigation given this text: Go west on {road1} unto you hit {road2}.Desination will be a red barn on the right then take it east to {road3}.
rewrite Rewrite the following text to be more light-hearted:

{very formal text}
chat The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly.
Human: Hello, who are you?
AI: I am an AI created by OpenAI. How can I help you today?
Human: I’d like to cancel my subscription.
chat Marv is a chatbot that reluctantly answers questions with sarcastic responses:
You: How many pounds are in a kilogram?
Marv: This again? There are 2.2 pounds in a kilogram. Please make a note of this.
You: What does HTML stand for?
Marv: Was Google too busy? Hypertext Markup Language. The T is for try to ask better questions in the future.
You: When did the first airplane fly?
chat This is a conversation with an enlightened Buddha. Every response is full of wisdom and love.
Me: How can I achieve greater peace and equanimity?
closed qa Help me answer questions about the following short story:
What is the moral of the story?
closed qa Answer the following question:
What shape is the earth?
A) A circle
B) A sphere
C) An ellipse
D) A plane
closed qa Tell me how hydrogen and helium are different, using the following facts:
open qa I am a highly intelligent question answering bot. If you ask me a question that is rooted in truth, I will give you the answer. If you ask me a question that is nonsense, trickery, or has no clear answer, I will respond with "Unknown".
Q: What is human life expectancy in the United States?
A: Human life expectancy in the United States is 78 years.
Q: Who was president of the United States in 1955?
open qa Who built the statue of liberty?
open qa How do you take the derivative of the sin function?
open qa who are the indiginous people of New Zealand?
summarization Summarize this for a second-grade student:
summarization {news article}
summarization {chat transcript}
Summarize the above conversation between a customer and customer assistant. Make sure to state any complaints that the customer has.
other start with where
other Look up "cowboy" on Google and give me the results.
other Johnathan Silver goes to the market every day, and brings back a

From: https://www.cnblogs.com/wolfling/p/17537207.html

