1. Responses that include reasoning or explanation
system prompt |
---|
1. You are an AI assistant that helps people find information. Provide a detailed answer so user don’t need to search outside to understand the answer. |
2. You are an AI assistant that helps people find information. User will you give you a question. Your task is to answer as faithfully as you can. While answering think step-bystep and justify your answer. |
2. Judging Guidelines
The core part of this annotation is to evaluate and compare the two candidate responses given the
conversation history and the user request. A typical annotation process includes the following:
1. You are supposed to understand the context and the user intent by reading the
conversations and thinking over the user request.
2. You need to read and compare the two model responses carefully and find their key
differences.
3. Frequently, you may need external tools to verify information in the responses if you lack
the necessary background or feel unsure about something.
4. Based on these, you will indicate your preference between two candidate responses based
on several aspects such as Helpfulness, Truthfulness, and Harmlessness. You need to
provide your overall assessment. In the annotation UI, these are in the form of radio
buttons.
5. You also need to specify your confidence level in the corresponding judgment.
6. You have the option to skip evaluating instances, but we encourage you to first attempt to
answer an instance to the best of your abilities.
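
The guidelines above imply a small record per comparison: per-aspect preferences, an overall assessment, a confidence level, and an optional skip. As a minimal sketch only, one such record might be modelled as below. The class name `PairwiseJudgment`, the field names, and the "A"/"B" labels are hypothetical; the guidelines do not define a storage format or the set of allowed values.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Aspects named in the guidelines; the "A"/"B" labels below are an assumption.
ASPECTS = ("Helpfulness", "Truthfulness", "Harmlessness")


@dataclass
class PairwiseJudgment:
    """One annotator's comparison of two candidate responses (hypothetical schema)."""
    prompt: str
    response_a: str
    response_b: str
    aspects: Dict[str, str] = field(default_factory=dict)  # aspect -> "A" or "B"
    overall: Optional[str] = None      # overall preference, "A" or "B"
    confidence: Optional[str] = None   # free-form confidence level
    skipped: bool = False              # annotators may skip an instance

    def __post_init__(self) -> None:
        if self.skipped:
            return  # a skipped instance needs no preference or confidence
        unknown = set(self.aspects) - set(ASPECTS)
        if unknown:
            raise ValueError(f"unknown aspects: {sorted(unknown)}")
        if self.overall not in ("A", "B"):
            raise ValueError("overall must be 'A' or 'B' unless skipped")
        if self.confidence is None:
            raise ValueError("confidence is required unless skipped")


judgment = PairwiseJudgment(
    prompt="What is the capital of France?",
    response_a="Paris.",
    response_b="Lyon.",
    aspects={"Truthfulness": "A"},
    overall="A",
    confidence="high",
)
```

Validating in `__post_init__` mirrors steps 4 to 6: a judgment that is not skipped must carry both an overall preference and a confidence level.
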
MoDS: Model-oriented Data Selection for Instruction Tuning (paper, GitHub, introduction)
MoDS selects data according to three criteria: quality, diversity, and necessity. The whole process has three stages:
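- Quality filtering: collect a mixed open-source dataset, mixData, and score each sample with OpenAssistant's reward-model-deberta-v3-large-v2 (a reward model built on the DeBERTa architecture). A sample whose score exceeds α is considered to meet the quality bar; these samples form the high-quality dataset, Data1.
- Diversity filtering: apply the K-Center-Greedy algorithm to select data, keeping the instruction set as small as possible while maximizing its diversity. The result is the Seed Instruction Data (SID).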
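- Necessity filtering: from the mixed dataset mixData (in practice its high-quality subset Data1, as step b shows), pick out the instructions on which the model still responds poorly. The steps are: (a) train an initial model on SID; (b) use the trained initial model to generate responses for all instructions in the high-quality dataset Data1; (c) score those responses with the reward model, where a score below β means the initial model needs to improve on that instruction, and collect these instructions as the necessity dataset, Data2; (d) apply diversity filtering to Data2 to obtain the Augmented Instruction Data (AID).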
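
The diversity stage relies on K-Center-Greedy, which can be sketched as greedy farthest-point selection: repeatedly add the point that lies farthest from every centre chosen so far. The sketch below is an illustration under assumptions, not the MoDS implementation. It assumes each instruction has already been mapped to an embedding vector (how the embeddings are produced is left open here), it uses Euclidean distance, and the function name `k_center_greedy` and its `first` parameter are hypothetical.

```python
import math


def k_center_greedy(points, k, first=0):
    """Return indices of up to k points chosen by greedy farthest-point selection."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [first]
    # min_dist[i] is the distance from point i to its nearest selected centre.
    min_dist = [math.dist(p, points[first]) for p in points]
    while len(selected) < k:
        # The next centre is the point farthest from all current centres.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        if min_dist[nxt] == 0:
            break  # every remaining point coincides with a chosen centre
        selected.append(nxt)
        # Refresh nearest-centre distances against the new centre.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy 2-D "embeddings": two tight clusters plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.0, 0.1), (5.0, 5.0)]
chosen = k_center_greedy(points, k=3)
print(chosen)  # [0, 3, 4]
```

With three centres, the selection takes one point from each cluster and the isolated point, which is the behaviour the diversity stage is after: near-duplicates are covered by a single representative.
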
Finally, the model is instruction-tuned on the seed and augmented instruction data together (SID + AID) to obtain the final model.
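
The three stages can be strung together as in the sketch below. It is a schematic under assumptions rather than the authors' code: `mods_select` is a hypothetical name, and `reward`, `select_diverse`, and `train` are placeholder callables standing in for the reward model, the K-Center-Greedy step, and the initial fine-tuning run. The toy stand-ins at the bottom exist only so the example runs.

```python
def mods_select(mix_data, reward, select_diverse, train, alpha, beta):
    """Return (SID, AID) following the quality, diversity, necessity stages."""
    # Stage 1, quality: keep samples whose reward score exceeds alpha (Data1).
    data1 = [ex for ex in mix_data
             if reward(ex["instruction"], ex["response"]) > alpha]
    # Stage 2, diversity: pick a small, diverse seed set from Data1 (SID).
    sid = select_diverse(data1)
    # Stage 3, necessity: train an initial model on SID, then keep the Data1
    # instructions whose generated responses score below beta (Data2).
    initial_model = train(sid)
    data2 = [ex for ex in data1
             if reward(ex["instruction"], initial_model(ex["instruction"])) < beta]
    # Diversity filtering on Data2 gives the augmented set (AID).
    aid = select_diverse(data2)
    return sid, aid


# Toy stand-ins (assumptions for illustration only).
def toy_reward(instruction, response):
    return len(response)  # longer response = "better", purely for the demo


def toy_select_diverse(items):
    return items[:2]  # placeholder for K-Center-Greedy


def toy_train(seed):
    known = {ex["instruction"] for ex in seed}
    # The "model" answers well only on instructions it was trained on.
    return lambda instruction: "learned reply" if instruction in known else ""


mix_data = [
    {"instruction": "a", "response": "good answer"},
    {"instruction": "b", "response": "bad"},
    {"instruction": "c", "response": "fine answer"},
    {"instruction": "d", "response": "long answer"},
]
sid, aid = mods_select(mix_data, toy_reward, toy_select_diverse, toy_train,
                       alpha=5, beta=5)
final_training_set = sid + aid  # SID + AID is what the final model is tuned on
```

In the toy run, sample "b" fails the quality bar, "a" and "c" become the seed set, and "d" lands in the augmented set because the initial model answers it poorly.
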
From: https://www.cnblogs.com/end/p/18340679