2024-11-30
offline RL · PbRL | LiRE: building an A>B>C RLT list to obtain more preference data

Paper title: Listwise Reward Estimation for Offline Preference-based Reinforcement Learning, ICML 2024.
arxiv: https://arxiv.org/abs/2408.04190
pdf: https://arxiv.org/pdf/2408.04190
html: https://ar5iv.org/html/2408.04190
GitHub: https://github.com/chwoong/LiRE (my impression is that, regarding