2024 was a banner year for generative AI research! What struck us most was how dramatically the field's focus shifted: between 2023 and 2024, with large models already capable of so much, attention moved sharply toward application-level research.
Paper collection: https://github.com/aishwaryanr/awesome-generative-ai-guide
The collection is organized by the classification framework illustrated above: picture AI research as a system that runs from input to output, mirroring a real deployment. The framework is split into several layers, each with its own focus:
Input layer:
This is where an LLM application begins, covering research on input processing and prompt engineering. By carefully shaping how data is fed to a large language model (LLM), we can elicit noticeably better outputs.
Data/model layer:
This layer concerns the model's "fuel" and "engine". Research covers improving data quality and generating synthetic data so that models train on rich, diverse datasets, along with architectural innovation: new model architectures, multimodal capabilities (combining text, images, and more), cost and size optimization, model alignment, and longer context windows.
Application layer:
Research on putting LLMs to work in the real world. Whether through domain-specific models (code generation, text-to-SQL, medical applications) or techniques such as fine-tuning, retrieval-augmented generation (RAG), and multi-agent systems, this layer turns theory into practical tools.
Output layer:
How do we make sure a model's output can be trusted? Research here centers on evaluation methods, from human-in-the-loop systems to benchmarks and LLM judges, offering a range of effective ways to assess AI outputs.
Challenges:
The limitations of generative AI: adversarial attacks, model interpretability, hallucination, and other practical obstacles that must be overcome to make AI safer and more reliable.
Input Layer
Prompt Engineering
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
- The Prompt Report: A Systematic Survey of Prompting Techniques
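To make the input-layer idea concrete, here is a minimal sketch of few-shot chain-of-thought prompting, one family of techniques covered by The Prompt Report; frameworks such as DSPy compile and optimize prompts like this programmatically instead of hand-writing them. The task, demonstrations, and template below are hypothetical placeholders.

```python
# A minimal sketch of structured few-shot chain-of-thought prompting.
# The task, demonstrations, and template are hypothetical examples.

FEW_SHOT_EXAMPLES = [
    {
        "question": "A shop sells pens at 3 yuan each. How much do 4 pens cost?",
        "reasoning": "Each pen costs 3 yuan, so 4 pens cost 4 * 3 = 12 yuan.",
        "answer": "12 yuan",
    },
    {
        "question": "A train travels 60 km in 1 hour. How far in 2.5 hours?",
        "reasoning": "Speed is 60 km/h, so distance is 60 * 2.5 = 150 km.",
        "answer": "150 km",
    },
]

def build_cot_prompt(task_question: str) -> str:
    """Assemble a few-shot chain-of-thought prompt from demonstrations."""
    parts = ["Answer the question. Think step by step before answering.\n"]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(f"Q: {ex['question']}")
        parts.append(f"Reasoning: {ex['reasoning']}")
        parts.append(f"A: {ex['answer']}\n")
    parts.append(f"Q: {task_question}")
    parts.append("Reasoning:")
    return "\n".join(parts)

if __name__ == "__main__":
    print(build_cot_prompt("A box holds 8 apples. How many apples in 5 boxes?"))
```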
Data/Model Layer
1. Data Quality / Synthetic Data Generation
- On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey
- A Survey on Data Synthesis and Augmentation for Large Language Models
- Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts
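The surveys above describe a common generate-then-curate loop for synthetic data. Below is a minimal sketch of that loop, assuming a hypothetical call_llm helper (stubbed here so the script runs offline); a production pipeline would add deduplication and judge-model scoring on top.

```python
# A minimal sketch of a generate-then-curate synthetic data loop.
# `call_llm` is a hypothetical stand-in for any chat API, stubbed for offline use.

import json

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    return json.dumps({"question": "What is 2 + 2?", "answer": "4"})

def generate_candidates(seed_topic: str, n: int) -> list[dict]:
    """Ask the model for n question-answer pairs about a topic."""
    prompt = (
        f"Write one question and its answer about '{seed_topic}'. "
        'Reply as JSON: {"question": ..., "answer": ...}'
    )
    return [json.loads(call_llm(prompt)) for _ in range(n)]

def curate(samples: list[dict]) -> list[dict]:
    """Drop malformed or trivially short samples; real pipelines also
    deduplicate and score candidates with a judge model."""
    return [
        s for s in samples
        if s.get("question") and s.get("answer") and len(s["question"]) > 10
    ]

if __name__ == "__main__":
    dataset = curate(generate_candidates("basic arithmetic", 3))
    print(f"kept {len(dataset)} samples:", dataset)
```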
2. New Foundation Models
3. Model Optimization (Size, Cost)
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (see the sketch after this list)
- TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
- LLM Pruning and Distillation in Practice: The Minitron Approach
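As a taste of the size/cost optimization work, here is a minimal numpy sketch of the "absmean" ternary quantization described in the 1.58-bit LLM paper above: weights are scaled by their mean absolute value, rounded, and clipped to {-1, 0, +1}. This is illustrative only; in the paper the quantizer sits inside quantization-aware training, not applied post hoc to a finished model.

```python
# Absmean ternary quantization: W_q = clip(round(W / gamma), -1, 1),
# where gamma is the mean absolute value of the weight tensor.

import numpy as np

def absmean_ternarize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a weight matrix to {-1, 0, +1} with a single scale gamma."""
    gamma = np.mean(np.abs(w)) + 1e-8          # per-tensor scale
    w_q = np.clip(np.round(w / gamma), -1, 1)  # ternary codes
    return w_q, gamma

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 4)).astype(np.float32)
    w_q, gamma = absmean_ternarize(w)
    print("codes:\n", w_q)
    # Dequantized approximation used at matmul time: w ~= gamma * w_q
    print("reconstruction error:", np.abs(w - gamma * w_q).mean())
```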
4. Multimodality
5. Model Alignment
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
- RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
- The Capacity for Moral Self-Correction in Large Language Models
6. Long Context
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
- Evaluating Language Model Context Windows: A “Working Memory” Test and Inference-time Correction
- YaRN: Efficient Context Window Extension of Large Language Models (see the sketch after this list)
- LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
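Several of the papers above extend rotary position embeddings (RoPE) beyond the trained context length. The sketch below shows only the simplest variant, linear position interpolation, where positions are rescaled by train_len / target_len so all rotation angles stay within the trained range; YaRN refines this idea by treating different frequency bands differently. The lengths and dimensions used are arbitrary examples.

```python
# RoPE rotation angles with linear position interpolation: scaling positions
# by train_len / target_len keeps angles inside the trained range.

import numpy as np

def rope_angles(positions: np.ndarray, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> np.ndarray:
    """Rotation angles for each (position, frequency) pair; `scale` < 1
    compresses positions to fit a longer context into the trained range."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    return np.outer(positions * scale, inv_freq)       # (len, dim/2)

if __name__ == "__main__":
    train_len, target_len, dim = 4096, 16384, 64
    pos = np.arange(target_len)
    plain = rope_angles(pos, dim)                      # extrapolated angles
    interp = rope_angles(pos, dim, scale=train_len / target_len)
    # With interpolation, the largest angle stays within the trained range:
    print(plain[:, 0].max(), interp[:, 0].max())       # ~16383 vs ~4095
```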
Application Layer
1. Domain-Specific Models
- ChemCrow: Augmenting Large-Language Models with Chemistry Tools
- MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
- A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
- PMC-LLaMA: Towards Building Open-Source Language Models for Medicine
- A Survey of Large Language Models in Medicine: Progress, Application, and Challenge
- Can Large Language Models Unlock Novel Scientific Research Ideas?
2. RAG
- GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Retrieval-Augmented Generation for Large Language Models: A Survey
- Searching for Best Practices in Retrieval-Augmented Generation
- Seven Failure Points When Engineering a Retrieval Augmented Generation System
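A minimal end-to-end RAG sketch, tying the papers above together: a toy bag-of-words retriever picks the most relevant document, which is stuffed into the generator's prompt. The corpus and the call_llm stub are hypothetical; real systems use dense embeddings, a vector index, and an actual model call.

```python
# Retrieval-augmented generation in miniature: retrieve, then generate
# with the retrieved context in the prompt.

import math
from collections import Counter

CORPUS = [
    "YaRN extends the context window of RoPE-based language models.",
    "Self-RAG teaches a model to retrieve, generate, and critique itself.",
    "FlashAttention-2 improves attention parallelism and work partitioning.",
]

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by bag-of-words cosine similarity to the query."""
    q = bow(query)
    return sorted(CORPUS, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def call_llm(prompt: str) -> str:
    """Hypothetical generator call, stubbed for an offline demo."""
    return "[stubbed generation grounded in the prompt above]"

def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return call_llm(prompt)

if __name__ == "__main__":
    print(rag_answer("How can I extend a model's context window?"))
```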
3. Agents
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
- A Taxonomy of AgentOps for Enabling Observability of Foundation Model based Agents
- Toolformer: Language Models Can Teach Themselves to Use Tools (see the sketch after this list)
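Below is a minimal sketch of the tool-calling loop behind agent work such as Toolformer: the model emits an action string, the runtime executes the named tool, and the observation is appended to the history. The Action:/Observation: format and the stubbed model_step are hypothetical; Toolformer itself learns API-call tokens during training, and production systems use function-calling APIs.

```python
# A minimal tool-use loop: parse the model's requested action, run the tool,
# feed the observation back, repeat until a final answer appears.

import re

TOOLS = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def model_step(history: str) -> str:
    """Hypothetical model: asks for a tool once, then answers."""
    if "Observation:" not in history:
        return "Action: calculator[12 * (3 + 4)]"
    return "Final Answer: 84"

def run_agent(question: str, max_steps: int = 4) -> str:
    history = f"Question: {question}"
    for _ in range(max_steps):
        step = model_step(history)
        match = re.match(r"Action: (\w+)\[(.+)\]", step)
        if match:  # execute the requested tool and append the observation
            name, arg = match.groups()
            history += f"\n{step}\nObservation: {TOOLS[name](arg)}"
        else:
            return step
    return "max steps reached"

if __name__ == "__main__":
    print(run_agent("What is 12 * (3 + 4)?"))
```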
4. Multi-Agent Systems
- Emergent Autonomous Scientific Research Capabilities of Large Language Models
- OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
- AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems
- Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
- AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls
- Large Language Model-Based Agents for Software Engineering: A Survey
5. Fine-Tuning
- Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models
- LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL
- A Survey on Employing Large Language Models for Text-to-SQL Tasks
- Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought
Output Layer
LLM Evaluation
- RAGEval: Scenario-Specific RAG Evaluation Dataset Generation Framework
- Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models (see the sketch after this list)
- PromptBench: A Unified Library for Evaluation of Large Language Models
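In the spirit of the judges-vs-juries paper above, here is a minimal sketch of a panel of LLM judges: several judge models each score a response against a rubric, and the panel verdict is their average. The judge names, prompt, and returned scores are hypothetical stubs; a real run would route the prompt to actual models and parse their replies.

```python
# A panel of (stubbed) LLM judges: each scores a response 1-5, and the
# verdict is the mean across the panel.

from statistics import mean

JUDGE_PROMPT = (
    "Rate the RESPONSE to the QUESTION for factual accuracy on a 1-5 scale. "
    "Reply with a single number.\nQUESTION: {q}\nRESPONSE: {r}"
)

def call_judge(model_name: str, prompt: str) -> int:
    """Hypothetical judge call; replace with real API clients."""
    return {"judge-a": 4, "judge-b": 5, "judge-c": 4}[model_name]

def panel_score(question: str, response: str,
                judges=("judge-a", "judge-b", "judge-c")) -> float:
    prompt = JUDGE_PROMPT.format(q=question, r=response)
    return mean(call_judge(j, prompt) for j in judges)

if __name__ == "__main__":
    print(panel_score("What year did the first moon landing happen?",
                      "1969, during the Apollo 11 mission."))
```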
Challenges
Limitations of Generative AI
- LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations
- Chain-of-Verification Reduces Hallucination in Large Language Models (see the sketch after this list)
- One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
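Finally, a minimal sketch of the chain-of-verification recipe from the paper listed above: draft an answer, plan verification questions, answer them independently of the draft (so its mistakes are not simply echoed), then revise. Every call_llm output here is a stubbed placeholder; the structure, not the outputs, is the point.

```python
# Chain-of-verification in four stubbed steps: draft, plan checks,
# answer checks independently, revise.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call, stubbed for an offline demo."""
    return "[model output for: " + prompt.splitlines()[0] + "]"

def chain_of_verification(question: str) -> str:
    draft = call_llm(f"Answer concisely: {question}")
    # 1. Plan: verification questions targeting facts in the draft.
    plan = call_llm(f"List factual checks for this answer:\n{draft}")
    # 2. Execute: answer each check without seeing the draft, to avoid
    #    simply repeating its mistakes.
    checks = call_llm(f"Answer each check independently:\n{plan}")
    # 3. Revise: produce a final answer consistent with the checks.
    return call_llm(
        f"Question: {question}\nDraft: {draft}\nChecks: {checks}\n"
        "Rewrite the draft so it agrees with the checks."
    )

if __name__ == "__main__":
    print(chain_of_verification("Name three politicians born in New York."))
```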