1. Compare the details of large language models such as LLaMA, ChatGLM, and Falcon: tokenizer, positional encoding, layer normalization, activation functions, and so on.
2. Distributed training techniques for large language models: data parallelism, tensor model parallelism, pipeline parallelism, 3D parallelism, the zero-redundancy optimizer ZeRO, CPU offloading with ZeRO-Offload, mixed-precision training, activation recomputation, Flash Attention, and Paged Attention.
3. Parameter-efficient fine-tuning techniques for large language models: prompt tuning, prefix tuning, adapters, LLaMA-Adapter, and LoRA.
0. Outline
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222928-290350821.png)
1. Details of large language models
1.0 Transformer and LLMs
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223014-1815729989.png)
1.1 Model architecture
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223000-192890768.png)
1.2 Training objectives
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222865-104255699.png)
1.3 tokenizer
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223078-1956213140.png)
1.4 Positional encoding
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223021-955573907.png)
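LLaMA and many recent LLMs use rotary position embedding (RoPE), which encodes position by rotating each pair of query/key feature dimensions by a position-dependent angle, so attention scores depend only on relative position. A minimal PyTorch sketch of the idea (the pairing convention and shapes are illustrative, not tied to any particular implementation):

```python
import torch

def rotary_embedding(x, base=10000):
    """Apply RoPE to x of shape (batch, seq_len, n_heads, head_dim)."""
    b, t, h, d = x.shape
    # frequency for each pair of dimensions: theta_i = base^(-2i/d)
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2).float() / d))
    pos = torch.arange(t).float()
    angles = torch.einsum("t,f->tf", pos, inv_freq)                    # (t, d/2)
    cos, sin = angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]                                 # split features into pairs
    # rotate each 2-D pair by its position-dependent angle
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = torch.randn(1, 8, 4, 64)
q_rot = rotary_embedding(q)   # queries and keys are rotated the same way
```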
1.5 Layer normalization
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222976-1562615439.png)
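LLaMA applies pre-norm with RMSNorm, which drops LayerNorm's mean subtraction and bias term and only rescales by the root mean square of the features. A minimal sketch:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """RMSNorm: scale by the root-mean-square of the features, no re-centering, no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))   # learnable gain only

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

x = torch.randn(2, 16, 4096)
print(RMSNorm(4096)(x).shape)   # torch.Size([2, 16, 4096])
```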
1.6 Activation functions
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222931-1922653115.png)
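LLaMA's feed-forward layer uses the SwiGLU activation: a SiLU-gated linear unit with three projection matrices instead of the usual two. A minimal sketch (the 11008 hidden size matches LLaMA-7B):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    """FFN(x) = W2( SiLU(W1 x) * (W3 x) ), as used in LLaMA-style blocks."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

ffn = SwiGLUFFN(dim=4096, hidden_dim=11008)
print(ffn(torch.randn(1, 8, 4096)).shape)
```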
1.7 Multi-query Attention and Grouped-query Attention
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223121-99462710.png)
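Multi-query attention shares a single K/V head across all query heads; grouped-query attention is the middle ground, keeping n_kv_heads < n_heads so that each K/V head serves a group of query heads, which shrinks the KV cache without giving up much quality. A minimal sketch of the grouping step (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """q: (b, n_heads, t, d); k, v: (b, n_kv_heads, t, d) with n_heads % n_kv_heads == 0."""
    b, n_heads, t, d = q.shape
    n_kv_heads = k.shape[1]
    group = n_heads // n_kv_heads
    # replicate each K/V head so every query head in a group attends to the same K/V
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 32, 128, 128)        # 32 query heads
k = v = torch.randn(1, 8, 128, 128)     # only 8 KV heads -> 4x smaller KV cache
print(grouped_query_attention(q, k, v).shape)
```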
1.8 Parallel transformer block
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222887-452245893.png)
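In a parallel transformer block (used by GPT-J, PaLM, and Falcon), the attention and MLP branches read the same normalized input and are both added to the residual, instead of running one after the other, which makes the two branches easier to fuse and overlap. A minimal sketch using stock PyTorch modules as stand-ins:

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    """Attention and MLP applied to the same pre-normed input, summed into the residual."""
    def __init__(self, dim, n_heads):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm(x)                                   # single shared pre-norm
        a, _ = self.attn(h, h, h, need_weights=False)
        return x + a + self.mlp(h)                         # sequential block would apply MLP after attention

x = torch.randn(2, 16, 256)
print(ParallelBlock(256, 8)(x).shape)
```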
1.9 Summary: training stability
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223009-49151668.png)
2. Distributed pre-training of LLMs
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223007-893079150.png)
2.0 Point-to-point and collective communication
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222932-2060857497.png)
2.1 Data parallelism
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223169-1476511031.png)
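In data parallelism every GPU holds a full model replica and processes a different slice of the batch; gradients are averaged with an all-reduce before the optimizer step. The standard PyTorch DDP pattern, as a minimal sketch (the model and data are placeholders; launch with torchrun):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                      # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    model = nn.Linear(1024, 1024).cuda(rank)
    model = DDP(model, device_ids=[rank])                # hooks gradient all-reduce into backward
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(8, 1024, device=f"cuda:{rank}")  # each rank sees a different data shard
        loss = model(x).pow(2).mean()
        loss.backward()                                  # gradients are all-reduced here
        opt.step(); opt.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()   # e.g. torchrun --nproc_per_node=8 ddp_sketch.py
```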
2.2 Tensor parallelism
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223149-1097493661.png)
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222976-420961760.png)
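Megatron-style tensor parallelism splits each weight matrix across GPUs: the first FFN projection is column-parallel (each rank holds a slice of the output features) and the second is row-parallel (each rank holds a slice of the input features), so a single all-reduce per FFN recovers the full output. A single-process sketch of the math, simulating the all-reduce with a sum over "ranks":

```python
import torch

torch.manual_seed(0)
d, hidden, tp = 16, 64, 4                  # model dim, FFN dim, tensor-parallel degree
x = torch.randn(2, d)
w1 = torch.randn(d, hidden)                # full weights, sliced per "rank" below
w2 = torch.randn(hidden, d)

w1_shards = w1.chunk(tp, dim=1)            # column-parallel: split output features
w2_shards = w2.chunk(tp, dim=0)            # row-parallel: split input features

partial_outputs = []
for w1_i, w2_i in zip(w1_shards, w2_shards):
    h_i = torch.relu(x @ w1_i)             # each rank computes its slice of the hidden state
    partial_outputs.append(h_i @ w2_i)     # partial result of the row-parallel matmul

y_tp = torch.stack(partial_outputs).sum(0) # the all-reduce: sum partial results across ranks
y_ref = torch.relu(x @ w1) @ w2
print(torch.allclose(y_tp, y_ref, atol=1e-5))   # True
```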
2.3 Pipeline parallelism
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222964-1538788396.png)
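Pipeline parallelism places different layers on different devices and splits each batch into micro-batches so the stages can work concurrently; in the GPipe schedule, gradients from all micro-batches are accumulated before the optimizer step. A toy single-device sketch of that micro-batch loop (the two stages stand in for layers on different GPUs; the real scheduling and inter-stage communication are omitted):

```python
import torch
import torch.nn as nn

stage0 = nn.Sequential(nn.Linear(256, 256), nn.ReLU())   # would live on GPU 0
stage1 = nn.Sequential(nn.Linear(256, 256), nn.ReLU())   # would live on GPU 1
opt = torch.optim.SGD(list(stage0.parameters()) + list(stage1.parameters()), lr=1e-2)

batch = torch.randn(32, 256)
micro_batches = batch.chunk(4)             # 4 micro-batches keep the pipeline busy

losses = []
for mb in micro_batches:                   # each micro-batch flows through both stages
    losses.append(stage1(stage0(mb)).pow(2).mean())
torch.stack(losses).mean().backward()      # gradients accumulate over micro-batches
opt.step(); opt.zero_grad()
```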
2.4 3D parallelism
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223127-1247829558.png)
2.5 Mixed-precision training
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222979-1713988412.png)
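Mixed-precision training runs the forward and backward passes in fp16 (or bf16) while the optimizer keeps fp32 master weights, and scales the loss so that small fp16 gradients do not underflow. The standard PyTorch AMP pattern, as a minimal sketch:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()                # dynamic loss scaling

for _ in range(10):
    x = torch.randn(8, 1024, device="cuda")
    with torch.cuda.amp.autocast():                 # matmuls run in half precision
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()    # scale the loss to avoid fp16 gradient underflow
    scaler.step(opt)                 # unscales gradients, skips the step on overflow
    scaler.update()
    opt.zero_grad()
```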
2.6 Activation recomputation
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223162-1840412893.png)
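Activation recomputation (gradient checkpointing) discards intermediate activations in the forward pass and recomputes them during backward, trading roughly one extra forward pass of compute for a large cut in activation memory. In PyTorch this is torch.utils.checkpoint; a minimal sketch (the 24-layer MLP stack is a placeholder for transformer blocks):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

blocks = nn.ModuleList([nn.Sequential(nn.Linear(1024, 1024), nn.GELU()) for _ in range(24)])

def forward(x, use_checkpoint=True):
    for block in blocks:
        if use_checkpoint:
            # only the block inputs are saved; activations inside the block
            # are recomputed during the backward pass
            x = checkpoint(block, x, use_reentrant=False)
        else:
            x = block(x)
    return x

x = torch.randn(8, 1024, requires_grad=True)
forward(x).sum().backward()
```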
2.7 ZeRO, the zero-redundancy optimizer
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223159-1947148169.png)
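With Adam and mixed precision, each parameter costs about 16 bytes per data-parallel replica: 2 (fp16 weight) + 2 (fp16 gradient) + 12 (fp32 master weight, momentum, variance). ZeRO shards these states across the data-parallel group: stage 1 shards the optimizer states, stage 2 also shards gradients, stage 3 also shards the parameters. A small worked calculation (illustrative numbers for a 7B-parameter model at 64-way data parallelism):

```python
def zero_memory_per_gpu(n_params, dp_degree, stage):
    """Approximate per-GPU memory (GB) for params/grads/optimizer states under ZeRO."""
    params, grads, optim = 2 * n_params, 2 * n_params, 12 * n_params   # bytes
    if stage >= 1:
        optim /= dp_degree      # ZeRO-1: shard Adam states (fp32 weights, m, v)
    if stage >= 2:
        grads /= dp_degree      # ZeRO-2: also shard fp16 gradients
    if stage >= 3:
        params /= dp_degree     # ZeRO-3: also shard fp16 parameters
    return (params + grads + optim) / 1024**3

for stage in (0, 1, 2, 3):
    print(stage, round(zero_memory_per_gpu(7e9, dp_degree=64, stage=stage), 1), "GB")
# stage 0: 104.3 GB   stage 1: 27.3 GB   stage 2: 14.5 GB   stage 3: 1.6 GB
```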
2.8 CPU offload and ZeRO-Offload
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222968-605870877.png)
2.9 Flash Attention
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223184-1694667364.png)
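FlashAttention is an exact attention algorithm that tiles Q/K/V into SRAM-sized blocks and keeps running softmax statistics, so the full (seq_len × seq_len) score matrix never has to be written to HBM; attention becomes far less memory-bound. In PyTorch 2.x a fused kernel of this kind is reachable through scaled_dot_product_attention; a minimal sketch (shapes are illustrative):

```python
import torch
import torch.nn.functional as F

q = torch.randn(1, 32, 4096, 128, device="cuda", dtype=torch.float16)
k = torch.randn(1, 32, 4096, 128, device="cuda", dtype=torch.float16)
v = torch.randn(1, 32, 4096, 128, device="cuda", dtype=torch.float16)

# naive attention would materialize a 4096 x 4096 score matrix per head in HBM;
# the fused kernel processes Q/K/V block by block in on-chip SRAM instead
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)   # (1, 32, 4096, 128)
```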
2.10 vLLM: Paged Attention
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223052-1681526166.png)
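Paged Attention manages the KV cache like virtual memory: each sequence's cache lives in fixed-size blocks that need not be contiguous, and a per-sequence block table maps logical token positions to physical blocks, so memory is allocated on demand and fragmentation stays low. A toy sketch of the block-table bookkeeping (not vLLM's actual API):

```python
BLOCK_SIZE = 16   # tokens per physical KV block

class PagedKVCache:
    """Toy allocator: a pool of free physical blocks plus a per-sequence block table."""
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}                      # seq_id -> list of physical block ids
        self.seq_lens = {}                          # seq_id -> tokens written so far

    def append_token(self, seq_id):
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:                # current block full (or first token)
            table.append(self.free_blocks.pop())    # allocate a new physical block on demand
        self.seq_lens[seq_id] = length + 1
        block, offset = table[length // BLOCK_SIZE], length % BLOCK_SIZE
        return block, offset                        # where this token's K/V is written

cache = PagedKVCache(num_blocks=8)
for _ in range(20):
    cache.append_token(seq_id=0)
print(cache.block_tables[0], cache.seq_lens[0])     # 2 physical blocks cover 20 tokens
```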
3. Parameter-efficient fine-tuning of LLMs
3.0 Why parameter-efficient fine-tuning?
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222956-843917322.png)
3.1 prompt tuning
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223070-1150689920.png)
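Prompt tuning freezes the entire model and only learns a short sequence of "soft prompt" vectors that are prepended to the input embeddings, so the trainable parameter count is just prompt_len × hidden_dim. A minimal sketch around stand-in frozen modules (a real LLM backbone would replace them):

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    def __init__(self, backbone, embed, prompt_len=20, dim=256):
        super().__init__()
        self.backbone, self.embed = backbone, embed
        for p in list(backbone.parameters()) + list(embed.parameters()):
            p.requires_grad = False                       # the LLM itself is never updated
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

    def forward(self, input_ids):
        tok_emb = self.embed(input_ids)                              # (b, t, dim)
        prompt = self.soft_prompt.expand(tok_emb.size(0), -1, -1)    # (b, prompt_len, dim)
        return self.backbone(torch.cat([prompt, tok_emb], dim=1))

embed = nn.Embedding(32000, 256)                          # stand-ins for a frozen LLM
backbone = nn.Sequential(nn.Linear(256, 256), nn.GELU(), nn.Linear(256, 256))
model = SoftPromptModel(backbone, embed)
out = model(torch.randint(0, 32000, (2, 16)))             # (2, 20 + 16, 256)
print(sum(p.numel() for p in model.parameters() if p.requires_grad))   # 20 * 256 = 5120
```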
3.2 prefix tuning
3.3 adapter
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222936-421741705.png)
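Adapters insert small bottleneck MLPs (down-projection, nonlinearity, up-projection, plus a residual connection) into each transformer layer; only these adapters are trained while the backbone stays frozen. A minimal sketch of one adapter module:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: dim -> r -> dim with a residual, r << dim."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight); nn.init.zeros_(self.up.bias)   # start as the identity

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))   # residual keeps the frozen path intact

adapter = Adapter(4096, bottleneck=64)
x = torch.randn(2, 16, 4096)
print(adapter(x).shape, sum(p.numel() for p in adapter.parameters()))  # ~0.5M params per adapter
```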
3.4 LLaMA-Adapter
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223094-857330200.png)
3.5 LoRA
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223046-1991415483.png)
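LoRA freezes the pretrained weight W and learns a low-rank update ΔW = BA (rank r much smaller than the hidden size) applied in parallel to W; after training, the update can be merged back into W so inference adds no latency. A minimal sketch of a LoRA-augmented linear layer:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = W x + (alpha / r) * B(A x), with W frozen and only A, B trained."""
    def __init__(self, in_dim, out_dim, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad = False               # pretrained weight stays frozen
        self.lora_A = nn.Parameter(torch.randn(r, in_dim) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_dim, r))  # B = 0, so training starts at W
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

    @torch.no_grad()
    def merge(self):
        """Fold the low-rank update into W for inference (no extra latency)."""
        self.base.weight += self.scaling * (self.lora_B @ self.lora_A)

layer = LoRALinear(4096, 4096, r=8)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))   # 2 * 8 * 4096 = 65536
```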
3.6 Experimental comparison
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151222936-646819174.png)
4. References
![图片](/i/l/?n=23&i=blog/27422/202309/27422-20230920151223055-823325435.png)
- Analyzing the parameter count, compute, intermediate activations, and KV cache of transformer models
- [Long read] Parameter-efficient fine-tuning practice for LLaMA, ChatGLM, and BLOOM
- FlashAttention: faster computation, lower GPU memory, IO-aware exact attention