In the vast constellation of artificial intelligence, large models are the North Star guiding the field's frontier: they represent the latest breakthroughs in deep learning and have become a key force driving the intelligent transformation of industry after industry. This article collects the knowledge and resources needed to take large models all the way from theoretical research to production, organized by topic.
Fundamentals
Mathematics
- The beauty of mathematics through dynamic visualizations (geometry, calculus, probability theory, linear algebra, etc.): https://space.bilibili.com/88461692/
Machine Learning
- Andrew Ng's Machine Learning course: https://www.coursera.org/learn/machine-learning
- scikit-learn official site (a minimal end-to-end example follows this list): https://scikit-learn.org/stable/index.html
- Machine learning "whiteboard derivations" series: https://www.yuque.com/bystander-wg876/yc5f72
- Machine learning in action: https://github.com/apachecn/AiLearning
- Pumpkin Book (PumpkinBook): https://datawhalechina.github.io/pumpkin-book/
- Machine learning playground, visualizing the training process: https://developers.google.cn/machine-learning/crash-course/feature-crosses/playground-exercises?hl=zh-cn
- UCI machine learning dataset repository: https://archive.ics.uci.edu/
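For a feel of the end-to-end shape of a classical ML workflow before diving into the links above, here is a minimal scikit-learn sketch; the dataset and model are arbitrary examples, not recommendations:

```python
# A minimal scikit-learn workflow: load data, split, fit, evaluate.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = LogisticRegression(max_iter=1000)  # a simple baseline classifier
clf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```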
Deep Learning
- Learn AI with Mu Li: https://space.bilibili.com/1567748478?spm_id_from=333.337.0.0
- Hung-yi Lee (NTU), Machine Learning 2023: https://speech.ee.ntu.edu.tw/~hylee/ml/2023-spring.php
- Deep learning from scratch: https://www.zybuluo.com/hanbingtao/note/433855
- 500 deep learning questions: https://github.com/scutan90/DeepLearning-500-questions
- Notes and resources for Andrew Ng's deep learning courses: http://www.ai-start.com/dl2017/
- A Concise Handbook of TensorFlow 2: https://tf.wiki/zh_hans/
- CNN Explainer, the convolution process visualized: https://poloclub.github.io/cnn-explainer/
Natural Language Processing (NLP)
- Stanford CS224N (NLP with Deep Learning): https://web.stanford.edu/class/cs224n/
- Oxford Deep NLP 2017 lectures: https://github.com/oxford-cs-deepnlp-2017/lectures
- NLP-progress, tracking the current state of the art across NLP tasks: https://github.com/yuquanle/NLP-progress
- Awesome Chinese NLP resources: https://github.com/crownpku/awesome-chinese-nlp
Reinforcement Learning
- Easy RL (the "Mushroom Book"): https://datawhalechina.github.io/easy-rl/#/
- Hands-on Reinforcement Learning: https://github.com/boyu-ai/Hands-on-RL/tree/main
- RL frameworks
- OpenRL: https://github.com/OpenRL-Lab/openrl
- RLAssistant (RLA): https://github.com/xionghuichen/RLAssistant
- PARL: https://github.com/PaddlePaddle/PARL
- …
LLM Training
Pre-training (PreTrain)
- BackBones: https://github.com/FreedomIntelligence/LLMZoo
- Transformer (a minimal attention sketch follows this list)
- The Illustrated Transformer: https://jalammar.github.io/illustrated-transformer/
- Transformer internals explained: https://www.cnblogs.com/justLittleStar/p/17322172.html
- Transformer in PyTorch: code walkthrough and hands-on training: https://www.cnblogs.com/justLittleStar/p/17786071.html
- BERT
- BERT explained: https://www.cnblogs.com/justLittleStar/p/17322240.html
- A visual guide to using BERT for the first time: https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/
- GPT
- GPT explained: https://www.cnblogs.com/justLittleStar/p/17322259.html
- The Illustrated GPT-2: https://jalammar.github.io/illustrated-gpt2/
- GPT inference in 60 lines of code: https://www.cnblogs.com/justLittleStar/p/17925108.html
- T5: https://huggingface.co/google/flan-t5-xxl
- ChatGLM: https://gitee.com/mindspore/mindformers/blob/dev/docs/model_cards/glm3.md
- Baichuan: https://gitee.com/mindspore/mindformers/blob/dev/research/baichuan2/baichuan2.md
- Qwen:
- https://zhuanlan.zhihu.com/p/690868924
- https://zhuanlan.zhihu.com/p/702491999
- https://huggingface.co/Qwen/Qwen-7B
- Fine-tuning Qwen2:
- Deploying Qwen on a phone:
- LLaMA: https://github.com/meta-llama/llama
- LLaMA 2 training and inference, end to end: https://blog.csdn.net/qq_27149279/article/details/131981984
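The common core of every architecture in this list is scaled dot-product attention, the operation "The Illustrated Transformer" above walks through. A minimal PyTorch sketch; the tensor shapes and mask convention are illustrative assumptions, not taken from any specific codebase:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # query-key similarity, scaled
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))  # block disallowed positions
    weights = torch.softmax(scores, dim=-1)  # attention distribution over the sequence
    return weights @ v                       # weighted sum of value vectors

q = k = v = torch.randn(1, 8, 16, 64)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```

Multi-head attention simply runs this in parallel across the `heads` dimension and concatenates the results before a final linear projection.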
Supervised Fine-Tuning (SFT)
Training
- All-in-one training toolkits
- Firefly: https://github.com/yangjianxin1/Firefly
- LLaMA-Factory: https://github.com/hiyouga/LLaMA-Factory
- Fine-tuning frameworks
- Unsloth: https://github.com/unslothai/unsloth
- PEFT: https://github.com/huggingface/peft
- Distributed AI frameworks (a minimal DeepSpeed + ZeRO sketch follows this list)
- Megatron-DeepSpeed: https://github.com/microsoft/Megatron-DeepSpeed
- Megatron-DeepSpeed's tensor-parallel utility code (mpu) explained: https://blog.csdn.net/bqw18744018044/article/details/131741282
- Illustrated Megatron source code: distributed overview and model partitioning: https://zhuanlan.zhihu.com/p/678208105
- DeepSpeed: https://github.com/microsoft/DeepSpeed
- DeepSpeed: AllReduce and ZeRO-DP: https://zhuanlan.zhihu.com/p/610587671
- Megatron-LM: https://github.com/NVIDIA/Megatron-LM
- Pre-training, evaluating, and serving GPT-2 from scratch with Megatron-LM: https://juejin.cn/post/7259682893648724029
- Distributed training fundamentals, plus mixed precision, DDP, DeepSpeed, and Megatron-LM in practice: https://zhuanlan.zhihu.com/p/647389318
- PyTorch distributed overview: https://pytorch.org/tutorials/beginner/dist_overview.html
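As a rough sense of how the DeepSpeed pieces above fit together: training code hands a model plus a config to `deepspeed.initialize` and then drives the returned engine. A hedged sketch; the tiny linear model and all hyperparameter values are placeholders:

```python
import torch
import deepspeed

# Standard DeepSpeed config keys; the values are arbitrary examples.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},  # ZeRO-2: partition optimizer states and gradients
}

model = torch.nn.Linear(1024, 1024)  # stand-in for a real transformer
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
# The training loop then calls engine(batch), engine.backward(loss), engine.step().
```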
LLM Fine-Tuning
- Full-parameter fine-tuning
- Pipeline-parallel fine-tuning of ChatGLM-6B on DeepSpeed, hands-on:
- Training large models with DeepSpeed
- https://zhuanlan.zhihu.com/p/636488690
- https://blog.csdn.net/zwqjoy/article/details/130732601
- https://zhuanlan.zhihu.com/p/688873027
- Parameter-efficient fine-tuning (a minimal LoRA sketch follows this list)
- https://zhuanlan.zhihu.com/p/676998456
- https://www.cnblogs.com/ting1/p/18217395
- AdaLoRA, illustrated: https://zhuanlan.zhihu.com/p/657130029
- LoRA, illustrated: https://zhuanlan.zhihu.com/p/646831196
- https://lightning.ai/pages/community/tutorial/lora-llm/
- https://martinlwx.github.io/zh-cn/lora-finetuning/
- https://blog.csdn.net/qq_45038038/article/details/135324609
- https://zhuanlan.zhihu.com/p/693737958
- Efficient fine-tuning of large models with BitFit: https://blog.csdn.net/DeepLn_HPC/article/details/138122100
- Quick-start guide to fine-tuning ChatGLM2-6B with P-Tuning v2: https://zhuanlan.zhihu.com/p/645892136
- P-Tuning v2
- BitFit
- Prefix Tuning
- Prompt Tuning
- Adapter Tuning
- LoRA
- AdaLoRA
- QLoRA
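Most of the methods listed above share one workflow in Hugging Face's PEFT library: wrap the base model with a method config and train only the adapter weights. A minimal LoRA sketch; gpt2 and its `c_attn` target module are just a small working example, not a recommendation:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")  # any causal LM works
lora_config = LoraConfig(
    r=8,                         # rank of the low-rank update matrices A and B
    lora_alpha=16,               # scaling factor applied to the update
    target_modules=["c_attn"],   # which linear layers get adapters (model-specific)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small A/B matrices are trainable
```

Swapping `LoraConfig` for a prompt-tuning or prefix-tuning config changes the method without changing the rest of the training loop.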
Distributed Training Parallelism
- Data parallelism (a minimal DDP sketch follows this list)
- https://zhuanlan.zhihu.com/p/618865052
- ZeRO (DeepSpeed): https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/
- ZeRO++: https://www.microsoft.com/en-us/research/blog/deepspeed-zero-a-leap-in-speed-for-llm-and-chat-model-training-with-4x-less-communication/
- https://zhuanlan.zhihu.com/p/617133971
- https://insujang.github.io/2022-06-11/parallelism-in-distributed-deep-learning/
- DP (Data Parallel)
- DDP (Distributed Data Parallel)
- ZeRO (Zero Redundancy Optimizer)
- Model parallelism:
- https://zhuanlan.zhihu.com/p/613196255
- https://zhuanlan.zhihu.com/p/622212228
- Pipeline, tensor, and 3D parallelism for 100B-parameter training, untangled: https://zhuanlan.zhihu.com/p/617087561
- https://huggingface.co/docs/transformers/v4.18.0/en/parallelism
- Tensor parallelism (TP)
- Pipeline parallelism (PP)
- MoE / expert parallelism
- https://blog.csdn.net/qq_46207024/article/details/129665922
- https://blog.csdn.net/qq_27590277/article/details/136360290
- https://www.paddlepaddle.org.cn/documentation/docs/en/guides/06_distributed_training/distributed_overview.html
- https://blog.csdn.net/lovechris00/article/details/138734349
- MindSpore distributed parallel training, basic examples:
- Optimizer-related parallelism
- https://blog.csdn.net/GarryWang1248/article/details/135340120
- PyTorch distributed optimizers: https://www.cnblogs.com/rossiXYZ/p/15664335.html
- Heterogeneous-system parallelism
- https://blog.csdn.net/GarryWang1248/article/details/135340120
- Multi-dimensional hybrid parallelism (operator parallelism, pipeline parallelism, MoE parallelism, …)
- https://blog.csdn.net/qq_51175703/article/details/136932579
- Automatic parallelism:
- https://zhuanlan.zhihu.com/p/662517647
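Data parallelism in PyTorch boils down to wrapping the model in `DistributedDataParallel` so that `backward()` all-reduces gradients across ranks. A stripped-down sketch following the PyTorch tutorial linked above, assuming a `torchrun --nproc_per_node=N script.py` launch; model and data are placeholders:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")            # torchrun sets rank/world-size env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # replicate model, sync gradients

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    x = torch.randn(32, 10).cuda(local_rank)     # each rank sees its own shard of data
    loss = model(x).sum()
    loss.backward()                              # DDP all-reduces gradients here
    optimizer.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```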
LLM Training Optimization Techniques
- LLM training acceleration tricks in one read: https://zhuanlan.zhihu.com/p/649967866
- I/O optimization: FlashAttention V1 and V2
- Operator optimization: NVIDIA CUDA operators
- Communication optimization: ZeRO++, 1-bit Adam, all-reduce bucketing, communication/computation overlap
- Memory optimization: https://zhuanlan.zhihu.com/p/648924115
- Mixed-precision training (see the sketch after this list): https://zhuanlan.zhihu.com/p/650549120
- Activation recomputation (gradient checkpointing)
- Parameter counts, FLOPs, intermediate activations, and KV cache of transformer models, analyzed: https://zhuanlan.zhihu.com/p/624740065
- Gradient accumulation (see the sketch after this list): https://zhuanlan.zhihu.com/p/698787661
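Two of the techniques above, mixed precision and gradient accumulation, compose naturally in a single PyTorch training loop. A hedged sketch using `torch.cuda.amp`; the model, the synthetic loss, and all step counts are placeholders:

```python
import torch

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()   # scales the loss to avoid fp16 underflow
accum_steps = 4                        # effective batch = micro-batch x accum_steps

for step in range(100):
    x = torch.randn(8, 512).cuda()
    with torch.cuda.amp.autocast():    # run the forward pass in mixed precision
        loss = model(x).float().pow(2).mean() / accum_steps
    scaler.scale(loss).backward()      # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)         # unscale gradients, then take an optimizer step
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
```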
LLM Compression
Quantization
References
- Quantization theory plus code practice (LLM-QAT / GPTQ / BitNet 1.58-bit / OneBit): https://zhuanlan.zhihu.com/p/686161543
- https://aistudio.baidu.com/projectdetail/3875525
Quantization Targets
- Weights
- Activations
- KV cache
- Gradients
Quantization Schemes
- By whether the quantization levels are spaced uniformly over the original value range: linear (uniform) vs. non-linear quantization
- By the sharing granularity of the quantization parameters, the scale s and the zero-point z: per-tensor (per-layer), per-token / per-channel, and per-group quantization (a toy quantizer follows this list)
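To make the roles of s and z concrete, here is a toy asymmetric uniform quantizer at per-tensor granularity; pure NumPy, for illustration only, not any of the production schemes named below:

```python
import numpy as np

def quantize(x, num_bits=8):
    qmin, qmax = 0, 2 ** num_bits - 1
    s = (x.max() - x.min()) / (qmax - qmin)  # scale: real-valued range per integer step
    z = round(qmin - x.min() / s)            # zero-point: integer that maps to real 0
    q = np.clip(np.round(x / s + z), qmin, qmax).astype(np.uint8)
    return q, s, z

def dequantize(q, s, z):
    return s * (q.astype(np.float32) - z)

x = np.random.randn(1024).astype(np.float32)
q, s, z = quantize(x)
print("max abs error:", np.abs(x - dequantize(q, s, z)).max())
```

Per-channel or per-group quantization computes a separate (s, z) pair for each slice of the tensor instead of one pair for the whole tensor.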
Quantization Categories
By the stage at which quantization is applied, methods fall into:
- Quantization-Aware Training (QAT)
- QLoRA (Quantized LoRA) explained: https://zhuanlan.zhihu.com/p/666234324
- QLoRA and GPTQ: a model-quantization overview: https://zhuanlan.zhihu.com/p/646210009
- LLM-QAT: https://github.com/facebookresearch/LLM-QAT
- QLoRA
- PEQA
- Post-Training Quantization (PTQ)
- SmoothQuant:
- RPTQ
- ZeroQuant series (incl. ZeroQuant-FP), explained: https://zhuanlan.zhihu.com/p/683813769
- SmoothQuant paper: https://arxiv.org/abs/2211.10438
- GPTQ-for-LLaMa quantization analysis and optimization: https://zhuanlan.zhihu.com/p/625701227
- LUT-GEMM:
- LLM.int8():
- ZeroQuant:
- GPTQ:
- AWQ:
- INT4/INT8
- https://blog.csdn.net/weixin_42764932/article/details/131230429
- https://zhuanlan.zhihu.com/p/690673432
- GPTQ paper: https://arxiv.org/abs/2210.17323
- 4-bit LLM quantization with GPTQ:
- AWQ paper: https://arxiv.org/abs/2306.00978
- INT8/INT4 quantization for large language models: https://zhuanlan.zhihu.com/p/627436535
- Weight-only quantization
- Full quantization (weights and activations)
Pruning
- Model compression and acceleration for deep learning, a long-form introduction: https://blog.csdn.net/weixin_54338498/article/details/127588261
- A long-form survey of deep neural network pruning: https://zhuanlan.zhihu.com/p/692858636
- Model compression: pruning algorithms in detail: https://zhuanlan.zhihu.com/p/622519997
Knowledge Distillation
- Knowledge distillation from start to finish: https://zhuanlan.zhihu.com/p/696383649
Low-Rank Factorization
- https://blog.csdn.net/qq_51175703/article/details/138320834
- https://zhuanlan.zhihu.com/p/628232317
LLM Compilation
Compilation Frameworks
- MLIR
- Deploying LLM INT8 quantization with TPU-MLIR: https://zhuanlan.zhihu.com/p/654828412
- XLA: https://github.com/openxla/xla
- TVM: https://tvm.hyper.ai/docs
LLM Inference
Model Deployment / Serving Options
- Server-side deployment
- Edge-device deployment
- Cloud deployment
- Web (in-browser) deployment
Model Serving Tools
- Wrapping the model in a web framework (a minimal Flask sketch follows this list)
- Tornado: https://www.tornadoweb.org/en/stable/
- Flask: https://dormousehole.readthedocs.io/en/latest/
- https://blog.csdn.net/chinesehuazhou2/article/details/114297858
- Sanic
- Serving via the deep learning framework's own solution
- https://zhuanlan.zhihu.com/p/616740782
- TensorFlow Serving: https://github.com/tensorflow/serving
- TorchServe: https://pytorch.org/serve/
- MindSpore Serving: https://gitee.com/mindspore/serving
- Unified, multi-framework inference servers
- MindIE: https://www.hiascend.com/software/mindie
- Triton Inference Server: https://developer.nvidia.cn/triton-inference-server
- https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
- https://docs.llamaindex.ai/en/stable/examples/llm/nvidia_triton/
- MindIE-Service:
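The "wrap it in a web framework" option is the simplest of these: load the model once at startup and expose it behind an HTTP route. A minimal Flask sketch; the sentiment pipeline is just an illustrative stand-in for whatever model you actually serve:

```python
from flask import Flask, jsonify, request
from transformers import pipeline

app = Flask(__name__)
classifier = pipeline("sentiment-analysis")  # load the model once at startup

@app.route("/predict", methods=["POST"])
def predict():
    text = request.get_json()["text"]       # expects {"text": "..."} in the body
    return jsonify(classifier(text))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```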
Inference Acceleration Frameworks
- TensorRT-LLM: https://github.com/NVIDIA/TensorRT-LLM/tree/v0.5.0
- Tuning and deploying LoRA LLMs with NVIDIA TensorRT-LLM: https://developer.nvidia.com/zh-cn/blog/tune-and-deploy-lora-llms-with-nvidia-tensorrt-llm/
- https://blog.csdn.net/kunhe0512/article/details/138286905?spm=1001.2014.3001.5502
- Deploying Llama 3 with TensorRT-LLM and Triton Inference Server:
- vLLM: https://github.com/vllm-project/vllm (a minimal usage sketch follows this list)
- https://zhuanlan.zhihu.com/p/691045737
- vLLM source code walkthrough:
- Llama.cpp: https://github.com/ggerganov/llama.cpp/tree/master/examples/main
- https://blog.csdn.net/weixin_51717597/article/details/134343802
- Quantizing and deploying Llama 2 with llama.cpp:
- HuggingFace TGI: https://github.com/huggingface/text-generation-inference
- FasterTransformer: https://github.com/NVIDIA/FasterTransformer
- https://zhuanlan.zhihu.com/p/626008090
- A brief look at FasterTransformer:
- GPT guide: https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md
- DeepSpeed
- https://zhuanlan.zhihu.com/p/629644249
- Accelerating large-model inference with DeepSpeed's system optimizations:
- DeepSpeed-MII: https://github.com/microsoft/DeepSpeed-MII
- LMDeploy: https://github.com/InternLM/lmdeploy
- https://github.com/InternLM/Tutorial/blob/7c2a385cd772ed93965927599b0159c52068da85/lmdeploy/lmdeploy.md
- https://blog.csdn.net/weixin_61573157/article/details/137782082
- https://lmdeploy.readthedocs.io/zh-cn/latest/index.html
- https://blog.csdn.net/weixin_42475060/article/details/135386145
- Quantized deployment of LLMs & VLMs with LMDeploy, hands-on:
- LMDeploy quantization and deployment:
- MindFormers: https://gitee.com/mindspore/mindformers
- MindIE: https://www.hiascend.com/software/mindie
- MindSpore Lite: https://www.mindspore.cn/lite
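For a feel of how little code vLLM needs for offline batch inference, here is a minimal sketch of its public `LLM` / `SamplingParams` API; the model name is the small example from the vLLM docs, substitute any supported model:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # weights are fetched from Hugging Face
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

outputs = llm.generate(["Explain the KV cache in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```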
Inference Optimization
- KV Cache (an illustration in code follows this list)
- https://zhuanlan.zhihu.com/p/662498827
- https://zhuanlan.zhihu.com/p/685853516
- https://zhuanlan.zhihu.com/p/679249229
- KV Cache for LLM inference, illustrated:
- 100x inference speedups: the KV cache part:
- LLM inference acceleration: KV Cache in pictures:
- Flash Attention
- https://zhuanlan.zhihu.com/p/642412124
- FlashAttention V1, from the hardware up to the compute logic: https://zhuanlan.zhihu.com/p/669926191
- Illustrated LLM inference acceleration series:
- Inference optimization for LLMs:
- MQA / GQA
- https://blog.csdn.net/qq128252/article/details/138704958
- MHA, MQA, and GQA attention mechanisms explained:
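The KV cache idea in one picture: keep the keys and values of all previous tokens so that each decoding step only feeds the single new token through the model. A hedged illustration via Hugging Face's `past_key_values` mechanism; gpt2 is just a small stand-in, and real servers use preallocated or paged caches rather than this naive loop:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("The quick brown fox", return_tensors="pt").input_ids
past = None
with torch.no_grad():
    for _ in range(20):
        # After the first step, pass only the newest token; the cache holds the rest.
        out = model(ids if past is None else ids[:, -1:],
                    past_key_values=past, use_cache=True)
        past = out.past_key_values                            # cached K/V of all prior tokens
        next_id = out.logits[:, -1].argmax(-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=-1)
print(tok.decode(ids[0]))
```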
LLM Deployment Environment
Clusters
Communication
「Communication hardware」
- NVLink: https://www.nvidia.com/en-us/data-center/nvlink/
- https://mp.weixin.qq.com/s/itIi3FvUiMsGhMR2ou5Syw
- How multi-GPU interconnects work, in one read:
- NVMe SSD: https://zhuanlan.zhihu.com/p/672098336
- NVMe SSDs in AI cluster infrastructure, explained:
- InfiniBand: https://zhuanlan.zhihu.com/p/673903240
「Communication software (NCCL, HCCL, …)」
- https://pytorch.org/tutorials/intermediate/dist_tuto.html#collective-communication
- https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/collectives.html
- Hardcore distributed training: collective communication primitives: https://blog.csdn.net/Kenji_Shinji/article/details/125292757
「Communication network monitoring」
- nvbandwidth: https://github.com/NVIDIA/nvbandwidth
- DCGM: https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/feature-overview.html
Platforms
- Kubernetes: https://www.seldon.io/deploying-machine-learning-models-on-kubernetes
AI Chips
- NVIDIA GPU
- Google TPU
- Huawei Ascend NPU
- Baidu Kunlunxin
- Cambricon Siyuan (MLU)
- Alibaba T-Head Hanguang
GPU Memory
- Analyzing and optimizing memory usage during deep learning training: https://saikr.com/a/533227
- A deep dive into LLM memory consumption: https://blog.csdn.net/qq_43592352/article/details/137055671
- Mixed-precision training and memory analysis (a back-of-the-envelope calculation follows this list): https://baiqw.blog.csdn.net/article/details/131030255
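A quick back-of-the-envelope in the spirit of the analyses above: under mixed precision with Adam, weights, gradients, and optimizer states cost roughly 16 bytes per parameter (2 for the fp16 weight, 2 for the fp16 gradient, 4 for the fp32 master copy, and 4 + 4 for the two fp32 Adam moments), before counting activations:

```python
def training_memory_gb(n_params: float) -> float:
    # fp16 weight + fp16 grad + fp32 master weight + fp32 Adam m and v
    bytes_per_param = 2 + 2 + 4 + 4 + 4
    return n_params * bytes_per_param / 1024**3

for n in (7e9, 13e9, 70e9):
    print(f"{n/1e9:.0f}B params -> ~{training_memory_gb(n):.0f} GB (excluding activations)")
```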
LLM Application Development
Development Frameworks
- LangChain: https://github.com/langchain-ai/langchain (a minimal chain sketch closes this section)
- https://zhuanlan.zhihu.com/p/665503140
- LangChain, the most popular LLM application framework, in theory and in practice:
- How LangChain agents work: https://blog.csdn.net/2301_78285120/article/details/135303183
- Hugging Face: https://github.com/huggingface
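To close, a minimal LangChain sketch in the LCEL "prompt | model" composition style. Note that LangChain's API moves quickly: the package names (langchain-core, langchain-openai), this composition style, and the model name are assumptions tied to recent versions, and an OPENAI_API_KEY must be set in the environment:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
llm = ChatOpenAI(model="gpt-4o-mini")  # any chat-model integration works here
chain = prompt | llm                   # LCEL: pipe the prompt into the model

result = chain.invoke(
    {"text": "LoRA freezes the base model and trains small low-rank adapters."}
)
print(result.content)
```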