Training methods
Supervised fine-tuning
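Supervised fine-tuning examples are often prepared like this. The prompt and response token IDs are concatenated, and the prompt positions get the label -100, so the loss covers only the response. That is a common convention, assumed here rather than taken from this article. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The helper `build_sft_example` and the token IDs are hypothetical.

```python
# Label value that torch.nn.CrossEntropyLoss skips by default (ignore_index=-100).
IGNORE_INDEX = -100


def build_sft_example(prompt_ids, response_ids, eos_id):
    """Hypothetical helper: concatenate prompt + response + EOS and mask the prompt."""
    input_ids = list(prompt_ids) + list(response_ids) + [eos_id]
    # Prompt tokens are masked, so only the response (and EOS) contribute to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids) + [eos_id]
    return input_ids, labels


input_ids, labels = build_sft_example([101, 102, 103], [7, 8], eos_id=2)
print(input_ids)  # [101, 102, 103, 7, 8, 2]
print(labels)     # [-100, -100, -100, 7, 8, 2]
```

Labels are kept aligned with `input_ids`. Hugging Face causal language models shift them by one position internally when computing the loss.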
Reward modeling
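Reward models are commonly trained with a pairwise (Bradley-Terry style) loss, -log sigmoid(r_chosen - r_rejected). This loss pushes the score of the preferred response above that of the rejected one. The sketch below assumes the reward model has already produced one scalar score per response. The function names are illustrative.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_reward_loss(r_chosen, r_rejected):
    """Mean of -log sigmoid(r_chosen - r_rejected) over preference pairs."""
    losses = [-log_sigmoid(c - r) for c, r in zip(r_chosen, r_rejected)]
    return sum(losses) / len(losses)


print(pairwise_reward_loss([2.0, 0.5], [0.0, 1.0]))
```

Equal scores give a loss of log 2. The loss falls as the chosen score pulls ahead.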
PPO training
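The core of PPO is the clipped surrogate objective from Schulman et al. For each sample, it compares the new and old policy probabilities through the ratio exp(logp_new - logp_old). It then keeps the more pessimistic of the raw and clipped terms. The minimal sketch below assumes the advantages are already computed, for example with GAE. RLHF setups usually also add a KL penalty against the reference model, which is omitted here.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean of min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A); to be maximized."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


print(ppo_clip_objective([-1.0, -0.5], [-1.2, -0.4], [1.0, -0.5]))
```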
DPO training
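DPO drops the separate reward model and PPO loop and optimizes the policy directly on preference pairs. Below is a sketch of the per-pair loss from Rafailov et al. It assumes each input is the summed log-probability of a full response under either the trained policy or a frozen reference model. The `beta` value of 0.1 is only an illustrative default.

```python
import math


def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)]) for one preference pair."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    z = beta * margin
    # Stable evaluation of -log sigmoid(z).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# The policy favors the chosen response more than the reference does, so the loss is below log 2.
print(dpo_loss(-10.0, -20.0, -15.0, -15.0))
```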
Full-parameter fine-tuning
Partial-parameter fine-tuning
LoRA
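LoRA freezes a weight matrix W and learns a low-rank update B A, giving h = W x + (alpha / r) B A x. This sketch follows the formulation of Hu et al. and uses the `alpha / r` scaling that PEFT applies. B is initialized to zero, so the adapted model starts out identical to the base model. The matrices are tiny made-up examples, and the pure-Python `matmul` only keeps the sketch dependency-free.

```python
def matmul(a, b):
    """Plain-Python matrix product of nested lists (a: n x k, b: k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x).

    W: d_out x d_in (frozen), A: r x d_in and B: d_out x r (trainable), x: d_in x 1.
    """
    base = matmul(W, x)
    delta = matmul(B, matmul(A, x))
    scale = alpha / r
    return [[base[i][0] + scale * delta[i][0]] for i in range(len(base))]


def trainable_params(d_out, d_in, r):
    """Trainable parameters for one d_out x d_in weight: full update vs. LoRA."""
    return {"full": d_out * d_in, "lora": r * (d_in + d_out)}


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight
A = [[1.0, 1.0]]               # rank r = 1
B_init = [[0.0], [0.0]]        # zero-initialized B: the adapter is a no-op at step 0
x = [[1.0], [2.0]]
print(lora_forward(W, A, B_init, x, alpha=2, r=1))  # [[1.0], [2.0]]
print(trainable_params(4096, 4096, r=8))  # {'full': 16777216, 'lora': 65536}
```

For a 4096 x 4096 projection at rank 8, LoRA trains 65,536 parameters instead of about 16.8 million.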
QLoRA
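QLoRA stores the frozen base weights in 4-bit precision (the NF4 data type in the QLoRA paper) and trains LoRA adapters on top. Most of the memory saving follows from simple arithmetic. The estimate below counts weight storage only and uses a hypothetical 7B-parameter model. It ignores quantization constants, activations, adapter weights, and optimizer state, so real usage is higher.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Storage for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


n_params = 7_000_000_000  # hypothetical 7B-parameter base model
fp16_gib = weight_memory_gib(n_params, 16)
int4_gib = weight_memory_gib(n_params, 4)
print(f"fp16: {fp16_gib:.2f} GiB, 4-bit: {int4_gib:.2f} GiB")  # fp16: 13.04 GiB, 4-bit: 3.26 GiB
```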
Command-line parameters
fp16
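`fp16` enables mixed-precision training with 16-bit floats. The format's narrow range is why loss scaling is needed. With `--fp16`, the Hugging Face Trainer typically relies on PyTorch automatic mixed precision, which applies dynamic loss scaling. The sketch uses Python's `struct` half-precision format `"e"` to show what fp16 can and cannot represent. `to_fp16` is an illustrative helper.

```python
import struct


def to_fp16(x):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


print(to_fp16(65504.0))      # largest finite fp16 value, kept exactly
print(to_fp16(1e-8))         # below the smallest subnormal, rounds to 0.0
print(to_fp16(1e-8 * 1024))  # scaled up by 1024, the value survives as a subnormal
try:
    to_fp16(70000.0)
except OverflowError:
    print("70000.0 overflows fp16")
```

A tiny gradient like 1e-8 vanishes in fp16 unless it is scaled up first. Loss scaling does exactly that before the backward pass and divides the scale back out before the optimizer step.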
gradient_accumulation_steps
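`gradient_accumulation_steps` runs several micro-batches before each optimizer step. The effective batch size is therefore `per_device_train_batch_size * gradient_accumulation_steps * number_of_devices`. The toy sketch below assumes equal-size micro-batches and a mean-reduced loss. Under those assumptions, summing each micro-batch gradient divided by the number of steps reproduces the full-batch gradient. The model is a one-parameter linear regression, chosen only for illustration.

```python
def grad_mse(w, batch):
    """d/dw of mean((w * x - y) ** 2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, accumulation_steps):
    """Split the batch into equal micro-batches and sum grad / steps, as a training loop would."""
    assert len(batch) % accumulation_steps == 0, "sketch assumes equal-size micro-batches"
    size = len(batch) // accumulation_steps
    total = 0.0
    for i in range(accumulation_steps):
        micro = batch[i * size:(i + 1) * size]
        total += grad_mse(w, micro) / accumulation_steps
    return total  # the optimizer would take one step with this gradient


def effective_batch_size(per_device_batch, accumulation_steps, num_devices):
    return per_device_batch * accumulation_steps * num_devices


batch = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
print(accumulated_grad(1.5, batch, 2), grad_mse(1.5, batch))  # -7.0 -7.0
print(effective_batch_size(4, 8, 2))  # 64
```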
lr_scheduler_type
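`lr_scheduler_type` sets how the learning rate changes over training. Two common values in Hugging Face `transformers` are `linear` and `cosine`, usually combined with a linear warmup. The sketch below reimplements those two shapes as a multiplier on the peak learning rate, modeled on the transformers schedules. `lr_multiplier`, the peak rate, and the step counts are illustrative.

```python
import math


def lr_multiplier(step, total_steps, warmup_steps=0, scheduler="cosine"):
    """Factor applied to the peak learning rate at a given optimizer step."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)  # linear warmup from 0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if scheduler == "linear":
        return max(0.0, 1.0 - progress)
    if scheduler == "cosine":
        return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
    raise ValueError(f"unsupported scheduler in this sketch: {scheduler!r}")


peak_lr = 5e-5
for step in (0, 10, 55, 100):
    print(step, peak_lr * lr_multiplier(step, total_steps=100, warmup_steps=10))
```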
lora_target
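`lora_target` names the modules that receive LoRA adapters as a comma-separated list, for example `q_proj,v_proj` for LLaMA attention projections. The module names below follow the Hugging Face LLaMA implementation. The matching rule is an assumption modeled on PEFT's suffix matching: a name matches if it equals a target or ends with `.` plus the target. `select_lora_modules` is hypothetical.

```python
def select_lora_modules(module_names, lora_target):
    """Pick modules whose name equals a target or ends with '.' + target."""
    targets = [t.strip() for t in lora_target.split(",") if t.strip()]
    return [name for name in module_names
            if any(name == t or name.endswith("." + t) for t in targets)]


names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
]
selected = select_lora_modules(names, "q_proj,v_proj")
print(selected)  # ['model.layers.0.self_attn.q_proj', 'model.layers.0.self_attn.v_proj']
```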
overwrite_cache
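`overwrite_cache` forces the preprocessed (tokenized) training set to be rebuilt instead of reused from cache. In Hugging Face-based scripts, this flag is typically forwarded to `datasets.Dataset.map` as `load_from_cache_file=not overwrite_cache`. The toy cache below is hypothetical and keyed only by dataset name. That exaggerates staleness compared with the `datasets` library, which fingerprints its inputs, but it shows what the flag controls.

```python
def load_processed(dataset_name, raw_rows, cache, overwrite_cache=False):
    """Return (rows, rebuilt): reuse the cached result unless overwrite_cache is set."""
    if not overwrite_cache and dataset_name in cache:
        return cache[dataset_name], False
    processed = [row.lower().split() for row in raw_rows]  # stand-in for tokenization
    cache[dataset_name] = processed
    return processed, True


cache = {}
load_processed("my_data", ["Hello World"], cache)  # builds and caches
rows, rebuilt = load_processed("my_data", ["Edited text"], cache)
print(rows, rebuilt)  # [['hello', 'world']] False
rows, rebuilt = load_processed("my_data", ["Edited text"], cache, overwrite_cache=True)
print(rows, rebuilt)  # [['edited', 'text']] True
```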
stage
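`stage` selects which of the training methods above a run performs. The sketch assumes the short values `sft`, `rm`, `ppo`, and `dpo` that LLaMA-Factory's example commands pass as `--stage`. The set of accepted stages has changed between versions, so check it against the version in use. `describe_stage` is a hypothetical helper.

```python
STAGES = {
    "sft": "supervised fine-tuning",
    "rm": "reward modeling",
    "ppo": "PPO training",
    "dpo": "DPO training",
}


def describe_stage(stage):
    """Map a --stage value to the training method it selects (sketch)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(STAGES)}")
    return STAGES[stage]


print(describe_stage("dpo"))  # DPO training
```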
From: https://www.cnblogs.com/ldzbky/p/17865028.html