multiple GPUs on a single node
torchrun --nnodes 1 --nproc_per_node 2 examples/finetuning.py \
  --enable_fsdp --use_peft --peft_method lora \
  --dataset medcqa_dataset \
  --model_name meta-llama/Llama-2-7b-hf \
  --fsdp_config.pure_bf16 \
  --output_dir ./FINE/llama2-7b-medcqa-modify-prompt-modi-Q-T-O-multiple-gpus
Please note that a single GPU could not hold even the Llama 2 7B model, and 4x A6000 GPUs could not hold Llama 2 13B either.
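If only a single GPU is available, quantized LoRA training may still fit the 7B model in memory. A minimal sketch, assuming your llama-recipes version exposes a --quantization flag (verify the option name against your installed examples/finetuning.py; the output directory below is hypothetical):

# hedged single-GPU run: loads the base model quantized to reduce VRAM
python examples/finetuning.py --quantization --use_peft --peft_method lora \
  --dataset medcqa_dataset --model_name meta-llama/Llama-2-7b-hf \
  --output_dir ./FINE/llama2-7b-medcqa-single-gpu-qlora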
export the model
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the original base model.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Attach the LoRA adapter produced by the fine-tuning run above.
peft_model_id = "/home/ludaze/Docker/Llama/llama-recipes/FINE/llama2-7b-medcqa-modify-prompt-modi-Q-T-O-multiple-gpus"
model = PeftModel.from_pretrained(base_model, peft_model_id)

# Fold the adapter weights into the base model and save a standalone checkpoint.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("/home/ludaze/Docker/Llama/llama-recipes/FINE_EXPORT/llama2-7b-medcqa-modify-prompt-modi-Q-T-O-multiple-gpus-export")
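To sanity-check the export, the merged checkpoint can be loaded like any regular Hugging Face model. A minimal sketch: the tokenizer is loaded from the base checkpoint because the merge step above saves only the model weights, and the prompt is a placeholder to replace with one from your dataset.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

export_dir = "/home/ludaze/Docker/Llama/llama-recipes/FINE_EXPORT/llama2-7b-medcqa-modify-prompt-modi-Q-T-O-multiple-gpus-export"

# Tokenizer comes from the base checkpoint; save_pretrained above wrote only model weights.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained(
    export_dir, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Question: ..."  # placeholder; use a prompt in your fine-tuning format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))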