Preface
Stock vLLM does not support hot-adding LoRA adapters. However, after the fine-tuning machine finishes a fine-tune, it needs to hand the new LoRA to the inference server without stopping it, so we add a small piece of logic to make this possible.
Modify the vllm/entrypoints/openai/api_server.py file inside the installed vLLM package and add the code below:
```python
from pydantic import BaseModel


class AddLoraRequest(BaseModel):
    lora_name: str
    lora_local_path: str


# Use POST rather than GET: this endpoint takes a JSON request body,
# which GET requests are not supposed to carry.
@app.post("/add_lora")
async def add_lora(request: AddLoraRequest):
    openai_serving_chat.add_lora(request.lora_name, request.lora_local_path)
    return Response(status_code=200)
```
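Once the server is restarted with this patch, the fine-tuning machine can register a new adapter with a plain HTTP call. A minimal client sketch, using only the standard library (the host, port, adapter name, and path below are placeholder assumptions, not values from the original post):

```python
import json
from urllib import request as urlreq

# Hypothetical payload: lora_name is the label later used to select the
# adapter, lora_local_path must be a path visible to the inference server.
payload = {
    "lora_name": "my-adapter",
    "lora_local_path": "/data/loras/my-adapter",
}

body = json.dumps(payload).encode("utf-8")
req = urlreq.Request(
    "http://localhost:8000/add_lora",   # assumed server address
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment when the patched vLLM server is actually running:
# with urlreq.urlopen(req) as resp:
#     print(resp.status)

print(body.decode())
```

The request body field names must match the `AddLoraRequest` model exactly, since FastAPI validates the JSON against the pydantic schema before the handler runs.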
From: https://www.cnblogs.com/alphainf/p/18227171