Qwen has quite a few open-source releases: versions 1 and 1.5, parameter sizes from 1.8B to 72B, and language, audio, and vision modalities. At 72B there are two variants, Qwen-72B-Chat (chat) and Qwen-72B (base/pretrained). Below are the pitfalls hit while deploying Qwen-72B-Chat:
1. Download the model (from the ModelScope community); the weight files total 140+ GB.
2. Create a fresh virtual environment. Base requirements: python>=3.8, pytorch>=1.12, cuda>=11.4. Dependencies: "transformers>=4.32.0" accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed. Missing OS-level components cause assorted errors later, so install glibc-devel, gcc, and gcc-c++ up front. Also watch the PATH environment: if /usr/sbin/ldconfig cannot be found, things will break.
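The version floors above can be sanity-checked before going further. A minimal sketch using only the standard library; the `MINIMUMS` table simply restates the requirements listed above, and the simple dotted-number comparison is an assumption (it ignores pre-release suffixes):

```python
from importlib.metadata import version, PackageNotFoundError

# Minimum versions restated from the requirements above.
MINIMUMS = {"transformers": "4.32.0", "torch": "1.12"}

def meets_minimum(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically, e.g. '4.9' < '4.32'."""
    def parts(v: str):
        return [int(p) for p in v.split(".") if p.isdigit()]
    return parts(installed) >= parts(required)

def check_environment(minimums=MINIMUMS):
    """Return {package: (installed_version, ok)} for each requirement."""
    report = {}
    for pkg, req in minimums.items():
        try:
            inst = version(pkg)
            report[pkg] = (inst, meets_minimum(inst, req))
        except PackageNotFoundError:
            report[pkg] = (None, False)
    return report
```

Running `check_environment()` right after `pip install` catches a stale `transformers` before the much slower model-loading step does.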
3. Run it as a server:
- Download the framework: https://github.com/QwenLM/Qwen#vllm
- Serve an OpenAI-compatible API (assuming the model is saved under /app/model/Qwen-72B-Chat): python3 openai_api.py -c /app/model/Qwen-72B-Chat --server-name 0.0.0.0
4. Calling it remotely:
- Via plain HTTP:
import requests

# API key (the local server does not check it, so any placeholder works)
OPEN_AI_API_KEY = 'none'

# Example: call the Chat Completions API
endpoint_url = "http://192.168.1.2:8000/v1/chat/completions"

# Request body
request_body = {
    "model": "Qwen-72B-Chat",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "你好"}
    ]
}

# Request headers
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {OPEN_AI_API_KEY}"
}

# Send the POST request
response = requests.post(endpoint_url, headers=headers, json=request_body)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the returned JSON
    result = response.json()
    print(result)
else:
    print(f"Request failed, status code: {response.status_code}")
    print(f"Error details: {response.text}")
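The reply follows the OpenAI chat-completions response shape, so extracting the assistant text is the same dictionary walk every time. A small helper; the `reply` dict below is a fabricated illustration of that shape, not actual server output:

```python
def extract_answer(result: dict) -> str:
    """Pull the assistant message text out of a chat-completions response dict."""
    choices = result.get("choices") or []
    if not choices:
        raise ValueError(f"no choices in response: {result!r}")
    return choices[0]["message"]["content"]

# Illustrative response in the chat-completions shape (not real server output).
reply = {
    "choices": [
        {"index": 0,
         "message": {"role": "assistant", "content": "你好!有什么可以帮你的?"},
         "finish_reason": "stop"}
    ]
}
print(extract_answer(reply))
```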
- Via the openai package (version 0.28.1; must be below 1.0). Streaming not sorted out yet.
import openai

openai.api_base = "http://192.168.1.2:8000/v1"
openai.api_key = "none"

response = openai.ChatCompletion.create(
    model="Qwen-72B-Chat",
    messages=[
        {"role": "user", "content": "你好"}
    ],
    stream=False,
    stop=[]  # You can add custom stop words here, e.g., stop=["Observation:"] for ReAct prompting.
)
print(response.choices[0].message.content)
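On the streaming point: with openai 0.28.x the usual pattern is `stream=True` plus iterating over delta chunks. This sketch keeps the pure accumulation logic separate (so it can be checked offline); the network call is only shown as a comment because it needs a live server, and the `fake_chunks` data is fabricated to mimic the streamed shape:

```python
def accumulate_stream(chunks) -> str:
    """Join the incremental 'delta' payloads of a streamed chat completion."""
    pieces = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        piece = delta.get("content")
        if piece:
            pieces.append(piece)
    return "".join(pieces)

# Against a live server (openai<1.0), the chunks would come from:
#   stream = openai.ChatCompletion.create(
#       model="Qwen-72B-Chat",
#       messages=[{"role": "user", "content": "你好"}],
#       stream=True,
#   )
#   print(accumulate_stream(stream))

# Offline demonstration with fabricated chunk dicts in the streamed shape:
fake_chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "你"}}]},
    {"choices": [{"delta": {"content": "好"}}]},
    {"choices": [{"delta": {}}]},
]
print(accumulate_stream(fake_chunks))  # prints 你好
```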