Deploying Custom Embedding Models with Xinference (Docker)

Notes:

First published: 2024-08-27
Official documentation: https://inference.readthedocs.io/zh-cn/latest/index.html

Deploying Xinference with Docker

Create a Dockerfile with the following content:

FROM nvcr.io/nvidia/pytorch:23.10-py3

# Keeps Python from generating .pyc files in the container
ENV PYTHONDONTWRITEBYTECODE=1

# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED=1

RUN python3 -m pip uninstall -y transformer-engine
RUN python3 -m pip install --upgrade pip


RUN python3 -m pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --no-cache-dir --index-url https://download.pytorch.org/whl/cu121

# If the download fails because of network issues, download the torch wheel manually and install it from the local file
# ADD torch-2.3.0+cu121-cp310-cp310-linux_x86_64.whl /root/torch-2.3.0+cu121-cp310-cp310-linux_x86_64.whl
# RUN python3 -m pip install /root/torch-2.3.0+cu121-cp310-cp310-linux_x86_64.whl


RUN python3 -m pip install packaging setuptools==69.5.1 --no-cache-dir -i https://mirror.baidu.com/pypi/simple
RUN python3 -m pip install -U ninja --no-cache-dir -i https://mirror.baidu.com/pypi/simple
RUN python3 -m pip install flash-attn==2.5.8 --no-build-isolation --no-cache-dir
RUN python3 -m pip install "xinference[all]" --no-cache-dir -i https://repo.huaweicloud.com/repository/pypi/simple

EXPOSE 80

CMD ["sh", "-c", "tail -f /dev/null"]

Build the image:

docker build -t myxinference:latest .

Then start the Docker service by following the official guide: https://inference.readthedocs.io/zh-cn/latest/getting_started/using_docker_image.html#mount-your-volume-for-loading-and-saving-models

If you download models from Hugging Face, consider using the https://hf-mirror.com/ mirror (remember to set the HF_ENDPOINT environment variable when starting the container).

The rest of this article assumes the deployed service is reachable at http://localhost:9997

Deploying custom embedding models

Prepare the custom JSON files for the embedding models

Create the directory custom_models/embedding:

mkdir -p custom_models/embedding

Then create the following custom model JSON files:

360Zhinao-search.json:

{
    "model_name": "360Zhinao-search",
    "dimensions": 1024,
    "max_tokens": 512,
    "language": ["en", "zh"],
    "model_id": "qihoo360/360Zhinao-search",
    "model_format": "pytorch"
}

gte-Qwen2-7B-instruct.json:

{
    "model_name": "gte-Qwen2-7B-instruct",
    "dimensions": 3584,
    "max_tokens": 32768,
    "language": ["en", "zh"],
    "model_id": "Alibaba-NLP/gte-Qwen2-7B-instruct",
    "model_format": "pytorch"
}

zpoint_large_embedding_zh.json:

{
    "model_name": "zpoint_large_embedding_zh",
    "dimensions": 1792,
    "max_tokens": 512,
    "language": ["zh"],
    "model_id": "iampanda/zpoint_large_embedding_zh",
    "model_format": "pytorch"
}

Note: for a model that has already been downloaded locally, you can set the model_uri parameter, for example file:///path/to/llama-2-7b.
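The spec files can also be generated from a script, which helps when the same fields have to be repeated for several models or when model_uri should only be set on some machines. The following is a minimal sketch: the helper names build_spec and write_spec are hypothetical, and the field names are simply the ones used in the JSON files in this article.

```python
import json
from pathlib import Path
from typing import List, Optional


def build_spec(
    model_name: str,
    model_id: str,
    dimensions: int,
    max_tokens: int,
    language: List[str],
    model_uri: Optional[str] = None,
) -> dict:
    """Build a custom embedding model spec with the fields used in this article."""
    spec = {
        "model_name": model_name,
        "dimensions": dimensions,
        "max_tokens": max_tokens,
        "language": language,
        "model_id": model_id,
        "model_format": "pytorch",
    }
    # model_uri is optional: set it only for models already downloaded locally.
    if model_uri is not None:
        spec["model_uri"] = model_uri
    return spec


def write_spec(spec: dict, out_dir: Path) -> Path:
    """Write the spec to <out_dir>/<model_name>.json and return the file path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{spec['model_name']}.json"
    path.write_text(json.dumps(spec, indent=4, ensure_ascii=False) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    spec = build_spec(
        "360Zhinao-search", "qihoo360/360Zhinao-search", 1024, 512, ["en", "zh"]
    )
    print(write_spec(spec, Path("custom_models/embedding")))
```

Passing model_uri="file:///path/to/model" (a placeholder path) adds the extra key; leaving it out produces the same layout as the files shown here.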

Register the custom embedding models

xinference register --model-type embedding --file custom_models/embedding/360Zhinao-search.json --persist --endpoint http://localhost:9997

xinference register --model-type embedding --file custom_models/embedding/gte-Qwen2-7B-instruct.json --persist --endpoint http://localhost:9997

xinference register --model-type embedding --file custom_models/embedding/zpoint_large_embedding_zh.json --persist --endpoint http://localhost:9997
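The same registration can be done from Python with the Xinference client, which is convenient when there are many spec files. This is a sketch under two assumptions: the xinference package is installed in the environment running the script, and the spec files live in a directory such as custom_models/embedding. The helper names load_spec_text and register_all are hypothetical.

```python
import json
from pathlib import Path
from typing import List


def load_spec_text(path: Path) -> str:
    """Read a spec file, check that it is valid JSON with a model_name, return its text."""
    text = path.read_text(encoding="utf-8")
    spec = json.loads(text)  # raises ValueError on malformed JSON
    if "model_name" not in spec:
        raise ValueError(f"{path} has no model_name field")
    return text


def register_all(spec_dir: Path, endpoint: str = "http://localhost:9997") -> List[str]:
    """Register every *.json spec in spec_dir as a persistent embedding model."""
    # Imported lazily so load_spec_text can be used without xinference installed.
    from xinference.client import Client

    client = Client(endpoint)
    registered = []
    for path in sorted(spec_dir.glob("*.json")):
        # register_model expects the spec as a JSON string, not a dict.
        client.register_model(
            model_type="embedding", model=load_spec_text(path), persist=True
        )
        registered.append(path.stem)
    return registered
```

With the service running, register_all(Path("custom_models/embedding")) would register each file in turn, mirroring the --persist flag of the CLI commands.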

Launch the custom embedding models

xinference launch --model-type embedding --model-name gte-Qwen2-7B-instruct --model-engine transformers --model-format pytorch --endpoint http://localhost:9997

xinference launch --model-type embedding --model-name 360Zhinao-search --model-engine transformers --model-format pytorch --endpoint http://localhost:9997

xinference launch --model-type embedding --model-name zpoint_large_embedding_zh --model-engine transformers --model-format pytorch --endpoint http://localhost:9997
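A launch can also be driven from the Python client, which makes it easy to confirm right away that the vector length matches the dimensions value declared in the spec file. The sketch below assumes the xinference package is installed and that the response follows the OpenAI-style layout with a data list; launch_and_embed and check_dimensions are hypothetical helper names.

```python
from typing import List


def check_dimensions(vector: List[float], expected: int) -> List[float]:
    """Raise if the embedding length differs from the registered dimensions."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return vector


def launch_and_embed(
    model_name: str,
    text: str,
    expected_dimensions: int,
    endpoint: str = "http://localhost:9997",
) -> List[float]:
    """Launch a registered embedding model and embed one text with it."""
    # Imported lazily so check_dimensions works without xinference installed.
    from xinference.client import Client

    client = Client(endpoint)
    model_uid = client.launch_model(model_name=model_name, model_type="embedding")
    model = client.get_model(model_uid)
    response = model.create_embedding(text)
    # Assumption: OpenAI-style response, one entry in "data" per input text.
    return check_dimensions(response["data"][0]["embedding"], expected_dimensions)
```

For example, launch_and_embed("zpoint_large_embedding_zh", "你好", 1792) would fail loudly if the dimensions value in the JSON file did not match what the model actually returns.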

Launch the bge-m3 and bge-reranker-base models

bge-m3 and bge-reranker-base are commonly used embedding and reranking models, respectively.

xinference launch --model-name bge-m3 --model-type embedding --endpoint http://localhost:9997

xinference launch --model-name bge-reranker-base --model-type rerank --endpoint http://localhost:9997

Testing with curl

embedding:

curl http://localhost:9997/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The food was delicious and the waiter...",
    "model": "360Zhinao-search",
    "encoding_format": "float"
  }'
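For use inside an application, the same request can be sent with the Python standard library alone. This is a sketch rather than a reference client: it assumes the response uses the OpenAI-style layout (a data list whose items carry index and embedding), and the function names are hypothetical.

```python
import json
import urllib.request
from typing import List, Union


def build_embedding_request(
    model: str, text: Union[str, List[str]], endpoint: str = "http://localhost:9997"
) -> urllib.request.Request:
    """Build the POST request that the curl command sends to /v1/embeddings."""
    payload = {"input": text, "model": model, "encoding_format": "float"}
    return urllib.request.Request(
        f"{endpoint}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_embeddings(response_body: dict) -> List[List[float]]:
    """Return the vectors ordered by their index field (assumed OpenAI-style layout)."""
    items = sorted(response_body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(
    model: str, text: Union[str, List[str]], endpoint: str = "http://localhost:9997"
) -> List[List[float]]:
    """Send the request and return one vector per input text."""
    request = build_embedding_request(model, text, endpoint)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_embeddings(json.load(response))
```

With the service running, embed("360Zhinao-search", "The food was delicious and the waiter...") would return a list containing a single vector.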

reranking:

curl http://localhost:9997/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
  "model": "bge-reranker-base",
  "query": "I love you",
  "documents": [
    "I hate you",
    "I really like you",
    "天空是什么颜色的",
    "黑芝麻味饼干"
  ],
  "top_n": 3
}'
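The rerank call can be scripted in the same way. The sketch below builds the same payload as the curl command and maps the scores back to the original documents. It assumes the response contains a results list whose items carry index and relevance_score; the function names are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple


def build_rerank_payload(model: str, query: str, documents: List[str], top_n: int) -> dict:
    """Build the JSON body sent to /v1/rerank."""
    return {"model": model, "query": query, "documents": documents, "top_n": top_n}


def rank_documents(response_body: dict, documents: List[str]) -> List[Tuple[str, float]]:
    """Pair each returned index with its document, highest score first."""
    # Assumption: each result has "index" (position in documents) and "relevance_score".
    results = sorted(
        response_body["results"], key=lambda r: r["relevance_score"], reverse=True
    )
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


def rerank(
    model: str,
    query: str,
    documents: List[str],
    top_n: int,
    endpoint: str = "http://localhost:9997",
) -> List[Tuple[str, float]]:
    """POST the payload and return (document, score) pairs."""
    request = urllib.request.Request(
        f"{endpoint}/v1/rerank",
        data=json.dumps(build_rerank_payload(model, query, documents, top_n)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return rank_documents(json.load(response), documents)
```

With the service running, rerank("bge-reranker-base", "I love you", ["I hate you", "I really like you"], 2) would return the two documents paired with their scores.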

Source: https://www.cnblogs.com/shizidushu/p/18382260