
LMDeploy

Posted: 2024-07-24 22:50:58


https://lmdeploy.readthedocs.io/en/latest/index.html

LMDeploy has the following core features:

  • Efficient Inference: LMDeploy delivers up to 1.8x higher request throughput than vLLM, by introducing key features like persistent batching (a.k.a. continuous batching), blocked KV cache, dynamic split&fuse, tensor parallelism, high-performance CUDA kernels and so on.

  • Effective Quantization: LMDeploy supports weight-only and k/v quantization, and the 4-bit inference performance is 2.4x higher than FP16. The quantization quality has been confirmed via OpenCompass evaluation.

  • Effortless Distribution Server: Leveraging the request distribution service, LMDeploy facilitates easy and efficient deployment of multi-model services across multiple machines and GPUs.

  • Interactive Inference Mode: By caching the k/v of attention during multi-round dialogue processes, the engine remembers dialogue history, thus avoiding repetitive processing of historical sessions.

  • Excellent Compatibility: LMDeploy supports KV Cache Quant, AWQ and Automatic Prefix Caching to be used simultaneously.
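To make the "blocked KV cache" idea above concrete, here is a minimal pure-Python sketch of a block-based KV cache allocator: the cache is carved into fixed-size physical blocks, each sequence holds a block table mapping its logical positions to physical blocks, and blocks are allocated on demand and returned to the pool when a sequence finishes. All names and the block size are illustrative, not LMDeploy's actual API.

```python
# Sketch of a blocked KV cache allocator (in the spirit of LMDeploy's
# blocked KV cache / paged attention). Hypothetical names throughout.

class BlockedKVCache:
    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # pool of physical block ids
        self.block_table = {}                       # seq_id -> list of block ids
        self.seq_len = {}                           # seq_id -> tokens written

    def append_token(self, seq_id: int) -> int:
        """Reserve cache space for one new token; return its physical block id."""
        n = self.seq_len.get(seq_id, 0)
        table = self.block_table.setdefault(seq_id, [])
        if n % self.block_size == 0:                # current block full, or first token
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            table.append(self.free_blocks.pop())    # grab a fresh physical block
        self.seq_len[seq_id] = n + 1
        return table[-1]

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_table.pop(seq_id, []))
        self.seq_len.pop(seq_id, None)


cache = BlockedKVCache(num_blocks=8, block_size=4)
for _ in range(10):                 # a 10-token sequence needs ceil(10/4) = 3 blocks
    cache.append_token(seq_id=0)
print(len(cache.block_table[0]))    # 3
```

Because blocks are allocated only as tokens arrive, memory is not reserved for a sequence's maximum length up front, which is what allows the large persistent batches the throughput numbers above depend on.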

 

Comparisons with other inference backends

https://bentoml.com/blog/benchmarking-llm-inference-backends

https://cloud.tencent.com/developer/article/2428575

 

Deployment reference

https://zhuanlan.zhihu.com/p/678685048

 

From: https://www.cnblogs.com/lightsong/p/18321961

    〇、完成结果使用LMDeploy以本地对话部署InternLM-Chat-7B模型,生成300字的小故事:以API服务中的一种方式部署InternLM-Chat-7B模型,生成300字的小故事:以网页Gradio部署InternLM-Chat-7B模型,生成300字的小故事:前、知识笔记安装、部署、量化一、环境配置可以使用 vgpu-s......