vLLM
https://github.com/vllm-project/vllm
https://docs.vllm.ai/en/latest/
Covers both inference and serving, but leans more toward inference.
vLLM is a fast and easy-to-use library for LLM inference and serving.
vLLM is fast with:
- State-of-the-art serving throughput
- Efficient management of attention key and value memory with PagedAttention
- Continuous batching of incoming requests
- Fast model execution with CUDA/HIP graph
- Quantization: GPTQ, AWQ, SqueezeLLM, FP8 KV Cache (see the sketch after this list)
- Optimized CUDA kernels
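Quantization is enabled with a single constructor argument. A minimal sketch, assuming an AWQ-quantized checkpoint on the Hugging Face Hub (the model ID below is only an illustrative placeholder):

```python
from vllm import LLM

# The model ID is a placeholder for any AWQ-quantized checkpoint on the Hub.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")

# Default sampling parameters are used when none are passed.
outputs = llm.generate("The key idea behind PagedAttention is")
print(outputs[0].outputs[0].text)
```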
Performance benchmark: the repository includes a benchmark that compares vLLM against other LLM serving engines (TensorRT-LLM, text-generation-inference, and lmdeploy).
vLLM is flexible and easy to use with:
- Seamless integration with popular Hugging Face models
- High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
- Tensor parallelism and pipeline parallelism support for distributed inference
- Streaming outputs
- OpenAI-compatible API server (see the client sketch after this list)
- Support for NVIDIA GPUs, AMD CPUs and GPUs, Intel CPUs and GPUs, and PowerPC CPUs
- (Experimental) Prefix caching support
- (Experimental) Multi-LoRA support
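Because the server speaks the OpenAI API, any existing OpenAI client works against it unchanged. A minimal client sketch (assumptions: the server was launched as in the first comment, on its default port 8000):

```python
# Assumed server launch (run in a separate shell):
#   python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
from openai import OpenAI

# Point the official openai client at the local vLLM server; no real key is needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="facebook/opt-125m",  # must match the model the server was launched with
    prompt="San Francisco is a",
    max_tokens=32,
)
print(resp.choices[0].text)
```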
vLLM seamlessly supports most popular open-source models on Hugging Face, including:
- Transformer-like LLMs (e.g., Llama)
- Mixture-of-Experts LLMs (e.g., Mixtral)
- Multi-modal LLMs (e.g., LLaVA)
Find the full list of supported models in the vLLM documentation (https://docs.vllm.ai/en/latest/).
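As a quick illustration of the Hugging Face integration, offline batched inference takes only a few lines (a minimal sketch; facebook/opt-125m stands in for any supported model):

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Any model ID from the supported list works here; opt-125m is just small and fast.
llm = LLM(model="facebook/opt-125m")

# Requests are batched internally (PagedAttention + continuous batching).
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
```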
FastChat
https://github.com/lm-sys/FastChat
FastChat covers model training, serving, and evaluation.
The most popular part is still its serving side, i.e. deployment (distributed deployment with a web UI and a REST API), and the backend can integrate vLLM to accelerate inference.
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
- FastChat powers Chatbot Arena (https://chat.lmsys.org/), serving over 10 million chat requests for 70+ LLMs.
- Chatbot Arena has collected over 500K human votes from side-by-side LLM battles to compile an online LLM Elo leaderboard.
FastChat's core features include:
- The training and evaluation code for state-of-the-art models (e.g., Vicuna, MT-Bench).
- A distributed multi-model serving system with web UI and OpenAI-compatible RESTful APIs.
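The serving system is composed of a controller, one or more model workers, and front-end servers (web UI or OpenAI-compatible API). A minimal sketch of bringing up the stack and querying it, with launch commands taken from the FastChat README and Vicuna as the example checkpoint:

```python
# Run each FastChat process in its own shell:
#   python3 -m fastchat.serve.controller
#   python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
#   python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
# (Swap openai_api_server for gradio_web_server to get the web UI instead.)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="vicuna-7b-v1.5",  # the name the worker registers with the controller
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
)
print(resp.choices[0].message.content)
```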
https://rudeigerc.dev/posts/llm-inference-with-fastchat/
vLLM vs. FastChat
https://fastchat.mintlify.app/vllm_integration
https://github.com/lm-sys/FastChat/issues/1775
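Per the vLLM integration doc linked above, plugging vLLM into FastChat only means swapping the worker process; the controller, API server, and client code all stay the same. A sketch:

```python
# Replace FastChat's default Hugging Face worker with its vLLM worker:
#   python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5
# The worker registers with the controller like any other worker, but serves
# requests through vLLM's engine, so PagedAttention and continuous batching
# come for free.
```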