Different models require different amounts of GPU memory, so before downloading, check which models your GPU can actually support.
1. The script below can download any of the models on Hugging Face: https://huggingface.co/models
download.py
# coding=gbk
import time
from huggingface_hub import snapshot_download

# Model name on Hugging Face
repo_id = "LinkSoul/Chinese-Llama-2-7b-4bit"
# Local storage path
local_dir = "E:\\work\\AI\\GPT\\llama_model_7b_4bit"
cache_dir = local_dir + "\\cache"

# Keep retrying until the snapshot download succeeds
while True:
    try:
        snapshot_download(cache_dir=cache_dir,
                          local_dir=local_dir,
                          repo_id=repo_id,
                          local_dir_use_symlinks=False,
                          resume_download=True,
                          allow_patterns=["*.model", "*.json", "*.bin", "*.py", "*.md", "*.txt"],
                          ignore_patterns=["*.safetensors", "*.msgpack", "*.h5", "*.ot"])
    except Exception as e:
        print(e)
        # time.sleep(5)
    else:
        print('Download complete')
        break
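Run it with python download.py. The while True / try loop simply retries on network errors, and resume_download=True means an interrupted download picks up where it left off instead of starting over.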
2. Local environment
To run the downloaded llama model you first need a conda virtual environment. I installed Anaconda on a Windows machine; to create the environment, run the following on the command line:
conda create -n LLM_env python=3.10
Python 3.10 is chosen here; it has to match the PyTorch version installed later.
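After it is created, activate the environment so the later installs land in it:
conda activate LLM_env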
3. Check the CUDA version
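Two quick ways to check, assuming the NVIDIA driver (and optionally the CUDA toolkit) is installed:
nvidia-smi       (shows the driver version and the highest CUDA version it supports)
nvcc --version   (shows the installed CUDA toolkit version, if a toolkit is present)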
4. Install PyTorch; reference pages:
https://blog.csdn.net/threestooegs/article/details/119531414
and
https://pytorch.org/get-started/previous-versions/
I installed version 1.13.1.
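For reference, the pip command from the previous-versions page for 1.13.1, assuming a CUDA 11.7 setup (pick the variant that matches the CUDA version found in step 3):
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117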
Of course, some other packages may still be missing; just install whatever turns out to be needed. One pitfall here is installing the bitsandbytes library on Windows:
importlib.metadata.PackageNotFoundError: No package metadata was found for bitsandbytes
This means the bitsandbytes-windows version is too old; reinstall a newer build:
pip install --trusted-host github.com --trusted-host objects.githubusercontent.com https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.0-py3-none-win_amd64.whl
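To confirm the wheel actually registered its package metadata (a quick sanity check, not from the original post):
python -c "import importlib.metadata as m; print(m.version('bitsandbytes'))"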
5. PyCharm configuration
Create a new project in PyCharm and select the virtual environment created above as the project interpreter.
6. Write the test code
test.py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

# Local model path
# model_path = "E:\\work\\AI\\GPT\\llama_model"
model_path = "E:\\work\\AI\\GPT\\llama_model_4bit"
# model_path = "E:\\work\\AI\\GPT\\llama_model_7b_8bit"

print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.device_count())
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
else:
    print('No GPU available')

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Pick the loading mode based on how the model was quantized
if model_path.endswith("4bit"):
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_4bit=True,
        torch_dtype=torch.float16,
        device_map='auto'
    )
elif model_path.endswith("8bit"):
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_8bit=True,
        torch_dtype=torch.float16,
        device_map='auto'
    )
else:
    model = AutoModelForCausalLM.from_pretrained(model_path).half().cuda()

# Stream generated tokens to stdout as they are produced
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

instruction = """[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n\n{} [/INST]"""

prompt = instruction.format("Hello, what is the meaning of life?")
generate_ids = model.generate(tokenizer(prompt, return_tensors='pt').input_ids.cuda(),
                              max_new_tokens=4096,
                              streamer=streamer)
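The 4-bit and 8-bit branches are what rely on the bitsandbytes wheel installed in step 4. The TextStreamer already prints the reply to stdout as tokens arrive; if you also want the full reply as a string afterwards (a small addition, not in the original script), decode the returned ids:
output = tokenizer.decode(generate_ids[0], skip_special_tokens=True)
print(output)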
webui
There is another way to run the model locally, through a web page.
References:
https://www.cnblogs.com/zhizhixiaoxia/p/17414798.html
https://github.com/oobabooga/text-generation-webui/tree/main
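Roughly, the flow is to clone the repo, install its requirements into the same conda environment, put the downloaded model folder under models/, and start the server. A sketch, with the flags and model folder name as assumptions (they change between versions, so check the repo's README):
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
python server.py --model Chinese-Llama-2-7b-4bit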