
A Record of Deploying the Qwen1.5-0.5B AWQ Model Locally on Windows 11


Straight to the code, taken from the ModelScope model page 通义千问1.5-0.5B-Chat-AWQ · 模型库 (modelscope.cn):

from modelscope import AutoModelForCausalLM, AutoTokenizer
device = "cuda" # the device to load the model onto

# Download (on first run) and load the AWQ-quantized checkpoint from ModelScope
model = AutoModelForCausalLM.from_pretrained(
    "qwen/Qwen1.5-0.5B-Chat-AWQ",
    device_map="cuda"
)
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen1.5-0.5B-Chat-AWQ")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into the model's expected prompt format
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Slice off the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
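
(One prerequisite worth noting: loading an AWQ checkpoint through transformers also requires the autoawq package, e.g. pip install autoawq. Judging by the traceback below, that part was already satisfied in my environment, since the failure comes from bitsandbytes instead.)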

Run it.

And it failed with an error:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so
False
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc


E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/hf-mirror.com'), WindowsPath('https')}
  warn(msg)
E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('module'), WindowsPath('/matplotlib_inline.backend_inline')}
  warn(msg)
E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
  warn(msg)
E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
  warn(msg)
E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py:149: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\transformers\utils\import_utils.py:1390, in _LazyModule._get_module(self, module_name)
   1389 try:
-> 1390     return importlib.import_module("." + module_name, self.__name__)
   1391 except Exception as e:

File ~\AppData\Local\Programs\Python\Python310\lib\importlib\__init__.py:126, in import_module(name, package)
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1006, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:688, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:883, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\transformers\integrations\bitsandbytes.py:11
     10 if is_bitsandbytes_available():
---> 11     import bitsandbytes as bnb
     12     import torch

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\__init__.py:6
      1 # Copyright (c) Facebook, Inc. and its affiliates.
      2 #
      3 # This source code is licensed under the MIT license found in the
      4 # LICENSE file in the root directory of this source tree.
----> 6 from . import cuda_setup, utils, research
      7 from .autograd._functions import (
      8     MatmulLtState,
      9     bmm_cublas,
   (...)
     13     matmul_4bit
     14 )

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\research\__init__.py:1
----> 1 from . import nn
      2 from .autograd._functions import (
      3     switchback_bnb,
      4     matmul_fp8_global,
      5     matmul_fp8_mixed,
      6 )

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py:1
----> 1 from .modules import LinearFP8Mixed, LinearFP8Global

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\research\nn\modules.py:8
      7 import bitsandbytes as bnb
----> 8 from bitsandbytes.optim import GlobalOptimManager
      9 from bitsandbytes.utils import OutlierTracer, find_outlier_dims

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\optim\__init__.py:6
      1 # Copyright (c) Facebook, Inc. and its affiliates.
      2 #
      3 # This source code is licensed under the MIT license found in the
      4 # LICENSE file in the root directory of this source tree.
----> 6 from bitsandbytes.cextension import COMPILED_WITH_CUDA
      8 from .adagrad import Adagrad, Adagrad8bit, Adagrad32bit

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\bitsandbytes\cextension.py:20
     19     CUDASetup.get_instance().print_log_stack()
---> 20     raise RuntimeError('''
     21     CUDA Setup failed despite GPU being available. Please run the following command to get more information:
     22
     23     python -m bitsandbytes
     24
     25     Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
     26     to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
     27     and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues''')
     28 lib.cadam32bit_grad_fp32 # runs on an error if the library could not be found -> COMPILED_WITH_CUDA=False

RuntimeError:
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
Cell In[1], line 4
      1 from modelscope import AutoModelForCausalLM, AutoTokenizer
      2 device = "cuda" # the device to load the model onto
----> 4 model = AutoModelForCausalLM.from_pretrained(
      5     "qwen/Qwen1.5-0.5B-Chat-AWQ",
      6     device_map="auto"
      7 )
      8 tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen1.5-0.5B-Chat-AWQ")
     10 prompt = "Give me a short introduction to large language model."

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\modelscope\utils\hf_util.py:111, in get_wrapped_class.<locals>.ClassWrapper.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    108 else:
    109     model_dir = pretrained_model_name_or_path
--> 111 module_obj = module_class.from_pretrained(model_dir, *model_args,
    112                                           **kwargs)
    114 if module_class.__name__.startswith('AutoModel'):
    115     module_obj.model_dir = model_dir

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\transformers\models\auto\auto_factory.py:561, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
    559 elif type(config) in cls._model_mapping.keys():
    560     model_class = _get_model_class(config, cls._model_mapping)
--> 561     return model_class.from_pretrained(
    562         pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
    563     )
    564 raise ValueError(
    565     f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
    566     f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
    567 )

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\modelscope\utils\hf_util.py:74, in patch_model_base.<locals>.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
     72 else:
     73     model_dir = pretrained_model_name_or_path
---> 74 return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\transformers\modeling_utils.py:3389, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3386     keep_in_fp32_modules = []
   3388 if hf_quantizer is not None:
-> 3389     hf_quantizer.preprocess_model(
   3390         model=model, device_map=device_map, keep_in_fp32_modules=keep_in_fp32_modules
   3391     )
   3393     # We store the original dtype for quantized models as we cannot easily retrieve it
   3394     # once the weights have been quantized
   3395     # Note that once you have loaded a quantized model, you can't change its dtype so this will
   3396     # remain a single source of truth
   3397     config._pre_quantization_dtype = torch_dtype

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\transformers\quantizers\base.py:166, in HfQuantizer.preprocess_model(self, model, **kwargs)
    164 model.is_quantized = True
    165 model.quantization_method = self.quantization_config.quant_method
--> 166 return self._process_model_before_weight_loading(model, **kwargs)

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\transformers\quantizers\quantizer_awq.py:77, in AwqQuantizer._process_model_before_weight_loading(self, model, **kwargs)
     76 def _process_model_before_weight_loading(self, model: "PreTrainedModel", **kwargs):
---> 77     from ..integrations import get_keys_to_not_convert, replace_with_awq_linear
     79     self.modules_to_not_convert = get_keys_to_not_convert(model)
     81     if self.quantization_config.modules_to_not_convert is not None:

File <frozen importlib._bootstrap>:1075, in _handle_fromlist(module, fromlist, import_, recursive)

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\transformers\utils\import_utils.py:1380, in _LazyModule.__getattr__(self, name)
   1378     value = self._get_module(name)
   1379 elif name in self._class_to_module.keys():
-> 1380     module = self._get_module(self._class_to_module[name])
   1381     value = getattr(module, name)
   1382 else:

File E:\Prj\ChatGLM3-6B-32K\venv\lib\site-packages\transformers\utils\import_utils.py:1392, in _LazyModule._get_module(self, module_name)
   1390     return importlib.import_module("." + module_name, self.__name__)
   1391 except Exception as e:
-> 1392     raise RuntimeError(
   1393         f"Failed to import {self.__name__}.{module_name} because of the following error (look up to see its"
   1394         f" traceback):\n{e}"
   1395     ) from e

RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):

        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
In short: a massive wall of error logs.

 

First, check the environment.

Run the following at the command line to check the GPU and CUDA configuration:

C:\Users\20116>nvidia-smi
Thu Apr  4 17:27:53 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 537.58                 Driver Version: 537.58       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...  WDDM  | 00000000:01:00.0 Off |                  N/A |
| N/A   56C    P8               3W /  69W |   1674MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    131684      C   ...rograms\Python\Python310\python.exe    N/A      |
+---------------------------------------------------------------------------------------+

C:\Users\20116>nvcc
nvcc fatal   : No input files specified; use option --help for more information

C:\Users\20116>
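Two side notes on reading that output: nvcc with no arguments just complains about missing input files; nvcc --version is the useful invocation, printing the installed toolkit version. And the "CUDA Version: 12.2" in the nvidia-smi header is the highest CUDA version the driver supports, not the version of an installed toolkit, which is why the two numbers can legitimately differ.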

Then run this snippet to check the versions of some of the relevant libraries:

import torch
import modelscope

print(torch.__version__)          # torch build, including its CUDA suffix
print(torch.cuda.is_available())  # whether torch can see the GPU
print(modelscope.__version__)

Output:

2.1.0+cu118
True
1.12.0
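
One more check that matters later: the "+cu118" suffix above is the CUDA version the installed torch wheel was built against, and it can also be queried directly. A minimal sketch:

import torch

# CUDA version this torch wheel was compiled against (e.g. "11.8" for cu118).
# Note this is the wheel's CUDA, not the driver's ceiling shown by nvidia-smi.
print(torch.version.cuda)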

Inspecting the files in the bitsandbytes package directory named in the error messages, I found cu116-related files.
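
To reproduce that check in code rather than in the file explorer, a minimal sketch (it deliberately avoids importing bitsandbytes, since the import itself is what fails here; the libbitsandbytes prefix matches the file named in the log above):

import importlib.util
import os

# Locate the installed bitsandbytes package without importing it, then list
# the native binaries it ships; the CUDA version baked into those file names
# has to match the CUDA libraries actually available on the machine.
spec = importlib.util.find_spec("bitsandbytes")
pkg_dir = os.path.dirname(spec.origin)
print([f for f in os.listdir(pkg_dir) if f.startswith("libbitsandbytes")])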

I then tried the fixes from the following posts, one after another:

Fixing bitsandbytes errors during large-model training (CSDN blog)

Installing CUDA and configuring environment variables on Linux – 源码巴士 (code84.com)

bitsandbytes-cuda118 on pypi.org? · Issue #866 · TimDettmers/bitsandbytes (github.com)

Most of those fixes didn't work. It turns out bitsandbytes is quite demanding about its environment and needs a build matching your CUDA version; unfortunately, there was no bitsandbytes build matching my cu118 setup.
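
(Context, as far as I can tell: official bitsandbytes releases at the time shipped Linux binaries only, which is much of why it is so brittle on Windows; community Windows builds existed, and later official releases added Windows wheels. Whether simply upgrading bitsandbytes would have avoided all of this is an assumption I did not verify.)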

The absurd part: I only then realized this machine originally had CUDA 11.7 installed, yet the torch build that got installed was cu118, and per the nvidia-smi output above the driver supports up to CUDA 12.2. That so many earlier projects ran fine on this setup is wild. Reading the error message again carefully:

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Perhaps my CUDA environment variables were simply not fully configured. So I set LD_LIBRARY_PATH to point at the lib folder under the CUDA installation directory, restarted the notebook kernel, and ran the code again. To my delight, it actually worked. The final output:
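
For reference, the same fix can be sketched from inside the notebook instead of the system environment-variable dialog; the CUDA path below is a hypothetical example, substitute your actual install location and version:

import os

# Hypothetical CUDA install path; adjust to your actual version and location.
cuda_lib = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\lib\x64"

# bitsandbytes' CUDA setup scans LD_LIBRARY_PATH (among other environment
# variables) for CUDA libraries, even on Windows, so set it before bitsandbytes
# gets imported for the first time.
os.environ["LD_LIBRARY_PATH"] = cuda_lib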

From: https://www.cnblogs.com/SaberZHT/p/18114426
