
Using DBRX Instruct on AMD GPUs

Date: 2024-08-28 15:25:23

DBRX Instruct on AMD GPUs — ROCm Blogs

In this blog, we demonstrate DBRX Instruct, a mixture-of-experts large language model developed by Databricks, running on a ROCm-capable system with AMD GPUs.

About DBRX Instruct

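DBRX is a Transformer-based, decoder-only large language model with 132 billion parameters that uses a fine-grained mixture-of-experts (MoE) architecture. It was pretrained on 12 trillion tokens of text and code. Each MoE layer has 16 experts, of which 4 are selected: a gating network routes every input token to 4 of the 16 expert networks, based on the token's characteristics and the experts' specializations. As a result, only 36 billion parameters are active on any given input. DBRX uses several advanced techniques, including rotary position encodings (RoPE), gated linear units (GLU), and grouped-query attention (GQA), to achieve strong performance.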
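
The routing step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not DBRX's actual implementation: the router scores are random stand-ins for what a learned linear layer would produce, and renormalizing the selected weights to sum to 1 is an assumed convention. The function name `route_token` is hypothetical.

```python
import math
import random

NUM_EXPERTS = 16  # experts per MoE layer, as stated above
TOP_K = 4         # experts selected per token, as stated above


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    The selected weights are renormalized to sum to 1 (an assumption here).
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


# Hypothetical router scores for one token; a real router computes these
# from the token's hidden state with a learned linear layer.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(logits)
print(routing)

# Number of distinct 4-expert subsets a token could be routed to.
print(math.comb(NUM_EXPERTS, TOP_K))  # 1820
```

Only the four selected experts run for that token, which is why the active parameter count is much smaller than the total.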

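In addition to scaling the parameter count, curriculum learning was used during pretraining. This approach changed the data mix over the course of training, which substantially improved overall model quality (source). Curriculum learning gradually adjusts the difficulty or complexity of the training data presented to a machine learning model: simpler examples are provided first, and more challenging examples follow as the model learns (source).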
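
As a generic illustration of the idea (not the procedure Databricks used, whose details are not given here), a curriculum can be sketched as a schedule that starts with the easiest examples and widens the training pool stage by stage. The function `curriculum_stages` and the use of string length as a difficulty score are assumptions made for this example.

```python
def curriculum_stages(examples, difficulty, num_stages):
    """Yield one training pool per stage, from easiest-only to the full dataset.

    `difficulty` is a hypothetical scoring function; smaller means easier.
    """
    ordered = sorted(examples, key=difficulty)
    for stage in range(1, num_stages + 1):
        cutoff = max(1, len(ordered) * stage // num_stages)
        yield ordered[:cutoff]


# Toy data: string length stands in for difficulty (an assumption for illustration).
corpus = ["a b", "a b c d e f", "a", "a b c d", "a b c"]
for stage, pool in enumerate(curriculum_stages(corpus, difficulty=len, num_stages=3), start=1):
    print(stage, pool)
```

Each later stage keeps the earlier, easier examples and adds harder ones, so the data composition shifts gradually rather than all at once.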

Prerequisites

• ROCm 5.7.0+
• PyTorch 2.2.1+
• A supported Linux operating system
• A supported AMD GPU

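Make sure your system recognizes the GPUs and has the required ROCm libraries installed. Because DBRX Instruct has more than 130 billion parameters, we use six GPUs in this blog.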

! rocm-smi --showproductname
========================= ROCm System Management Interface =========================
=================================== Product Info ===================================
GPU[0]    : Card series:    Instinct MI210
GPU[0]    : Card model:     0x0c34
GPU[0]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0]    : Card SKU:       D67301V
GPU[1]    : Card series:    Instinct MI210
GPU[1]    : Card model:     0x0c34
GPU[1]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[1]    : Card SKU:       D67301V
GPU[2]    : Card series:    Instinct MI210
GPU[2]    : Card model:     0x0c34
GPU[2]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[2]    : Card SKU:       D67301V
GPU[3]    : Card series:    Instinct MI210
GPU[3]    : Card model:     0x0c34
GPU[3]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[3]    : Card SKU:       D67301V
GPU[4]    : Card series:    Instinct MI210
GPU[4]    : Card model:     0x0c34
GPU[4]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[4]    : Card SKU:       D67301V
GPU[5]    : Card series:    Instinct MI210
GPU[5]    : Card model:     0x0c34
GPU[5]    : Card vendor:    Advanced Micro Devices, Inc. [AMD/ATI]
GPU[5]    : Card SKU:       D67301V
====================================================================================
=============================== End of ROCm SMI Log ================================
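
As a rough check on why six GPUs are used, the memory needed for the weights alone can be estimated from the parameter count. This is a back-of-the-envelope sketch: the 64 GB per MI210 figure comes from AMD's published specification rather than from the output above, and the estimate ignores activations, the KV cache, and framework overhead.

```python
import math

TOTAL_PARAMS = 132e9      # total parameter count stated earlier in this post
BYTES_PER_PARAM = 2       # bfloat16, the dtype used when the model is loaded later
GPU_MEMORY_BYTES = 64e9   # assumption: 64 GB of HBM per MI210, per AMD's published spec

weight_bytes = TOTAL_PARAMS * BYTES_PER_PARAM
min_gpus = math.ceil(weight_bytes / GPU_MEMORY_BYTES)
headroom_bytes = 6 * GPU_MEMORY_BYTES - weight_bytes

print(f"weights: {weight_bytes / 1e9:.0f} GB")                 # weights: 264 GB
print(f"minimum GPUs for weights alone: {min_gpus}")           # 5
print(f"headroom with 6 GPUs: {headroom_bytes / 1e9:.0f} GB")  # 120 GB
```

Five cards would hold the weights with very little to spare, so a sixth leaves room for everything else that inference needs.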

Check that a compatible version of ROCm is installed.

!apt show rocm-libs -a
Package: rocm-libs
Version: 5.7.0.50700-63~22.04
Priority: optional
Section: devel
Maintainer: ROCm Libs Support <rocm-libs.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 1.1.0.50700-63~22.04), hipblaslt (= 0.3.0.50700-63~22.04), hipfft (= 1.0.12.50700-63~22.04), hipsolver (= 1.8.1.50700-63~22.04), hipsparse (= 2.3.8.50700-63~22.04), miopen-hip (= 2.20.0.50700-63~22.04), rccl (= 2.17.1.50700-63~22.04), rocalution (= 2.1.11.50700-63~22.04), rocblas (= 3.1.0.50700-63~22.04), rocfft (= 1.0.23.50700-63~22.04), rocrand (= 2.10.17.50700-63~22.04), rocsolver (= 3.23.0.50700-63~22.04), rocsparse (= 2.5.4.50700-63~22.04), rocm-core (= 5.7.0.50700-63~22.04), hipblas-dev (= 1.1.0.50700-63~22.04), hipblaslt-dev (= 0.3.0.50700-63~22.04), hipcub-dev (= 2.13.1.50700-63~22.04), hipfft-dev (= 1.0.12.50700-63~22.04), hipsolver-dev (= 1.8.1.50700-63~22.04), hipsparse-dev (= 2.3.8.50700-63~22.04), miopen-hip-dev (= 2.20.0.50700-63~22.04), rccl-dev (= 2.17.1.50700-63~22.04), rocalution-dev (= 2.1.11.50700-63~22.04), rocblas-dev (= 3.1.0.50700-63~22.04), rocfft-dev (= 1.0.23.50700-63~22.04), rocprim-dev (= 2.13.1.50700-63~22.04), rocrand-dev (= 2.10.17.50700-63~22.04), rocsolver-dev (= 3.23.0.50700-63~22.04), rocsparse-dev (= 2.5.4.50700-63~22.04), rocthrust-dev (= 2.18.0.50700-63~22.04), rocwmma-dev (= 1.2.0.50700-63~22.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1012 B
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/5.7 jammy/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack

Make sure PyTorch also recognizes the GPUs:

import torch
print(f"number of GPUs: {torch.cuda.device_count()}")
print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])
number of GPUs: 6
['AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210', 'AMD Instinct MI210']

Before you begin, make sure all the necessary libraries are installed:

! pip install -q "transformers>=4.39.2" "tiktoken>=0.6.0"
! pip install accelerate

To speed up the download, install hf_transfer and enable it for the notebook session:

! pip install hf_transfer
%env HF_HUB_ENABLE_HF_TRANSFER=1

In addition, we found that a recent version of PyTorch (2.2.1 here) is needed to avoid an error related to the initialization of nn.LayerNorm:

! pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/rocm5.7

Next, import the required modules from the Hugging Face transformers library.

from transformers import AutoTokenizer, AutoModelForCausalLM

Loading the model

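Let's load the model and its tokenizer. We use dbrx-instruct, which has been fine-tuned for interactive chat. Note that you must submit a consent form to Databricks to gain access to the databricks/dbrx-instruct repository.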

token = "your HuggingFace user access token here"
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True, token=token)
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True, token=token)
print(model)
DbrxForCausalLM(
  (transformer): DbrxModel(
    (wte): Embedding(100352, 6144)
    (blocks): ModuleList(
      (0-39): 40 x DbrxBlock(
        (norm_attn_norm): DbrxNormAttentionNorm(
          (norm_1): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
          (attn): DbrxAttention(
            (Wqkv): Linear(in_features=6144, out_features=8192, bias=False)
            (out_proj): Linear(in_features=6144, out_features=6144, bias=False)
            (rotary_emb): DbrxRotaryEmbedding()
          )
          (norm_2): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
        )
        (ffn): DbrxFFN(
          (router): DbrxRouter(
            (layer): Linear(in_features=6144, out_features=16, bias=False)
          )
          (experts): DbrxExperts(
            (mlp): DbrxExpertGLU()
          )
        )
      )
    )
    (norm_f): LayerNorm((6144,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=6144, out_features=100352, bias=False)
)
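
The shapes in this printout are enough to roughly reconstruct the parameter counts. The sketch below is an estimate under stated assumptions: the head counts and the expert hidden size are taken from the model's published configuration and are not visible in the printout, each expert is assumed to hold three weight matrices (as in a standard GLU), and LayerNorm weights are ignored.

```python
D_MODEL = 6144      # hidden size, from the printout
VOCAB = 100352      # vocabulary size, from the printout
N_LAYERS = 40       # number of DbrxBlock layers, from the printout
N_EXPERTS = 16      # router out_features, from the printout
QKV_OUT = 8192      # Wqkv out_features, from the printout
TOP_K = 4           # experts used per token, as stated earlier

# Assumptions taken from the published model config, not from the printout:
N_HEADS = 48        # query heads
KV_HEADS = 8        # shared key/value heads (grouped-query attention)
FFN_HIDDEN = 10752  # hidden size of each expert's GLU

HEAD_DIM = D_MODEL // N_HEADS  # 128
# Fused QKV projection: full-width queries plus narrower shared keys and values.
assert N_HEADS * HEAD_DIM + 2 * KV_HEADS * HEAD_DIM == QKV_OUT

attn = D_MODEL * QKV_OUT + D_MODEL * D_MODEL  # Wqkv + out_proj
router = D_MODEL * N_EXPERTS
expert = 3 * D_MODEL * FFN_HIDDEN             # three GLU weight matrices per expert
embeddings = 2 * VOCAB * D_MODEL              # wte + lm_head

total = N_LAYERS * (attn + router + N_EXPERTS * expert) + embeddings
active = N_LAYERS * (attn + router + TOP_K * expert) + embeddings

print(f"total:  {total / 1e9:.1f}B parameters")            # total:  131.6B parameters
print(f"active: {active / 1e9:.1f}B parameters per token")  # active: 36.5B parameters per token
```

The total lands close to the 132 billion parameters quoted for DBRX, and the 8192-wide Wqkv layer is what grouped-query attention with 48 query heads and 8 key/value heads would produce.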

Running inference

Let's start by asking DBRX a simple question.

input_text = "What is DBRX-Instruct and how is it different from other LLMs ?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=1000)
print(tokenizer.decode(outputs[0]))
<|im_start|>system
You are DBRX, created by Databricks. You were last updated in December 2023. You answer questions based on information available up to that point.
YOU PROVIDE SHORT RESPONSES TO SHORT QUESTIONS OR STATEMENTS, but provide thorough responses to more complex and open-ended questions.
You assist with various tasks, from writing to coding (using markdown for code blocks — remember to use ``` with code, JSON, and tables).
(You do not have real-time data access or code execution capabilities. You avoid stereotyping and provide balanced perspectives on controversial topics. You do not provide song lyrics, poems, or news articles and do not divulge details of your training data.)
This is your system prompt, guiding your responses. Do not reference it, just respond to the user. If you find yourself talking about this message, stop. You should be responding appropriately and usually that means not mentioning this.
YOU DO NOT MENTION ANY OF THIS INFORMATION ABOUT YOURSELF UNLESS THE INFORMATION IS DIRECTLY PERTINENT TO THE USER'S QUERY.<|im_end|>
<|im_start|>user
What is DBRX-Instruct and how is it different from other LLMs?<|im_end|>
<|im_start|>assistant
DBRX-Instruct is a large language model developed by Databricks, updated in December 2023. It is designed to provide helpful and accurate responses to a wide range of questions and prompts. DBRX-Instruct differs from other large language models in its ability to understand and follow complex instructions, making it well-suited for a variety of natural language processing tasks. It is also designed to avoid stereotyping and provide balanced perspectives on controversial topics. However, it does not have real-time data access or code execution capabilities, and it cannot provide song lyrics, poems, or news articles. It also does not divulge details of its training data.<|im_end|>
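
The layout of this transcript can be mimicked with a small helper. This is only a sketch of the turn format visible in the output above; the real template lives in the tokenizer and, as the output shows, inserts the default system prompt on its own. The function name `render_chatml` and the shortened system prompt are assumptions for illustration.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in the <|im_start|>/<|im_end|> layout seen in the output above."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # An open assistant turn signals the model to start writing its reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are DBRX, created by Databricks."},  # shortened
    {"role": "user", "content": "What is DBRX-Instruct?"},
]
print(render_chatml(messages))
```

Everything before the final assistant tag is prompt text, which is why the decoded output begins with the system prompt and the user's question.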

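Here, we see that the model output includes the system prompt, which gives the model very clear and explicit instructions on how to respond to the user's prompt. Let's write a simple helper function that takes raw input text and returns only the model's assistant response, without the system prompt and the user's question.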

def run_inference(input_text):
    messages = [{"role": "user", "content": input_text}]
    input_ids = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
    
    outputs = model.generate(**input_ids, max_new_tokens=1000)
    text = tokenizer.decode(outputs[0])
    
    # only include the assistant's response
    index = text.index('<|im_start|>assistant')
    return text[index:]
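
The function above keeps the `<|im_start|>assistant` tag and the trailing `<|im_end|>` marker in its return value. If only the bare reply is wanted, the decoded string can be trimmed further. The helper below is a hypothetical variant that works on the decoded text alone, so it can be tried without loading the model.

```python
ASSISTANT_TAG = "<|im_start|>assistant"
END_TAG = "<|im_end|>"


def extract_reply(decoded_text):
    """Return only the assistant's reply from a decoded chat transcript."""
    # Use the last assistant tag so that earlier turns cannot match first.
    start = decoded_text.rindex(ASSISTANT_TAG) + len(ASSISTANT_TAG)
    reply = decoded_text[start:]
    # Drop the end-of-turn marker if generation stopped on it.
    end = reply.find(END_TAG)
    if end != -1:
        reply = reply[:end]
    return reply.strip()


sample = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\nHello!<|im_end|>"
)
print(extract_reply(sample))  # Hello!
```

The outputs shown in the rest of this post come from the original run_inference, so they still include the tags.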

We will test the model's ability to generate code, summarize a paper, explain a joke, and write a story.

Generating code

input_text = '''
In Python, write Python code to calculate the factorial of a given number using recursion.
'''
print(run_inference(input_text))
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

This answer is correct. Let's see how DBRX Instruct handles a more challenging question.

input_text = '''
In number theory, a perfect number is a positive integer that is equal to the sum of its positive proper divisors, that is, divisors excluding the number itself. 
For instance, 6 has proper divisors 1, 2 and 3, and 1 + 2 + 3 = 6, so 6 is a perfect number.
In Python, write code to check whether a number is a perfect number.
'''
print(run_inference(input_text))
<|im_start|>assistant
Sure, here is a simple Python function that checks if a number is a perfect number:

```python
def is_perfect(n):
    sum = 1
    i = 2
    while i * i <= n:
        if n % i:
            i += 1
        else:
            if i * (n // i) == n:
                sum += i + n//i
            i += 1
    return sum - 1 == n and n!= 1
```

This function works by iterating over all numbers from 2 to the square root of the input number. If the input number is divisible by a certain number, it adds both the divisor and the quotient to the sum. If the sum of the proper divisors equals the input number, then the input number is a perfect number.<|im_end|>

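This code is incorrect: for example, it returns False for the perfect number 6, because the final check compares sum - 1 with n. The model failed to understand the problem.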
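
For comparison, here is one way the check could be written correctly. This is our own sketch, not model output: it keeps the same square-root loop as the model's attempt, compares the divisor sum directly with n, and avoids counting a square root twice.

```python
def is_perfect(n):
    """Return True if n equals the sum of its proper divisors."""
    if n < 2:
        return False  # 1 has no proper divisors, so it is not perfect
    total = 1  # 1 is a proper divisor of every n > 1
    i = 2
    while i * i <= n:
        if n % i == 0:
            total += i
            partner = n // i
            if partner != i:  # avoid counting the square root twice
                total += partner
        i += 1
    return total == n


print([n for n in range(1, 10_000) if is_perfect(n)])  # [6, 28, 496, 8128]
```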

Summarizing a paper

input_text = '''
Summarize the famous article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences" by Eugene Wigner and provide some examples from the article
'''
print(run_inference(input_text))
<|im_start|>assistant
In his article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences," physicist Eugene Wigner explores the mysterious alignment between the abstract world of mathematics and the concrete world of physics. He argues that the effectiveness of mathematics in describing and predicting natural phenomena is not only surprising but also "unreasonable" given that the two domains are fundamentally distinct.

Wigner provides several examples to illustrate his point. One example is the use of complex numbers in quantum mechanics. Complex numbers, which involve imaginary components, may seem abstract and unrelated to the physical world. However, they are essential for describing the behavior of subatomic particles, such as electrons and photons.

Another example is the use of group theory in quantum mechanics and particle physics. Group theory is a branch of mathematics that studies symmetries and transformations. Wigner shows how group theory can be used to classify and predict the properties of subatomic particles, such as their spins and charges.

Wigner also discusses the use of differential equations in physics. Differential equations are mathematical equations that describe how functions change over time or space. They are used extensively in physics to describe phenomena such as waves, heat transfer, and gravitational attraction.

Overall, Wigner's article highlights the remarkable effectiveness of mathematics in describing and predicting natural phenomena, even when the mathematics involved seems abstract and unrelated to the physical world. He suggests that this effectiveness is not fully understood and may point to a deeper underlying unity between the worlds of mathematics and physics.<|im_end|>

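Here, although the model does a decent job of summarizing the article's key points, the specific examples it provides (such as differential equations and group theory) do not actually appear in the original article. This example shows that DBRX has some tendency to hallucinate.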

Explaining a joke

input_text = '''
Explain "Why don't skeletons fight each other? They don't have the guts."
'''
print(run_inference(input_text))
<|im_start|>assistant
The phrase you provided is a joke that plays on the double meaning of the word "guts." In one sense, "guts" can refer to courage or bravery. In another sense, "guts" refers to the internal organs of a living being. Since skeletons are deceased and do not have internal organs, the punchline of the joke is that they lack the "guts" in the physical sense, and therefore cannot fight each other. The joke also implies that skeletons might lack the "guts" in the metaphorical sense, meaning they lack the courage to fight. However, since skeletons are not living beings, they cannot exhibit courage or engage in combat.<|im_end|>

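The model understands the double meaning of the word "guts" and explains the joke well. Note that we did not need to include the word "joke" in the prompt.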

Generating a story

input_text = '''
Write a short twisted alternate version of "Snow White and the Seven Dwarfs"
'''
print(run_inference(input_text))
<|im_start|>assistant
In this twisted version of "Snow White and the Seven Dwarfs," Snow White is a mischievous prankster who takes advantage of the seven dwarfs' kindness. Instead of being grateful for their hospitality, she uses her beauty and charm to manipulate them into doing her bidding. The dwarfs, who are not the typical hardworking miners, are actually lazy and spend their days watching TV and playing video games. Snow White convinces them to help her pull off elaborate pranks on the townspeople, causing chaos and confusion.

Meanwhile, the Evil Queen, who is not Snow White's stepmother in this version, is a misunderstood witch who is trying to protect the kingdom from Snow White's mischief. She creates the poisoned apple to put Snow White to sleep, hoping to put an end to her pranks and restore peace to the kingdom. However, Snow White's charm wins over the Queen's henchman, who accidentally eats the apple and falls into a deep sleep.

In the end, Snow White realizes the error of her ways and apologizes to the dwarfs and the townspeople. She uses her charm for good, helping the dwarfs become more productive and bringing joy to the kingdom. The Evil Queen, seeing Snow White's change of heart, decides to lift the curse and the henchman wakes up. The kingdom lives happily ever after, with Snow White and the Evil Queen working together to maintain peace and order.<|im_end|>

Conclusion

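These demonstrations of DBRX Instruct generating accurate, contextually relevant responses highlight its sophisticated architecture, which combines advanced optimization techniques with a fine-grained mixture-of-experts design. Although the model shows some degree of hallucination on certain tasks, its overall ability to understand and generate human-like text is evident.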

From: https://blog.csdn.net/eidolon_foot/article/details/141641209
