These notes cover Lesson 2 of the second hands-on camp. I originally meant to write a supplement to the official tutorial, but it turned out to be of quite high quality: following it step by step, there are basically no pitfalls. So this post mainly unpacks some of what InternStudio wraps up for you, to head off problems when reproducing things locally.
Setting Up the Environment
First, environment setup. The official tutorial says:
After entering the dev machine, run the environment-setup command in the `terminal` (setup takes a while, so be patient):
studio-conda -o internlm-base -t demo
# 与 studio-conda 等效的配置方案
# conda create -n demo python==3.10 -y
# conda activate demo
# conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
Demystifying studio-conda
So what exactly does this `studio-conda -o internlm-base -t demo` do? Looking directly at /root/.bashrc, we find it contains just one line:
source /share/.aide/config/bashrc
Following that to /share/.aide/config/bashrc, which is quite long, here are its last few lines:
export HF_ENDPOINT='https://hf-mirror.com'
alias studio-conda="/share/install_conda_env.sh"
alias studio-smi="/share/studio-smi"
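Incidentally, a quick way to discover this kind of wrapping yourself is the shell builtin `type`, which reports what a name resolves to. A small reproducible sketch (the alias line is copied from the shared bashrc; the script path only exists on the dev machine):

```shell
# Aliases are not expanded in non-interactive bash by default; enable them so
# this sketch also works when run as a script rather than typed at a prompt.
shopt -s expand_aliases 2>/dev/null || true
alias studio-conda="/share/install_conda_env.sh"  # as defined in /share/.aide/config/bashrc
type studio-conda  # reports the alias expansion
```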
Full contents of /share/.aide/config/bashrc:
#! /bin/bash
# ~/.bashrc: executed by bash(1) for non-login shells.
# see /usr/share/doc/bash/examples/startup-files (in the package bash-doc)
# for examples
# If not running interactively, don't do anything
case $- in
*i*) ;;
*) return;;
esac
# don't put duplicate lines or lines starting with space in the history.
# See bash(1) for more options
HISTCONTROL=ignoreboth
# append to the history file, don't overwrite it
shopt -s histappend
# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)
HISTSIZE=1000
HISTFILESIZE=2000
# check the window size after each command and, if necessary,
# update the values of LINES and COLUMNS.
shopt -s checkwinsize
# If set, the pattern "**" used in a pathname expansion context will
# match all files and zero or more directories and subdirectories.
#shopt -s globstar
# make less more friendly for non-text input files, see lesspipe(1)
[ -x /usr/bin/lesspipe ] && eval "$(SHELL=/bin/sh lesspipe)"
# set variable identifying the chroot you work in (used in the prompt below)
if [ -z "${debian_chroot:-}" ] && [ -r /etc/debian_chroot ]; then
debian_chroot=$(cat /etc/debian_chroot)
fi
# set a fancy prompt (non-color, unless we know we "want" color)
case "$TERM" in
xterm-color|*-256color) color_prompt=yes;;
esac
# uncomment for a colored prompt, if the terminal has the capability; turned
# off by default to not distract the user: the focus in a terminal window
# should be on the output of commands, not on the prompt
#force_color_prompt=yes
if [ -n "$force_color_prompt" ]; then
if [ -x /usr/bin/tput ] && tput setaf 1 >&/dev/null; then
# We have color support; assume it's compliant with Ecma-48
# (ISO/IEC-6429). (Lack of such support is extremely rare, and such
# a case would tend to support setf rather than setaf.)
color_prompt=yes
else
color_prompt=
fi
fi
if [ "$color_prompt" = yes ]; then
PS1='${debian_chroot:+($debian_chroot)}\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
else
PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
fi
unset color_prompt force_color_prompt
# If this is an xterm set the title to user@host:dir
case "$TERM" in
xterm*|rxvt*)
PS1="\[\e]0;${debian_chroot:+($debian_chroot)}\u@\h: \w\a\]$PS1"
;;
*)
;;
esac
# enable color support of ls and also add handy aliases
if [ -x /usr/bin/dircolors ]; then
test -r ~/.dircolors && eval "$(dircolors -b ~/.dircolors)" || eval "$(dircolors -b)"
alias ls='ls --color=auto'
#alias dir='dir --color=auto'
#alias vdir='vdir --color=auto'
alias grep='grep --color=auto'
alias fgrep='fgrep --color=auto'
alias egrep='egrep --color=auto'
fi
# colored GCC warnings and errors
#export GCC_COLORS='error=01;31:warning=01;35:note=01;36:caret=01;32:locus=01:quote=01'
# some more ls aliases
alias ll='ls -alF'
alias la='ls -A'
alias l='ls -CF'
# Add an "alert" alias for long running commands. Use like so:
# sleep 10; alert
alias alert='notify-send --urgency=low -i "$([ $? = 0 ] && echo terminal || echo error)" "$(history|tail -n1|sed -e '\''s/^\s*[0-9]\+\s*//;s/[;&|]\s*alert$//'\'')"'
# Alias definitions.
# You may want to put all your additions into a separate file like
# ~/.bash_aliases, instead of adding them here directly.
# See /usr/share/doc/bash-doc/examples in the bash-doc package.
if [ -f ~/.bash_aliases ]; then
. ~/.bash_aliases
fi
# enable programmable completion features (you don't need to enable
# this, if it's already enabled in /etc/bash.bashrc and /etc/profile
# sources /etc/bash.bashrc).
if ! shopt -oq posix; then
if [ -f /usr/share/bash-completion/bash_completion ]; then
. /usr/share/bash-completion/bash_completion
elif [ -f /etc/bash_completion ]; then
. /etc/bash_completion
fi
fi
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/root/.conda/condabin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "/root/.conda/etc/profile.d/conda.sh" ]; then
. "/root/.conda/etc/profile.d/conda.sh"
else
export PATH="/root/.conda/condabin:$PATH"
fi
fi
unset __conda_setup
# <<< conda initialize <<<
if [ -d "/root/.conda/envs/xtuner" ]; then
CONDA_ENV=xtuner
else
CONDA_ENV=base
fi
source activate $CONDA_ENV
cat /share/.aide/config/welcome_vgpu
#if [ $CONDA_ENV != "xtuner" ]; then
# echo -e """
# \033[31m 检测到您尚未初始化xtuner环境, 建议执行> source init_xtuner_env.sh \033[0m
# """
#fi
export https_proxy=http://proxy.intern-ai.org.cn:50000
export http_proxy=http://proxy.intern-ai.org.cn:50000
export no_proxy='localhost,127.0.0.1,0.0.0.0,172.18.47.140'
export PATH=/root/.local/bin:$PATH
export HF_ENDPOINT='https://hf-mirror.com'
alias studio-conda="/share/install_conda_env.sh"
alias studio-smi="/share/studio-smi"
Note the second-to-last line: `alias studio-conda="/share/install_conda_env.sh"`. In other words, studio-conda is an alias for /share/install_conda_env.sh, so running `studio-conda -o internlm-base -t demo` actually invokes that script. Let's dig further into /share/install_conda_env.sh:
HOME_DIR=/root
CONDA_HOME=$HOME_DIR/.conda
SHARE_CONDA_HOME=/share/conda_envs
SHARE_HOME=/share
echo -e "\033[34m [1/2] 开始安装conda环境: <$target>. \033[0m"
sleep 3
tar --skip-old-files -xzvf /share/pkgs.tar.gz -C ${CONDA_HOME}
wait_echo&
wait_pid=$!
conda create -n $target --clone ${SHARE_CONDA_HOME}/${source}
if [ $? -ne 0 ]; then
echo -e "\033[31m 初始化conda环境: ${target}失败 \033[0m"
exit 10
fi
kill $wait_pid
# for xtuner, re-install dependencies
case "$source" in
xtuner)
source_install_xtuner $target
;;
esac
echo -e "\033[34m [2/2] 同步当前conda环境至jupyterlab kernel \033[0m"
lab add $target
source $CONDA_HOME/bin/activate $target
cd $HOME_DIR
Full contents of /share/install_conda_env.sh:
#!/bin/bash
# clone internlm-base conda env to user's conda env
# created by xj on 01.07.2024
# modifed by xj on 01.19.2024 to fix bug of conda env clone
# modified by ljy on 01.26.2024 to extend
XTUNER_UPDATE_DATE=`cat /share/repos/UPDATE | grep xtuner |awk -F= '{print $2}'`
HOME_DIR=/root
CONDA_HOME=$HOME_DIR/.conda
SHARE_CONDA_HOME=/share/conda_envs
SHARE_HOME=/share
list() {
cat <<-EOF
预设环境 描述
internlm-base pytorch:2.0.1, pytorch-cuda:11.7
xtuner Xtuner(源码安装: main $(echo -e "\033[4mhttps://github.com/InternLM/xtuner/tree/main\033[0m"), 更新日期:$XTUNER_UPDATE_DATE)
pytorch-2.1.2 pytorch:2.1.2, pytorch-cuda:11.8
EOF
}
help() {
cat <<-EOF
说明: 用于快速clone预设的conda环境
使用:
1. studio-conda env -l/list 打印预设的conda环境列表
2. studio-conda <target-conda-name> 快速clone: 默认拷贝internlm-base conda环境
3. studio-conda -t <target-conda-name> -o <origin-conda-name> 将预设的conda环境拷贝到指定的conda环境
EOF
}
clone() {
source=$1
target=$2
if [[ -z "$source" || -z "$target" ]]; then
echo -e "\033[31m 输入不符合规范 \033[0m"
help
exit 1
fi
if [ ! -d "${SHARE_CONDA_HOME}/$source" ]; then
echo -e "\033[34m 指定的预设环境: $source不存在\033[0m"
list
exit 1
fi
if [ -d "${CONDA_HOME}/envs/$target" ]; then
echo -e "\033[34m 指定conda环境的目录: ${CONDA_HOME}/envs/$target已存在, 将清空原目录安装 \033[0m"
wait_echo&
wait_pid=$!
rm -rf "${CONDA_HOME}/envs/$target"
kill $wait_pid
fi
echo -e "\033[34m [1/2] 开始安装conda环境: <$target>. \033[0m"
sleep 3
tar --skip-old-files -xzvf /share/pkgs.tar.gz -C ${CONDA_HOME}
wait_echo&
wait_pid=$!
conda create -n $target --clone ${SHARE_CONDA_HOME}/${source}
if [ $? -ne 0 ]; then
echo -e "\033[31m 初始化conda环境: ${target}失败 \033[0m"
exit 10
fi
kill $wait_pid
# for xtuner, re-install dependencies
case "$source" in
xtuner)
source_install_xtuner $target
;;
esac
echo -e "\033[34m [2/2] 同步当前conda环境至jupyterlab kernel \033[0m"
lab add $target
source $CONDA_HOME/bin/activate $target
cd $HOME_DIR
echo -e "\033[32m conda环境: $target安装成功! \033[0m"
echo """
============================================
ALL DONE!
============================================
"""
}
source_install_xtuner() {
conda_env=$1
echo -e "\033[34m 源码安装xtuner... \033[0m"
sleep 2
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
install=0
if [ -d "${HOME_DIR}/xtuner" ]; then
read -r -p "$HOME_DIR中已存在目录xtuner: 是否清空目录? [Y/N][yes/no]" input
case $input in
[yY][eE][sS]|[yY])
echo -e "\033[34m 清空目录: $HOME_DIR/xtuner, 并同步源码至该目录进行源码安装... \033[0m"
install=1
;;
*)
echo -e "\033[34m 尝试使用: $HOME_DIR/xtuner目录进行源码安装... \033[0m"
;;
esac
else
install=1
fi
if [ $install -eq 1 ]; then
rm -rf $HOME_DIR/xtuner
mkdir -p $HOME_DIR/xtuner
cp -rf $SHARE_HOME/repos/xtuner/* $HOME_DIR/xtuner/
fi
cd $HOME_DIR/xtuner
$CONDA_HOME/envs/$conda_env/bin/pip install -e '.[all]'
if [ $? -ne 0 ]; then
echo -e "\033[31m 源码安装xtuner失败 \033[0m"
exit 10
fi
$CONDA_HOME/envs/$conda_env/bin/pip install cchardet
$CONDA_HOME/envs/$conda_env/bin/pip install -U datasets
}
wait_echo() {
local i=0
local sp='/-\|'
local n=${#sp}
printf ' '
while sleep 0.1; do
printf '\b%s' "${sp:i++%n:1}"
done
}
dispatch() {
if [ $# -lt 1 ]; then
help
exit -2
fi
if [ $1 == "env" ]; then
list
exit 0
fi
if [[ $1 == "-h" || $1 == "help" ]]; then
help
exit 0
fi
origin_env=
target_env=
if [ $# -eq 1 ]; then
origin_env=internlm-base
target_env=$1
else
while getopts t:o: flag; do
case "${flag}" in
t) target_env=${OPTARG} ;;
o) origin_env=${OPTARG} ;;
esac
done
fi
echo -e "\033[32m 预设环境: $origin_env \033[0m"
echo -e "\033[32m 目标conda环境名称: $target_env \033[0m"
sleep 3
clone $origin_env $target_env
}
dispatch $@
This script is what actually sets up the environment. It defines a few variables and functions and then calls the dispatch function directly. The flow from there:
- Since we passed `-o internlm-base -t demo`, dispatch calls the script's `clone` function with the arguments `internlm-base demo`. `CONDA_HOME` is set to `/root/.conda` (a folder inside the workspace) via `HOME_DIR=/root; CONDA_HOME=$HOME_DIR/.conda`.
- Then `/share/pkgs.tar.gz` is extracted into that directory, and the environment itself is created by cloning a preset one with `conda create --clone`.
So the command really just unpacks a pre-built package cache and clones a preset environment, which is quite different from the "equivalent" commands shown in the tutorial.
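The option handling in dispatch() can be reproduced in isolation. A simplified sketch (the `parse` helper is hypothetical, and unlike the real dispatch() it always defaults the source env to internlm-base):

```shell
# Minimal reproduction of the `getopts t:o:` parsing in install_conda_env.sh
parse() {
  local origin_env=internlm-base  # default preset, as in the single-argument case
  local target_env=
  local OPTIND=1                  # reset getopts state between calls
  while getopts t:o: flag; do
    case "$flag" in
      t) target_env=$OPTARG ;;
      o) origin_env=$OPTARG ;;
    esac
  done
  echo "clone $origin_env -> $target_env"
}

parse -o internlm-base -t demo  # prints: clone internlm-base -> demo
```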
After that, we run the following to finish configuring the environment. A small gripe: since it already extracts a tarball and conda-clones anyway, why not ship a conda environment archive with these libraries pre-installed?
conda activate demo
pip install huggingface-hub==0.17.3
pip install transformers==4.34
pip install psutil==5.9.8
pip install accelerate==0.24.1
pip install streamlit==1.32.2
pip install matplotlib==3.8.3
pip install modelscope==1.9.5
pip install sentencepiece==0.1.99
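Once installed, the pins can be sanity-checked with the standard library's `importlib.metadata` (a sketch; `installed_version` is a helper I'm making up, and the names are the pip package names from the commands above):

```python
from importlib import metadata

def installed_version(pkg: str):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# Compare against the pinned versions from the install commands above
for pkg in ("transformers", "streamlit", "modelscope", "sentencepiece"):
    print(pkg, installed_version(pkg))
```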
Downloading the Model
Next, we call `modelscope.hub.snapshot_download` to download the model from ModelScope:
import os
from modelscope.hub.snapshot_download import snapshot_download
os.system("mkdir /root/models")
save_dir="/root/models"
snapshot_download("Shanghai_AI_Laboratory/internlm2-chat-1_8b",
cache_dir=save_dir, revision='v1.1.0')
For the record, creating the folder via `os.system("mkdir /root/models")` instead of calling `os.mkdir` is bad practice in the official tutorial; don't imitate it.
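A cleaner version might look like this (a sketch; `ensure_dir` is a hypothetical helper, demonstrated on a temp path rather than /root/models):

```python
import os
import tempfile

def ensure_dir(path: str) -> str:
    # os.makedirs(..., exist_ok=True) replaces os.system("mkdir ...") and
    # does not fail when the directory already exists
    os.makedirs(path, exist_ok=True)
    return path

# In the tutorial this would be ensure_dir("/root/models"); demo on a temp dir:
save_dir = ensure_dir(os.path.join(tempfile.gettempdir(), "models_demo"))
print(os.path.isdir(save_dir))  # True
```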
Model Inference
Run inference with the following code:
# Import the required libraries
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name_or_path = "/root/models/Shanghai_AI_Laboratory/internlm2-chat-1_8b"

# Hugging Face's AutoTokenizer and AutoModelForCausalLM will be familiar to anyone
# who works with LLMs: they auto-load a pretrained model and its matching tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=True, device_map='cuda:0')
# trust_remote_code lets the custom model code in the repo run; torch.bfloat16 loads
# the weights in a lower-precision dtype (not quantization) to save memory;
# device_map pins the model to the first GPU.
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map='cuda:0')
model = model.eval()

system_prompt = """You are an AI assistant whose name is InternLM (书生·浦语).
- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.
"""

messages = [(system_prompt, '')]

print("=============Welcome to InternLM chatbot, type 'exit' to exit.=============")

while True:
    input_text = input("\nUser >>> ")
    input_text = input_text.replace(' ', '')  # strip spaces from the user input
    if input_text == "exit":  # type exit to quit
        break

    length = 0
    # Iterate over the model's stream_chat generator; each step yields the cumulative
    # reply `response` together with the updated history (ignored here as `_`).
    for response, _ in model.stream_chat(tokenizer, input_text, messages):
        # If the reply is non-empty, print only the part from the last printed
        # position `length` to the end, flushing the output buffer.
        if response is not None:
            print(response[length:], flush=True, end="")
            # Update the printed position so the next chunk starts at the right place.
            length = len(response)
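The incremental-printing trick in that loop (track `length`, emit only the new suffix) can be isolated and tested with a fake stream; `stream_deltas` is a hypothetical helper, not part of the InternLM API:

```python
def stream_deltas(responses):
    """Yield only the newly generated suffix of each cumulative response."""
    length = 0
    for response in responses:
        if response is None:
            continue
        yield response[length:]   # the part not printed yet
        length = len(response)    # remember where we stopped

# Fake cumulative outputs, shaped like what model.stream_chat would produce:
fake = ["Hello", "Hello, wor", "Hello, world!"]
print(list(stream_deltas(fake)))  # ['Hello', ', wor', 'ld!']
```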
Basic Assignment Results
Enter the commands to run the Demo program:
conda activate demo
python /root/demo/cli_demo.py
The basic assignment was easy enough, hahaha... although the model's output did go off the rails once:
Yep, the model just listed the titles of 30 stories in a row, and I cut its output short.
Server GPU Info
Out of curiosity, I had a look at the GPU info:
So it really is an A100. I do wonder how they cap a single dev machine at 10%, 30%, or 50% of the card's memory; the `studio-smi` alias and the `welcome_vgpu` banner in the shared bashrc suggest the cards are virtualized as vGPUs. Hahahaha.
From: https://www.cnblogs.com/xiangcaoacao/p/18106116