A small pitfall when implementing tensor parallelism with Huggingface Transformers: set/get_output_embeddings

Date: 2024-05-06 11:23:49

Many models implemented in the transformers library define two methods, set_output_embeddings and get_output_embeddings. Take SwitchTransformer as an example:

class SwitchTransformersForConditionalGeneration(SwitchTransformersPreTrainedModel):
    def set_output_embeddings(self, new_embeddings):
        self.lm_head = new_embeddings
    def get_output_embeddings(self):
        return self.lm_head

By default, a large model's input and output layers share the same vocabulary, so if the size of the input embedding changes, lm_head is changed to match by default.
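The coupling described above can be illustrated with a toy sketch. It is a simplified stand-in written in plain Python, not the library's code: ToyLayer and ToyModel are hypothetical names, weights are nested lists, and the toy resize discards old values instead of copying them.

```python
class ToyLayer:
    """Hypothetical stand-in for a module that holds a (rows, cols) weight."""

    def __init__(self, rows, cols):
        self.weight = [[0.0] * cols for _ in range(rows)]

    @property
    def shape(self):
        return (len(self.weight), len(self.weight[0]))


class ToyModel:
    """Toy model whose lm_head is tied to its input embedding."""

    def __init__(self, vocab_size, d_model):
        self.shared = ToyLayer(vocab_size, d_model)   # input embedding
        self.lm_head = ToyLayer(vocab_size, d_model)  # output projection
        self.tie_weights()

    def get_input_embeddings(self):
        return self.shared

    def get_output_embeddings(self):
        return self.lm_head

    def tie_weights(self):
        # Simplified tying: both layers point at the same weight object.
        head = self.get_output_embeddings()
        if head is not None:
            head.weight = self.get_input_embeddings().weight

    def resize_token_embeddings(self, new_vocab_size):
        # Toy resize: build a fresh embedding, then re-tie the head to it.
        d_model = self.shared.shape[1]
        self.shared = ToyLayer(new_vocab_size, d_model)
        self.tie_weights()


model = ToyModel(vocab_size=8, d_model=4)
model.resize_token_embeddings(10)
print(model.lm_head.shape)  # (10, 4): the head followed the embedding
```

The point of the sketch is only that whatever get_output_embeddings returns is the object that gets rewired when the embedding changes.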

When implementing tensor parallelism, however, lm_head is usually initialized as follows:

import functools
import math
import torch.nn as nn
from fairscale.nn.model_parallel.layers import (
    ParallelEmbedding,
    RowParallelLinear,
    ColumnParallelLinear
)

default_linear_init = functools.partial(nn.init.kaiming_uniform_, a=math.sqrt(5))

# Inside the model class; each rank holds only its own shard of the vocab dimension.
def __init__(self, config):  # other arguments omitted
    self.lm_head = ColumnParallelLinear(config.d_model, config.vocab_size, bias=False, init_method=default_linear_init)
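As a rough sketch of the shapes involved, the helper below computes the per-rank weight shape when the output dimension is split evenly across ranks. The function name column_parallel_weight_shape and the concrete numbers are illustrative assumptions, not part of fairscale.

```python
def column_parallel_weight_shape(in_features, out_features, world_size):
    """Per-rank weight shape when the output dimension is split evenly."""
    if out_features % world_size != 0:
        raise ValueError("out_features must be divisible by world_size")
    return (out_features // world_size, in_features)


# Illustrative values only (assumed for this example).
d_model, vocab_size, world_size = 768, 32128, 4

full_shape = (vocab_size, d_model)  # shape of an unsharded embedding weight
shard_shape = column_parallel_weight_shape(d_model, vocab_size, world_size)

print(full_shape)   # (32128, 768)
print(shard_shape)  # (8032, 768)
```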

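In other words, under multi-GPU tensor parallelism, the output dimension of lm_head on each GPU is no longer the original vocab_size but vocab_size / #GPUs. One blunt workaround is therefore to make get_output_embeddings return None, as follows: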

    def get_output_embeddings(self):
        # PreTrainedModel.tie_weights would otherwise tie lm_head to the shared embedding,
        # which causes a parameter mismatch error on lm_head under tensor parallelism.
        return None
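To make the effect of this workaround concrete, here is a minimal sketch that tracks only weight shapes. ToyTPModel and its expose_head flag are hypothetical, and tie_weights is a simplified imitation of the tying step, assuming that tying is skipped whenever get_output_embeddings returns None.

```python
class ToyTPModel:
    """Toy model that tracks only weight shapes as (rows, cols) tuples."""

    def __init__(self, vocab_size, d_model, world_size, expose_head):
        self.shared_shape = (vocab_size, d_model)                 # full embedding
        self.lm_head_shape = (vocab_size // world_size, d_model)  # per-rank shard
        self.expose_head = expose_head

    def get_output_embeddings(self):
        # Returning None mirrors the workaround above.
        return "lm_head" if self.expose_head else None

    def tie_weights(self):
        # Simplified tying: the head takes over the embedding's weight.
        if self.get_output_embeddings() is not None:
            self.lm_head_shape = self.shared_shape


tied = ToyTPModel(32, 8, 4, expose_head=True)
tied.tie_weights()

untied = ToyTPModel(32, 8, 4, expose_head=False)
untied.tie_weights()

print(tied.lm_head_shape)    # (32, 8): the shard was overwritten by the full weight
print(untied.lm_head_shape)  # (8, 8): the shard was left intact
```

One consequence worth keeping in mind: with None returned, lm_head is an independent parameter, so anything else that relies on get_output_embeddings will no longer touch it either.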

Source: https://www.cnblogs.com/marsggbo/p/18174649
