A fast and simple algorithm for training neural probabilistic language models

Posted: 2023-12-13 10:47:41



Mnih A. and Teh Y. W. A fast and simple algorithm for training neural probabilistic language models. ICML, 2012.

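Overview

This paper applies NCE (noise contrastive estimation) to the training of neural probabilistic language models.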

Noise contrastive estimation

Given a context \(h\), the conditional probability that the next word is \(w\) is defined as:

\[ P_{\theta}(w|h) = \frac{\exp(s_{\theta}(w, h))}{\sum_{w'} \exp(s_{\theta}(w', h))}, \]
The authors argue that when the vocabulary is large, computing the normalizing term \(Z^h = \sum_{w'} \exp(s_{\theta}(w', h))\) is too expensive, so the paper turns to NCE to address this.
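To make the cost concrete, here is a minimal sketch of the softmax definition in plain Python. The function name `softmax_prob` and the toy scores are illustrative assumptions, not taken from the paper; the point is only that the normalizer needs one term per vocabulary word.

```python
import math

def softmax_prob(scores, w):
    """Return P(w | h), given the scores s(w', h) for every word w' in the vocabulary."""
    m = max(scores)  # subtract the max for numerical stability; the ratio is unchanged
    z = sum(math.exp(s - m) for s in scores)  # rescaled Z^h: one exp per vocabulary word
    return math.exp(scores[w] - m) / z

scores = [2.0, 0.5, -1.0, 0.0]  # hypothetical scores for a 4-word vocabulary
probs = [softmax_prob(scores, w) for w in range(len(scores))]
```

With a real vocabulary the sum inside `softmax_prob` runs over every word for every context, which is the cost NCE is meant to avoid.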
NCE handles this kind of problem by setting up a binary classification task:

\[P(C=1|w, h; \theta) = \frac{P_{\theta}(w|h)}{P_{\theta}(w|h) + k P_n(w|h)}, \]
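Here \(C = 1\) means that \(w\) was drawn from the true distribution rather than the noise, \(P_n(w|h)\) is a noise distribution, and \(k\) means that samples are drawn from the true distribution and the noise distribution in a ratio of \(1:k\).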
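As a small worked example of this classifier, the sketch below evaluates the posterior for made-up probabilities. The helper name `posterior_data` and all numbers are assumptions chosen for illustration.

```python
def posterior_data(p_model, p_noise, k):
    """P(C=1 | w, h): probability that w came from the data rather than the noise."""
    return p_model / (p_model + k * p_noise)

# Hypothetical values: the model assigns w probability 0.02, the noise 0.001, and k = 10.
p = posterior_data(0.02, 0.001, 10)  # 0.02 / (0.02 + 0.01) = 2/3
```

When the model probability equals \(k\) times the noise probability, the posterior is exactly one half, so the classifier is undecided.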
Let \(c^h = \ln Z^h\); then

\[\ln P_{\theta}(w|h) = s_{\theta}(w, h) - c^h. \]
Substituting this into the classifier gives

\[P(C=1|w, h; \theta) = \sigma(s'_{\theta}(w, h)), \\ s'_{\theta}(w, h) = s_{\theta}(w, h) - c^h - \ln (k P_n(w|h)). \]
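The step from the ratio to the sigmoid can be spelled out by dividing the numerator and denominator by \(P_{\theta}(w|h)\) and using \(\ln P_{\theta}(w|h) = s_{\theta}(w, h) - c^h\):

\[\frac{P_{\theta}(w|h)}{P_{\theta}(w|h) + k P_n(w|h)} = \frac{1}{1 + \frac{k P_n(w|h)}{P_{\theta}(w|h)}} = \frac{1}{1 + \exp\big(-[\ln P_{\theta}(w|h) - \ln (k P_n(w|h))]\big)} = \sigma\big(s_{\theta}(w, h) - c^h - \ln (k P_n(w|h))\big). \]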
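NCE treats \(c^h\) as one more parameter to be trained. The loss for each context \(h\), with expectations over the data distribution \(P(w|h)\) and the noise distribution \(P_n(w|h)\), is: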

\[ -\mathbb{E}_{w \sim P(w|h)} \ln \sigma(s'_{\theta}(w, h)) - k \mathbb{E}_{w \sim P_n(w|h)} \ln(1 - \sigma(s'_{\theta}(w, h))). \]
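A sampled version of this loss can be sketched as follows: one observed word stands in for the data expectation, and the sum over \(k\) noise words estimates \(k\) times the noise expectation. This is a sketch under stated assumptions, not the authors' implementation; `nce_loss`, `log_sigmoid`, and the toy scores are hypothetical names and values.

```python
import math
import random

def log_sigmoid(x):
    """Numerically stable ln(sigma(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def nce_loss(score, target, noise_words, noise_prob, c_h=0.0):
    """Sampled NCE loss for a single fixed context h.

    score(w)      -> s_theta(w, h)
    target        -> a word observed after h in the data
    noise_words   -> k words drawn from the noise distribution
    noise_prob(w) -> P_n(w | h)
    """
    k = len(noise_words)

    def s_prime(w):
        return score(w) - c_h - math.log(k * noise_prob(w))

    loss = -log_sigmoid(s_prime(target))
    # ln(1 - sigma(x)) equals ln(sigma(-x)); summing over k samples estimates k * E_{P_n}[...]
    loss -= sum(log_sigmoid(-s_prime(w)) for w in noise_words)
    return loss

random.seed(0)
vocab_size = 1000
scores = [random.gauss(0.0, 1.0) for _ in range(vocab_size)]  # hypothetical s_theta(w, h)
noise = [random.randrange(vocab_size) for _ in range(5)]      # k = 5 uniform noise samples
loss = nce_loss(lambda w: scores[w], 3, noise, lambda w: 1.0 / vocab_size)
```

Each evaluation touches only the target word and the \(k\) noise words, so the cost no longer grows with the vocabulary size.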
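One problem is that natural language has far too many contexts \(h\) to learn a separate parameter \(c^h\) for each of them. The authors found that simply fixing \(c^h \equiv 0\) (that is, \(Z^h = 1\)) works well in experiments. In practice, therefore, we use: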

\[s'_{\theta}(w, h) = s_{\theta}(w, h) - \ln (k P_n(w|h)). \]
In particular, if we take the simplest possible noise distribution, \(P_n(w|h) = P_n(w) = \frac{1}{N}\) (where \(N\) is the number of words in the vocabulary), we get:

\[s'_{\theta}(w, h) = s_{\theta}(w, h) - \ln \frac{k}{N}. \]
Going one step further, we can drop the constant \(-\ln \frac{k}{N}\) as well, as long as we trust \(s_{\theta}(w, h)\) to absorb it on its own. In fact, this is what NEG (negative sampling) in Word2Vec does.
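The simplification can be sketched in a few lines: under uniform noise the correction is the same constant for every word, and dropping it leaves a loss that uses the raw scores directly. The names `uniform_noise_offset` and `neg_style_loss`, and the values of `k` and `vocab_size`, are assumptions for illustration only.

```python
import math

def uniform_noise_offset(k, vocab_size):
    """The constant -ln(k / N) that uniform noise adds to every score."""
    return -math.log(k / vocab_size)

def neg_style_loss(target_score, noise_scores):
    """Loss with s' = s: the constant offset is dropped and left for the scores to absorb."""
    def log_sigmoid(x):
        return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))
    return -log_sigmoid(target_score) - sum(log_sigmoid(-s) for s in noise_scores)

offset = uniform_noise_offset(k=25, vocab_size=10000)  # hypothetical k and N; equals ln(400)
```

Adding `offset` back to every score passed to `neg_style_loss` recovers the uniform-noise NCE loss with \(c^h = 0\), which is why the two differ only by a shift the model can learn.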

From: https://www.cnblogs.com/MTandHJ/p/17898524.html
