Introduction

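Search, advertising, recommendation, and similar scenarios all use a two-stage retrieval + ranking pipeline: retrieval makes a coarse first pass over a massive candidate pool, and ranking then applies a more complex model to what survives. The split is a trade-off among quality, latency, and machine resources.
Retrieval widely uses an embedding + ANN approach, which personalizes better than an inverted index.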

embedding

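Motivation: word2vec represents words as dense vectors in place of high-dimensional one-hot encodings, and the closer two vectors are, the closer the words are in meaning. In recommender systems, collaborative filtering and FM/FFM models play a similar role.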
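To make the motivation concrete, here is a minimal sketch contrasting one-hot vectors with dense embeddings. The three words and their 2-dimensional vectors are hand-picked toy values for illustration, not trained word2vec output.

```python
import math

def one_hot(index, vocab_size):
    """Return a one-hot vector with a 1.0 at position `index`."""
    vec = [0.0] * vocab_size
    vec[index] = 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

vocab = ["king", "queen", "apple"]

# Toy embeddings chosen by hand (assumption); real ones would be learned.
embedding = {
    "king": [0.9, 0.8],
    "queen": [0.85, 0.9],
    "apple": [-0.7, 0.1],
}

# Any two distinct one-hot vectors are orthogonal, so they carry no
# notion of "closeness" between words.
onehot_sim = cosine(one_hot(0, len(vocab)), one_hot(1, len(vocab)))

# Dense vectors can place related words close together.
dense_sim_close = cosine(embedding["king"], embedding["queen"])
dense_sim_far = cosine(embedding["king"], embedding["apple"])
```

With one-hot encodings every pair of distinct words is equally dissimilar, whereas the dense vectors let "king" sit nearer to "queen" than to "apple".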

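Sample engineering
Choosing samples is an art
How to choose positives: click or impression (impr); another example
How to choose negatives: negative sampling (and why it is necessary)
hard negatives + easy negatives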
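The mix of easy and hard negatives mentioned above could be sketched as follows. This is only an illustration under assumptions: easy negatives are drawn uniformly from the whole corpus, and hard negatives come from a caller-supplied pool (`hard_candidates`, a hypothetical name, for example items that an earlier model ranked fairly high but the user did not click). The function name and parameters are not from the original notes.

```python
import random

def sample_negatives(user_positives, corpus, hard_candidates, n_easy, n_hard, rng):
    """Return (easy, hard) lists of negative item ids for one user.

    user_positives: set of item ids the user interacted with.
    corpus: list of all item ids.
    hard_candidates: item ids assumed to be "near misses" (hypothetical pool).
    """
    # Easy negatives: uniform draws from the corpus, excluding positives.
    easy_pool = [item for item in corpus if item not in user_positives]
    easy = rng.sample(easy_pool, min(n_easy, len(easy_pool)))

    # Hard negatives: drawn from the near-miss pool, excluding positives
    # and anything already picked as an easy negative.
    taken = set(easy)
    hard_pool = [
        item for item in hard_candidates
        if item not in user_positives and item not in taken
    ]
    hard = rng.sample(hard_pool, min(n_hard, len(hard_pool)))
    return easy, hard

rng = random.Random(0)
corpus = list(range(100))
positives = {3, 7}
hard_candidates = list(range(10, 20)) + [3]  # includes a positive on purpose

easy, hard = sample_negatives(positives, corpus, hard_candidates, 4, 2, rng)
```

Without sampled negatives the model would only ever see positives and could not learn to separate relevant items from the rest of the corpus. The ratio of easy to hard negatives is a tuning choice.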
Model structure
A simple two-tower structure
Each tower can do more work, e.g. add attention
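A toy version of the two-tower structure might look like the sketch below. Each "tower" here is just a mean over randomly initialized embedding rows, standing in for whatever network (MLP, attention, and so on) a real tower would use. The table sizes, feature ids, and function names are all assumptions made for illustration.

```python
import random

DIM = 4
rng = random.Random(0)

def make_table(n_ids, dim):
    """Randomly initialized embedding table (stand-in for learned weights)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_ids)]

def tower(feature_ids, table):
    """Toy tower: mean-pool the embedding rows of the input feature ids."""
    dim = len(table[0])
    pooled = [0.0] * dim
    for fid in feature_ids:
        for d in range(dim):
            pooled[d] += table[fid][d]
    return [value / len(feature_ids) for value in pooled]

def score(user_vec, item_vec):
    """User and item meet only here, through a dot product."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

user_table = make_table(10, DIM)
item_table = make_table(20, DIM)

# Item vectors depend only on item-side features, so they can be
# computed ahead of time and stored in an index.
item_vectors = {item_id: tower([item_id], item_table) for item_id in range(20)}

# At request time only the user tower runs.
user_vec = tower([1, 4, 7], user_table)
ranked = sorted(item_vectors, key=lambda i: score(user_vec, item_vectors[i]), reverse=True)
top3 = ranked[:3]
```

Because the two sides interact only in the final dot product, item vectors can be indexed offline, which is what makes the ANN lookup possible.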
Feature engineering

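Generalizable features are necessary. If positives for ID-type features are sparse, how many positives are needed at minimum for a good fit? See Airbnb's scenario.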

Limitations

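Modeling capacity: the two-tower structure (user tower, item tower) limits quality because there are no user-item cross features; the embedding length is fixed, which limits how well multiple interests can be represented.
Much of it is a black box, for example sample selection and evaluation (relying on A/B tests alone is too inefficient).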

ANN

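Brute-force computation over the whole corpus gives the best quality. ANN algorithms each lose some quality but use less time and fewer machines.
Which algorithms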
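The quality-versus-cost trade-off can be illustrated with a small sketch that compares exact brute-force search against one simple ANN family, random-hyperplane LSH. LSH is chosen here only because it fits in a few lines; it is not claimed to be the algorithm any particular system uses. It approximates cosine-style similarity, and the data, dimensions, and number of hyperplanes are arbitrary assumptions.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def brute_force_top_k(query, items, k):
    """Exact search: score every item, keep the k best indices."""
    order = sorted(range(len(items)), key=lambda i: dot(query, items[i]), reverse=True)
    return order[:k]

def lsh_signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector is on."""
    return tuple(dot(vec, plane) >= 0 for plane in planes)

def build_index(items, planes):
    """Group item indices into buckets by LSH signature."""
    buckets = {}
    for idx, vec in enumerate(items):
        buckets.setdefault(lsh_signature(vec, planes), []).append(idx)
    return buckets

def ann_top_k(query, items, planes, buckets, k):
    """Approximate search: score only items in the query's bucket."""
    candidates = buckets.get(lsh_signature(query, planes), [])
    ordered = sorted(candidates, key=lambda i: dot(query, items[i]), reverse=True)
    return ordered[:k]

def recall_at_k(approx, exact):
    """Fraction of the exact top-k that the approximate search recovered."""
    return len(set(approx) & set(exact)) / len(exact)

rng = random.Random(0)
DIM, N_ITEMS, N_PLANES, K = 8, 500, 3, 10

items = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_ITEMS)]
planes = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PLANES)]
query = [rng.gauss(0, 1) for _ in range(DIM)]

buckets = build_index(items, planes)
exact = brute_force_top_k(query, items, K)
approx = ann_top_k(query, items, planes, buckets, K)
recall = recall_at_k(approx, exact)
scored_by_ann = len(buckets.get(lsh_signature(query, planes), []))
```

Here `scored_by_ann` is the number of items the approximate search had to score, versus all `N_ITEMS` for brute force, and `recall` measures how much of the exact top-k was kept. Using more hyperplanes shrinks the buckets (less work) and usually lowers recall.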

Going further

Can the constraints on model structure be broken? See some of Alibaba's work.

Graph embedding? It can exploit the structure of the graph.

From： https://www.cnblogs.com/lessmore/p/embedding_retrieval.html


Introduction to Embedding for Retrieval
