Sequential Recommendation via Stochastic Self-Attention

Posted: 2023-04-07 13:56:45


Overview
Notation
Motivation
STOSA
Code

Fan Z., Liu Z., Wang A., Nazari Z., Zheng L., Peng H. and Yu P. S. Sequential recommendation via stochastic self-attention. International World Wide Web Conference (WWW), 2022.

Overview

Key ideas: stochastic embeddings and Wasserstein attention.

Notation

$\mathcal{U}$: the set of users;
$\mathcal{V}$: the set of items;
$\mathcal{S}^u = [v_1^u, v_2^u, \ldots, v_{|\mathcal{S}^u|}^u]$: the interaction sequence of user $u$;
$p(v_{|\mathcal{S}^u| + 1}^u = v \mid \mathcal{S}^u)$: the probability that the next item is $v$.

Motivation

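Typical sequential recommendation models use "fixed" (deterministic) embeddings $\mathbf{M} \in \mathbb{R}^{|\mathcal{V}| \times d}$. Such point embeddings cannot adequately represent the uncertainty in user interests or item characteristics.
Moreover, standard attention takes the following form: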

\[\text{SA}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \text{softmax}(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d}}) \mathbf{V}. \]
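For reference, the formula above can be sketched in NumPy for a single sequence. This is a minimal illustration, not SASRec's or STOSA's code. It omits the causal mask, learned projections, and multiple heads, and the helper names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """SA(Q, K, V) = softmax(Q K^T / sqrt(d)) V for one sequence of length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)       # (n, n) dot-product similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # (n, d)


rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.normal(size=(n, d))
out = self_attention(X, X, X)  # queries, keys and values all taken from X here
print(out.shape)  # (4, 8)
```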
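However, the authors argue that the dot product $\mathbf{Q}\mathbf{K}^T$ is not a valid distance metric because it does not satisfy the triangle inequality. As a result, it has difficulty correctly capturing the sequential relations between items.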
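To make the triangle-inequality point concrete, the sketch below treats the negated dot product as a would-be distance. It checks this on three hand-picked 2-D vectors, chosen purely for illustration, and computes the Euclidean distance alongside for contrast. The function names are hypothetical.

```python
import numpy as np


def neg_dot(x: np.ndarray, y: np.ndarray) -> float:
    # The negated dot product, treated as a would-be "distance".
    return float(-(x @ y))


def euclidean(x: np.ndarray, y: np.ndarray) -> float:
    return float(np.linalg.norm(x - y))


def violates_triangle(dist, x, y, z) -> bool:
    # The triangle inequality requires dist(x, z) <= dist(x, y) + dist(y, z).
    return dist(x, z) > dist(x, y) + dist(y, z)


a, b, c = np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 1.0])
print(violates_triangle(neg_dot, a, b, c))    # True:  0 > -1 + -1
print(violates_triangle(euclidean, a, b, c))  # False: 1.414... <= 1 + 1
print(neg_dot(a, a))                          # -1.0, not 0, so identity also fails
```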

STOSA

To model the uncertainty of each node (item), the authors represent every node as a Gaussian distribution:

\[\mathcal{N}(\bm{\mu}_v, \Sigma_{v}). \]
Specifically, we build two embedding tables, $\mathbf{M}^{\mu} \in \mathbb{R}^{|\mathcal{V}| \times d}$ and $\mathbf{M}^{\Sigma} \in \mathbb{R}^{|\mathcal{V}| \times d}$. Omitting positional encodings, the distribution of item $v$ is then given by

\[\bm{\mu}_v = \mathbf{m}_v^{\mu}, \: \Sigma_v = \text{diag}(\mathbf{m}_v^{\Sigma}). \]
Intuitively, $\bm{\mu}_v$ is the base representation of node $v$, while $\Sigma_v$ captures its uncertainty.
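A minimal NumPy sketch of the two-table lookup follows. It assumes random tables of hypothetical size (100 items, $d = 8$) and stores only the diagonal of each $\Sigma_v$. As an assumption of this sketch, $\mathbf{M}^{\Sigma}$ is initialized with strictly positive entries so that every $\text{diag}(\mathbf{m}_v^{\Sigma})$ is a valid covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
num_items, d = 100, 8  # hypothetical sizes

# Two embedding tables: one for means, one for (diagonal) covariances.
# Assumption of this sketch: the covariance table starts strictly positive.
M_mu = rng.normal(scale=0.1, size=(num_items, d))
M_sigma = rng.uniform(0.5, 1.5, size=(num_items, d))


def lookup(seq: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (mu, sigma_diag) for a sequence of item ids, each shaped (len(seq), d).

    sigma_diag[i] holds the diagonal of Sigma_{seq[i]}; keeping only the
    diagonal avoids materializing d x d matrices.
    """
    return M_mu[seq], M_sigma[seq]


seq = np.array([3, 17, 42])           # hypothetical item ids
mu, sigma_diag = lookup(seq)
Sigma_first = np.diag(sigma_diag[0])  # full d x d covariance of item 3, if needed
```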
To replace the usual dot-product attention, the authors adopt Wasserstein attention. Given two nodes whose distributions are $\mathcal{N}(\bm{\mu}_u, \Sigma_u)$ and $\mathcal{N}(\bm{\mu}_v, \Sigma_v)$, their (unnormalized) attention is the negative squared 2-Wasserstein distance:
\[\mathbf{A}_{uv} = -W_2^2(u, v) = -(\|\bm{\mu}_u - \bm{\mu}_v\|_2^2 + \text{tr}(\Sigma_u + \Sigma_v - 2(\Sigma_u^{1/2}\Sigma_{v}\Sigma_{u}^{1/2})^{1/2})). \]
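Write $\Sigma_u = \text{diag}(\bm{s}_u)$ and $\Sigma_v = \text{diag}(\bm{s}_v)$; the vectors $\bm{s}$ are notation introduced here. With diagonal covariances all matrices commute, so $(\Sigma_u^{1/2}\Sigma_v\Sigma_u^{1/2})^{1/2} = \text{diag}(\sqrt{\bm{s}_u \odot \bm{s}_v})$. The trace term then collapses to $\|\sqrt{\bm{s}_u} - \sqrt{\bm{s}_v}\|_2^2$.

The NumPy sketch below uses this closed form to build a row-normalized attention matrix for one sequence. It is a simplified illustration, not the official implementation. Queries and keys reuse the same input distributions (no learned projections), and any scaling or causal masking is omitted.

```python
import numpy as np


def wasserstein_sq(mu_u, s_u, mu_v, s_v):
    """Squared 2-Wasserstein distance between N(mu_u, diag(s_u)) and N(mu_v, diag(s_v)).

    With diagonal covariances the trace term reduces to
    sum((sqrt(s_u) - sqrt(s_v))**2), so no matrix square roots are needed.
    """
    mean_term = np.sum((mu_u - mu_v) ** 2, axis=-1)
    cov_term = np.sum((np.sqrt(s_u) - np.sqrt(s_v)) ** 2, axis=-1)
    return mean_term + cov_term


def wasserstein_attention(mu, s):
    """Softmax-normalized attention from A_uv = -W_2^2(u, v) for one sequence.

    mu, s: arrays of shape (n, d) holding means and covariance diagonals.
    """
    # Broadcast (n, 1, d) against (1, n, d) to get all pairwise distances.
    A = -wasserstein_sq(mu[:, None, :], s[:, None, :], mu[None, :, :], s[None, :, :])
    A = A - A.max(axis=-1, keepdims=True)  # stabilize the softmax
    w = np.exp(A)
    return w / w.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
mu = rng.normal(size=(5, 4))
s = rng.uniform(0.5, 2.0, size=(5, 4))
W = wasserstein_attention(mu, s)  # (5, 5); each row sums to 1
```

Because queries and keys coincide in this sketch, $\mathbf{A}_{uu} = 0$ is the largest entry in each row, so every node attends most to itself. With learned query/key transformations that would no longer hold in general.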
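Note that the distribution of each node changes as it passes through the feed-forward layers. To keep $\Sigma$ meaningful (i.e., positive semi-definite), the covariance features must be post-processed each time with

\[\text{ELU}(\cdot) + 1 \in (0, +\infty). \]
Since $\text{ELU}(x) > -1$ for every $x$, the result is strictly positive.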
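The post-processing step is easy to sketch elementwise, taking $\alpha = 1$ as the formula implies. ELU gives $x$ for $x > 0$ and $e^x - 1$ otherwise, so adding 1 yields $x + 1$ or $e^x$. The function name is hypothetical.

```python
import numpy as np


def elu_plus_one(x: np.ndarray) -> np.ndarray:
    """ELU(x) + 1 with alpha = 1: x + 1 for x > 0, exp(x) otherwise.

    Mathematically strictly positive, which keeps diag(Sigma) a valid covariance
    (in floating point, exp underflows to 0 only for extremely negative inputs).
    """
    # np.minimum keeps exp from overflowing in the branch np.where discards.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


raw = np.array([-5.0, -1.0, 0.0, 2.0])  # e.g. covariance features after a layer
print(elu_plus_one(raw))                # approx. [0.0067, 0.3679, 1.0, 3.0]
```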
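Unlike SASRec, STOSA outputs $\hat{\bm{\mu}}, \hat{\Sigma}$, i.e., a distribution rather than a single vector. The score of a candidate item is therefore computed as the negative Wasserstein distance between this output distribution and the item's own distribution.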
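A sketch of this scoring step follows. It assumes diagonal covariances and the squared form of the distance from the attention formula; whether the official code uses exactly this form is not checked here. The embedding tables are random, and the "model output" is faked by copying one item's distribution, purely to show that an exact match receives the top score. All names are hypothetical.

```python
import numpy as np


def score_items(mu_hat, s_hat, M_mu, M_sigma):
    """Score every item by the negative squared 2-Wasserstein distance to the
    predicted distribution N(mu_hat, diag(s_hat)); higher is better.

    mu_hat, s_hat: shape (d,); M_mu, M_sigma: shape (num_items, d).
    """
    mean_term = np.sum((M_mu - mu_hat) ** 2, axis=1)
    cov_term = np.sum((np.sqrt(M_sigma) - np.sqrt(s_hat)) ** 2, axis=1)
    return -(mean_term + cov_term)


rng = np.random.default_rng(0)
num_items, d = 50, 8
M_mu = rng.normal(size=(num_items, d))
M_sigma = rng.uniform(0.5, 1.5, size=(num_items, d))

# Hypothetical model output: pretend the encoder predicted item 7's distribution.
mu_hat, s_hat = M_mu[7], M_sigma[7]
scores = score_items(mu_hat, s_hat, M_mu, M_sigma)
top5 = np.argsort(-scores)[:5]
print(top5[0])  # 7: an exact match has distance 0, the maximum possible score
```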

Code

[official]

Source: https://www.cnblogs.com/MTandHJ/p/17295897.html