
MaoWei-2021-GeneratingSmoothPoseSequencesForDiverseHumanMotionPredition

Posted: 2022-10-14 19:01:22
Tags: MaoWei GeneratingSmoothPoseSequencesForDiverseHumanMotionPredition right pose 20

# Generating Smooth Pose Sequences for Diverse Human Motion Prediction #paper


1. paper-info

1.1 Metadata

  • Author:: [[Wei Mao]], [[Miaomiao Liu]], [[Mathieu Salzmann]]
  • Affiliation:: Australian National University
  • Keywords:: #HMP
  • Journal:: #IEEE/CVF
  • Date:: [[2022-01-13]]
  • Status:: #Done

1.2 Abstract

Recent progress in stochastic motion prediction, i.e., predicting multiple possible future human motions given a single past pose sequence, has led to producing truly diverse future motions and even providing control over the motion of some body parts. However, to achieve this, the state-of-the-art method requires learning several mappings for diversity and a dedicated model for controllable motion prediction. In this paper, we introduce a unified deep generative network for both diverse and controllable motion prediction. To this end, we leverage the intuition that realistic human motions consist of smooth sequences of valid poses, and that, given limited data, learning a pose prior is much more tractable than a motion one. We therefore design a generator that predicts the motion of different body parts sequentially, and introduce a normalizing flow based pose prior, together with a joint angle loss, to achieve motion realism. Our experiments on two standard benchmark datasets, Human3.6M and HumanEva-I, demonstrate that our approach outperforms the state-of-the-art baselines in terms of both sample diversity and accuracy. The code is available at https://github.com/wei-mao-2019/gsps

  • Stochastic motion prediction
  • unified deep generative network for both diverse and controllable motion prediction
  • pose prior
  • normalizing flow based pose prior
  • joint angle loss

2. Introduction

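  • Problem with conventional generative methods: the sampled outputs concentrate on the major modes of the data distribution and ignore the minor ones.
  • DLow improves on earlier models, but it has to be trained on top of a pre-trained model, and controllable motion generation requires training a dedicated model.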

https://www.cnblogs.com/guixu/p/16784293.html (reading notes on the DLow paper)

  • The authors propose an end-to-end training approach that also gives full control over motion prediction.
  • Contributions:
  1. We develop a unified framework achieving both diverse and part-based controllable human motion prediction, using a pre-ordered part sequence.
  2. We propose a pose prior and a joint angle constraint to regularize the training of our generator and encourage it to produce smooth pose sequences.

3. Approach

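  • \(x \in \mathbb{R}^D\): the 3D joint coordinates in a single frame.
  • \(X = [x_1, x_2,...,x_H]^T\): the past motion sequence of \(H\) frames.
  • \(Y = [x_{H+1}, x_{H+2}, ...,x_{H+T}]^T\): the future motion sequence to be predicted.
  • \(\left \{ \hat{Y}_j \right \} ^K_{j=1}\): the \(K\) generated future motion sequences.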

3.1 Diverse Motion Prediction

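Because the generated motion sequences must be diverse, not every generated future sequence is required to be close to the ground truth. The authors therefore redefine the reconstruction error so that at least one of the generated sequences is close to the ground truth. The reconstruction error \(\mathcal{L}_r\) is:

\[\mathcal{L}_r = \min_{j} \left \| \hat{Y}_j - Y \right \|^2, \quad j \in \left \{ 1,2,...,K \right \} \tag{1} \]

However, this loss constrains only one of the future sequences. To constrain the other generated sequences as well, the following design is used.
Each past sequence has a single ground-truth future, but the diverse generated futures all share similar past motion. Using a distance threshold, the authors therefore select the training samples whose past sequences are similar to the given one, and use the ground-truth futures of those samples to optimize the other generated sequences.
\(\{Y_p\}^P_{p=1}\): the ground-truth futures of those similar samples.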


Figure 3-1
Source:

This gives the multi-modal reconstruction error:

\[\mathcal{L}_{mm} = \frac{1}{P}\sum_{p=1}^{P}\min_{j}\left \| \hat{Y}_j - Y_p \right \|^2, \quad j \in \left \{ 1,2,...,K \right \} \tag{2} \]

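With the multi-modal reconstruction error defined this way, every similar ground truth has at least one predicted sequence that is pushed close to it.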
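The two best-of-K errors in Eqs. (1) and (2) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names and the array layout (`preds` of shape `(K, T, D)`, ground truths of shape `(T, D)` or `(P, T, D)`) are assumptions made for the example.

```python
import numpy as np

def reconstruction_loss(preds, gt):
    """Eq. (1): squared error of the best of the K predictions.

    preds: (K, T, D) generated futures; gt: (T, D) ground-truth future.
    """
    errs = np.sum((preds - gt[None]) ** 2, axis=(1, 2))  # (K,)
    return errs.min()

def multimodal_reconstruction_loss(preds, similar_gts):
    """Eq. (2): best-of-K squared error for each similar ground truth, averaged over P.

    similar_gts: (P, T, D) futures of samples with a similar past.
    """
    diff = preds[None, :, :, :] - similar_gts[:, None, :, :]  # (P, K, T, D)
    errs = np.sum(diff ** 2, axis=(2, 3))                     # (P, K)
    return errs.min(axis=1).mean()
```

Because only the minimum over `j` contributes, the remaining predictions are free to differ from the ground truth, which is what leaves room for diversity.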
To further promote diversity, a diversity-promoting loss is used, as in DLow.

\[\mathcal{L}_{d}=\frac{2}{K(K-1)} \sum_{j=1}^{K} \sum_{k=j+1}^{K} e^{-\frac{\left\|\hat{\mathbf{Y}}_{j}-\hat{\mathbf{Y}}_{k}\right\|_{1}}{\alpha}} \tag{3} \]

\(\alpha\): a normalizing factor.
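As a rough sketch of Eq. (3), the loss averages \(e^{-\|\cdot\|_1/\alpha}\) over all pairs of generated sequences, so it is largest when the samples coincide. The function name and the `(K, T, D)` layout are assumptions for illustration only.

```python
import numpy as np

def diversity_loss(preds, alpha):
    """Eq. (3): mean of exp(-L1 distance / alpha) over all pairs j < k.

    preds: (K, T, D) generated futures; alpha: positive normalizing factor.
    """
    K = preds.shape[0]
    flat = preds.reshape(K, -1)
    total = 0.0
    for j in range(K):
        for k in range(j + 1, K):
            l1 = np.abs(flat[j] - flat[k]).sum()
            total += np.exp(-l1 / alpha)
    return 2.0 * total / (K * (K - 1))
```

Identical samples give the maximum value 1, and the value decays toward 0 as the samples move apart, so minimizing it pushes the samples away from each other.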
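The drawback of this loss is that it can make the generated motions unrealistic, especially when \(P<K\). The most direct remedy is to add a motion prior, but that requires a large amount of training data. Since a motion sequence is made up of individual human pose frames, the authors instead propose a pose prior and angle losses.
Pose prior
The authors use a normalizing flow to represent the pose prior.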

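normalizing flow: transforms an unknown distribution into one whose probability density is known.

A bijective, differentiable function \(f(\cdot)\) maps the human pose distribution \(p(x)\) to a latent representation \(h=f(x)\), \(h \sim N(0, I)\).
With the normalizing flow, the likelihood of a pose \(x\) can be computed:

\[p(x) = g(h)\left | \det\left(\frac{\partial f}{\partial x} \right) \right | \]

\(g(h)=N(h|0, I)\), and \(\det(\frac{\partial f}{\partial x} )\) is the Jacobian determinant of \(f(\cdot)\) with respect to \(x\).
\(f\) is usually learned by a neural network; the authors use 3 fully connected layers, with QR decomposition and monotonic activation functions to ensure that \(f\) is invertible.
Given a human pose dataset \(\mathcal{D}\), \(f\) is learned by maximizing the log-likelihood of the samples: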

\[\begin{aligned} f^{*} &=\arg \max _{f} \sum_{\mathbf{x} \in \mathcal{D}} \log p(\mathbf{x}) \\ &=\arg \max _{f} \sum_{\mathbf{x} \in \mathcal{D}} \log g(\mathbf{h})+\log \left|\operatorname{det}\left(\frac{\partial f}{\partial \mathbf{x}}\right)\right| . \end{aligned} \]

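Once \(f\) is learned, a loss function can be defined to encourage feasible poses. It is the negative log-likelihood of a generated pose, which is minimized: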

\[\begin{aligned} \mathcal{L}_{n f} &=-\log p(\hat{\mathbf{x}}) \\ &=-\log g(\hat{\mathbf{h}})-\log \left|\operatorname{det}\left(\frac{\partial f^{*}}{\partial \hat{\mathbf{x}}}\right)\right| \end{aligned} \tag{4} \]

\(\hat{\mathbf{h}}=f^{*}(\hat{\mathbf{x}})\) ; \(g(\hat{\mathbf{h}}) = N(\hat{\mathbf{h}}|0, I)\)
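To make the change-of-variables formula and Eq. (4) concrete, here is a toy sketch that replaces the learned network with a single invertible affine map \(f(x) = Ax + b\). This is a deliberate simplification: the paper's flow uses 3 fully connected layers, and the names below are hypothetical.

```python
import numpy as np

def standard_normal_logpdf(h):
    """log N(h | 0, I) for a vector h (or a batch along the leading axes)."""
    d = h.shape[-1]
    return -0.5 * np.sum(h ** 2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)

def affine_flow_nll(x, A, b):
    """Eq. (4) for the toy flow f(x) = A x + b: -log g(h) - log|det(df/dx)|.

    For an affine map the Jacobian df/dx is simply A.
    """
    h = x @ A.T + b
    _, logabsdet = np.linalg.slogdet(A)  # log|det A|, numerically stable
    return -standard_normal_logpdf(h) - logabsdet
```

With \(A = \mathrm{diag}(1/\sigma)\) and \(b = -\mu/\sigma\), the result equals the negative log-density of \(N(\mu, \mathrm{diag}(\sigma^2))\), which is a convenient sanity check for the formula.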
Joint angle loss


Figure 3-2 Example of angle limits
Source: http://arxiv.org/abs/2108.08422

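For each limb, 3 joints define a plane and its normal vector; for the torso, 3 torso joints define the vectors. The angles are then computed over the human pose dataset \(\mathcal{D}\), which gives the lower and upper limits used below.
The angle losses are defined as:

\[\mathcal{L}_{a_{j}}=\left\{\begin{array}{l} \left(a_{j}(\hat{\mathbf{x}})-l_{a_{j}}\right)^{2}, \text { if } a_{j}(\hat{\mathbf{x}})<l_{a_{j}} \\ \left(a_{j}(\hat{\mathbf{x}})-u_{a_{j}}\right)^{2}, \text { if } a_{j}(\hat{\mathbf{x}})>u_{a_{j}} \\ 0, \text { otherwise } \end{array}\right. \]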

\[\mathcal{L}_a = {\textstyle \sum_{j=1}^{L}} \mathcal{L}_{a_j} \tag{5} \]

\(\{a_j\}_{j=1}^L\): the \(L\) pre-computed angles; \(l_{a_j}\): the lower limit of angle \(a_j\); \(u_{a_j}\): its upper limit.
\(a_j(\hat{x})\): the angle computed from pose \(\hat{x}\).
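The piecewise definition in Eq. (5) is a squared hinge on both sides of the valid range. A minimal sketch, assuming the \(L\) angles of a pose have already been computed and are passed in as an array (the function name is hypothetical):

```python
import numpy as np

def angle_loss(angles, lower, upper):
    """Eq. (5): squared violation of the [lower, upper] range, summed over the L angles.

    angles, lower, upper: arrays of shape (L,).
    """
    below = np.clip(lower - angles, 0.0, None)  # > 0 only where angle < lower limit
    above = np.clip(angles - upper, 0.0, None)  # > 0 only where angle > upper limit
    return np.sum(below ** 2 + above ** 2)
```

Angles inside their limits contribute nothing, so the loss only penalizes poses that leave the range observed in the dataset.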

Predicting smooth trajectory
To make the generated motion sequences more natural and smooth, the authors adopt a trajectory representation based on the Discrete Cosine Transform (DCT).
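One way to see why a DCT representation favors smooth motion: each joint coordinate over time is written as a sum of cosine components, and keeping only the low-frequency coefficients removes jitter. The sketch below builds an orthonormal DCT-II basis by hand; the truncation to the first `num_coeffs` coefficients is an assumption made for illustration, not a detail taken from these notes.

```python
import numpy as np

def dct_basis(T):
    """Orthonormal DCT-II basis of shape (T, T); row k is the k-th frequency."""
    n = np.arange(T)
    k = n[:, None]
    C = np.sqrt(2.0 / T) * np.cos(np.pi * (2 * n[None, :] + 1) * k / (2 * T))
    C[0] /= np.sqrt(2.0)  # rescale the constant row so that C @ C.T = I
    return C

def smooth_trajectory(traj, num_coeffs):
    """Keep the first num_coeffs DCT coefficients of each coordinate.

    traj: (T, D) trajectory over T frames; returns an array of the same shape.
    """
    C = dct_basis(traj.shape[0])
    coeffs = C @ traj           # (T, D) coefficients, lowest frequency first
    coeffs[num_coeffs:] = 0.0   # drop the high-frequency components
    return C.T @ coeffs         # inverse transform (C is orthonormal)
```

Keeping all `T` coefficients reproduces the input exactly, while a small `num_coeffs` yields a slowly varying approximation.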


Figure 3-3
Source: http://arxiv.org/abs/2108.08422

3.2 Controllable Motion Prediction

Motion generation is controlled by dividing the human body into parts and predicting the motion sequence of each part in turn.


Figure 3-4 Body parts
Source: http://arxiv.org/abs/2108.08422

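\(Y=[Y^{(1)}, Y^{(2)}, ..., Y^{(N)}]\), \(Y^{(i)} \in \mathbb{R}^{T\times D_i}\), where \(N\) is the number of body parts.

\[p(Y|X) = p(Y^{(1)}|X)\,p(Y^{(2)}|X, Y^{(1)})...p(Y^{(N)}|X,\{Y^{(i)}\}_{i=1}^{N-1}) \]

For each sample of the \(i\)-th body part, \(K\) samples of the \((i+1)\)-th body part are generated, so \(K^N\) motion sequences are produced in the end.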
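The sequential factorization can be pictured as a tree: every partial motion is extended with \(K\) samples of the next body part. The sketch below uses a hypothetical toy generator in place of the learned per-part networks, and all names are assumptions for illustration.

```python
import numpy as np

def make_toy_part_generator(num_frames, part_dim):
    """Hypothetical stand-in for a learned per-part generator g^(i)."""
    def generate(past, previous_parts, rng):
        # A real generator would condition on `past` and `previous_parts`;
        # this toy version only draws one random code per sample.
        z = rng.standard_normal(part_dim)
        return np.tile(z, (num_frames, 1))  # (T, D_i), constant over time
    return generate

def sample_sequentially(past, part_generators, K, rng):
    """Extend every partial motion with K samples of the next part: K^N results."""
    partial = [[]]  # each entry is the list of parts generated so far
    for generate in part_generators:
        partial = [parts + [generate(past, parts, rng)]
                   for parts in partial for _ in range(K)]
    # Concatenate the parts along the feature axis to get full-body motions.
    return [np.concatenate(parts, axis=1) for parts in partial]
```

With \(N = 2\) parts and \(K = 3\), the first three outputs share the same first-part motion and differ only in the second part, which mirrors how fixing one part while varying another gives controllable prediction.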


Figure 3-5 Overview of the approach
Source: http://arxiv.org/abs/2108.08422

The diversity-promoting loss is rewritten as a per-body-part loss:

\[\mathcal{L}_{d_{i}}=\frac{2}{K(K-1)} \sum_{j=1}^{K} \sum_{k=j+1}^{K} e^{-\frac{\left\|\hat{\mathbf{Y}}_{\cdot, j}^{(i)}-\hat{\mathbf{Y}}_{\cdot, k}^{(i)}\right\|_{1}}{\alpha^{(i)}}}, \tag{6} \]

\(i\): the index of the body part. The final loss function can then be written as

\[\mathcal{L}=\lambda_{n f} \mathcal{L}_{n f}+\lambda_{a} \mathcal{L}_{a}+\sum_{i=1}^{N} \lambda_{d_{i}} \mathcal{L}_{d_{i}}+\lambda_{r} \mathcal{L}_{r}+\lambda_{m m} \mathcal{L}_{m m} \]

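\[\mathcal{L}=\lambda_{n f} (4)+\lambda_{a} (5)+\sum_{i=1}^{N} \lambda_{d_{i}} (6)+\lambda_{r} (1)+\lambda_{m m} (2) \]

\((4)\): keeps the generated poses natural.
\((5)\): keeps the joint angles of the generated poses within human limits.
\((6)\): promotes diversity among the generated motion sequences.
\((1)\): reconstruction error; ensures that at least one generated sequence matches the ground truth.
\((2)\): multi-modal reconstruction error; keeps the generated motions close to the ground-truth sequences while still allowing diversity.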


Figure 3-6 Generator
Source: http://arxiv.org/abs/2108.08422

Each \(g^{(i)}\) is built from a Graph Convolutional Network (GCN). Each GCN block consists of several graph convolution layers, and each layer learns a mapping from a feature \(F \in \mathbb{R}^{D\times|F|}\) to \(F'=\tanh(AFW)\);
\(A \in\mathbb{R}^{D\times D}\) : represents a fully connected graph with learnable connectivity
\(W \in \mathbb{R}^{|F| \times |F'|}\) : a matrix of trainable weights.
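The layer \(F'=\tanh(AFW)\) is just two matrix products followed by an element-wise nonlinearity. A minimal NumPy sketch of the forward pass, with random matrices standing in for the learned \(A\) and \(W\) (the function name is hypothetical):

```python
import numpy as np

def graph_conv_layer(F, A, W):
    """One graph convolution layer: F' = tanh(A F W).

    F: (D, f_in) node features; A: (D, D) learnable adjacency;
    W: (f_in, f_out) trainable weights. Returns (D, f_out).
    """
    return np.tanh(A @ F @ W)

# Example shapes: D = 4 nodes, 3 input features, 5 output features.
rng = np.random.default_rng(0)
F = rng.standard_normal((4, 3))
A = rng.standard_normal((4, 4))
W = rng.standard_normal((3, 5))
F_out = graph_conv_layer(F, A, W)  # shape (4, 5)
```

Here \(A\) mixes information across the \(D\) nodes and \(W\) mixes feature channels; because \(A\) is dense and learned, the graph connectivity is not fixed in advance.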

4. Experiments

  • Dataset
    • Human3.6M
    • HumanEva-I
  • Metrics
    • Average Pairwise Distance (APD): measures the diversity among the \(K\) generated sequences
      • \(\frac{2}{K(K-1)} {\textstyle \sum_{i=1}^{K}} {\textstyle \sum_{j=i+1}^{K}}\left \| \hat{Y}_i -\hat{Y}_j \right \| _2\)
    • Average Displacement Error (ADE): reconstruction accuracy over the whole sequence
      • \(\frac{1}{T}\min_{i} \left \| \hat{Y}_i -Y \right \| _2\)
    • Final Displacement Error (FDE): reconstruction accuracy of the last future pose
      • \(\min_{i}\left \| \hat{Y}_i[T] - Y[T] \right \|_2\)
    • the multi-modal version of ADE
    • the multi-modal version of FDE
  • Baselines
    • Deterministic motion prediction methods
      • ERD
      • acLSTM
    • Stochastic motion prediction methods without diversity-promoting technique
      • CVAE based model
        • Pose-Knows
        • MT-VAE
      • CGAN based model
        • HP_GAN
    • Diverse motion prediction methods
      • BoM
      • GMVAE
      • DeLiGAN
      • DSF
      • DLow
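The three base metrics listed under Metrics can be sketched as below. The array layout (`preds` of shape `(K, T, D)`, `gt` of shape `(T, D)`) is an assumption, and ADE is interpreted here as the per-frame L2 distance averaged over the \(T\) frames, which is one common reading of the formula rather than something these notes state explicitly.

```python
import numpy as np

def apd(preds):
    """Average Pairwise Distance: mean L2 distance over all pairs of samples."""
    K = preds.shape[0]
    flat = preds.reshape(K, -1)
    total = 0.0
    for i in range(K):
        for j in range(i + 1, K):
            total += np.linalg.norm(flat[i] - flat[j])
    return 2.0 * total / (K * (K - 1))

def ade(preds, gt):
    """Average Displacement Error of the best sample (per-frame L2, averaged over T)."""
    per_frame = np.linalg.norm(preds - gt[None], axis=-1)  # (K, T)
    return per_frame.mean(axis=1).min()

def fde(preds, gt):
    """Final Displacement Error of the best sample (L2 at the last frame only)."""
    return np.linalg.norm(preds[:, -1] - gt[-1], axis=-1).min()
```

Higher APD means more diverse samples, while lower ADE and FDE mean that at least one sample is close to the ground truth.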

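5. Summary

DLow transforms features on top of a pre-trained model.
The method in this paper produces its output directly, end to end.
The main focus is still the design of the loss functions, which control the output. Accuracy still relies on the reconstruction errors. For diversity, the body is divided into parts and multiple sequences are generated for each part. For realism, the pose prior and the angle loss keep the generated motion sequences realistic while diversity is promoted.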



From: https://www.cnblogs.com/guixu/p/16792658.html
