Back to MLP: A Simple Baseline for Human Motion Prediction

Tags: based Baseline Simple Back Motion MLP paper Human DCT

# Back to MLP: A Simple Baseline for Human Motion Prediction #paper

1. paper-info

1.1 Metadata

Author:: [[Wen Guo]], [[Yuming Du]], [[Xi Shen]], [[Vincent Lepetit]], [[Xavier Alameda-Pineda]], [[Francesc Moreno-Noguer]]
Affiliations::
Keywords:: #HMP , #FC , #LayerNormalization
Journal::
Date:: [[2022-08-25]]
Status:: #Done

Link:: http://arxiv.org/abs/2207.01567

1.2 Abstract

This paper tackles the problem of human motion prediction, consisting in forecasting future body poses from historically observed sequences. State-of-the-art approaches provide good results, however, they rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks (RNN), Transformers or Graph Convolutional Networks (GCN), typically requiring multiple training stages and more than 2 million parameters. In this paper, we show that, after combining with a series of standard practices, such as applying Discrete Cosine Transform (DCT), predicting residual displacement of joints and optimizing velocity as an auxiliary loss, a light-weight network based on multi-layer perceptrons (MLPs) with only 0.14 million parameters can surpass the state-of-the-art performance. An exhaustive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that our method, named siMLPe, consistently outperforms all other approaches. We hope that our simple method could serve as a strong baseline for the community and allow re-thinking of the human motion prediction problem. The code is publicly available at https://github.com/dulucas/siMLPe.

1.3 Introduction

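Traditional approaches:

- Hidden Markov models
- Gaussian process latent variable models

Deep learning methods:

- RNN-based
- GCN-based
- Transformer-based

Earlier deep models achieve good results, but they are overly complex, have too many parameters, and take a long time to train.

Contributions of this paper:

- A very simple model, using only fully connected layers, achieves strong results.
- A new model, siMLPe, is proposed. It has three components:
  - fully connected layers
  - layer normalization
  - transpose operation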

3. siMLPe

3.1 Network architecture

Figure 3-1: Network architecture

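DCT: Discrete Cosine Transform. It represents a signal as a weighted sum of cosine basis functions and is widely used in image compression. Here it is applied along the time axis, so the pose sequence is encoded in the frequency domain.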

https://www.youtube.com/watch?v=Q2aEzeMDHMA
https://zhuanlan.zhihu.com/p/85299446

The DCT matrix D is:

Figure 3-2: DCT matrix
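The figure itself is not reproduced in these notes. As a minimal sketch, assuming D is the standard orthonormal DCT-II matrix, the code below builds D, applies it along the time axis of a toy motion sequence, and recovers the sequence with the transpose of D (the IDCT). The helper names and the toy values are hypothetical, and plain Python is used to keep the example dependency-free.

```python
import math

def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix as a list of rows.

    Assumption: this is the textbook DCT-II definition, not copied from the paper.
    """
    rows = []
    for k in range(n):
        # The k = 0 row uses sqrt(1/n) instead of sqrt(2/n) so every row has unit norm.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

# Toy motion: T = 4 frames, C = 2 coordinates per frame (hypothetical values).
motion = [[0.0, 1.0], [0.5, 1.5], [1.0, 2.5], [1.5, 4.0]]

D = dct_matrix(4)
coeffs = matmul(D, motion)                # DCT along the time axis
restored = matmul(transpose(D), coeffs)   # IDCT: D is orthogonal, so its inverse is its transpose
```

Because D is orthogonal, the IDCT needs no separate formula, and `restored` matches `motion` up to floating-point error.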

IDCT：Inverse Discrete Cosine Transform
FC: fully connected layer
Trans: transpose operation
LN: layer normalization

Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer Normalization. In arXiv Preprint, 2016.
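To show how the three components listed above (FC, Trans, LN) can fit together, here is a rough sketch of a single block. The ordering (transpose, FC over the time axis, layer normalization, transpose back, skip connection), the axis that LN normalizes over, and all names are assumptions made for illustration. They are not taken from the architecture figure, which is not reproduced in these notes.

```python
import math
import random

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def fc(x, weight, bias):
    """Fully connected layer on the last axis: y = x @ weight^T + bias."""
    return [[sum(xi * wi for xi, wi in zip(row, w_row)) + b
             for w_row, b in zip(weight, bias)]
            for row in x]

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale or shift)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def mlp_block(x, weight, bias):
    """Hypothetical block on x of shape (T, C): Trans, FC over time, LN, Trans, plus skip."""
    xt = transpose(x)                      # (C, T): time is now the last axis
    h = layer_norm(fc(xt, weight, bias))   # mix information across frames, then normalize
    h = transpose(h)                       # back to (T, C)
    return [[a + b for a, b in zip(rx, rh)] for rx, rh in zip(x, h)]

random.seed(0)
T, C = 4, 3  # toy sizes: 4 frames, 3 coordinates
x = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(T)]
weight = [[random.uniform(-0.5, 0.5) for _ in range(T)] for _ in range(T)]  # T x T
bias = [0.0] * T
y = mlp_block(x, weight, bias)  # same shape as x: T rows of C values
```

The transpose is what lets an ordinary FC layer, which only acts on the last axis, mix information across frames instead of across coordinates.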

Figure 3-3: Mathematical formulation

Loss function:

Figure 3-4: Loss function

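The loss has two terms: the error between the predicted and ground-truth pose sequences, and the error between their velocities at each time step t.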
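The two-term loss described above can be sketched as follows. This is an illustration under assumptions, not the paper's exact formula: velocity is taken as the difference between consecutive frames, each frame is treated as one flat vector, both terms use an L2 distance averaged over frames, and the terms are summed with equal weight. The function names are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def velocities(seq):
    """Frame-to-frame differences: v_t = x_{t+1} - x_t (assumed definition)."""
    return [[b - a for a, b in zip(prev, nxt)] for prev, nxt in zip(seq, seq[1:])]

def motion_loss(pred, target):
    """Pose term plus velocity term, each averaged over frames (equal weights assumed)."""
    pose_term = sum(l2(p, g) for p, g in zip(pred, target)) / len(pred)
    vp, vg = velocities(pred), velocities(target)
    vel_term = sum(l2(p, g) for p, g in zip(vp, vg)) / len(vp)
    return pose_term + vel_term

# Toy sequences: 3 frames, 2 coordinates each (hypothetical values).
target = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
pred = [[0.0, 0.0], [1.0, 0.0], [2.0, 3.0]]
loss = motion_loss(pred, target)  # pose term 1.0 + velocity term 1.5
```

In this toy case only the last frame is wrong, yet the velocity term also penalizes the jump leading into it, which is the point of adding velocity as an auxiliary signal.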
Optimization: Adam optimizer

Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. In arXiv Preprint, 2014.

4. Experiments

4.1 Dataset and evaluation metric

Datasets

Human3.6M dataset

Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.

AMASS dataset

Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, and Michael J. Black. AMASS: Archive of Motion Capture as Surface Shapes. In International Conference on Computer Vision, 2019.

3DPW dataset

Timo von Marcard, Roberto Henschel, Michael J. Black, Bodo Rosenhahn, and Gerard Pons-Moll. Recovering Accurate 3D Human Pose in the Wild Using IMUs and a Moving Camera. In European Conference on Computer Vision, 2018.