# Back to MLP: A Simple Baseline for Human Motion Prediction #paper
1. paper-info
1.1 Metadata
- Author:: [[Wen Guo]], [[Yuming Du]], [[Xi Shen]], [[Vincent Lepetit]], [[Xavier Alameda-Pineda]], [[Francesc Moreno-Noguer]]
- Affiliations::
- Keywords:: #HMP , #FC , #LayerNormalization
- Journal::
- Date:: [[2022-08-25]]
- Status:: #Done
1.2 Abstract
This paper tackles the problem of human motion prediction, consisting in forecasting future body poses from historically observed sequences. State-of-the-art approaches provide good results, however, they rely on deep learning architectures of arbitrary complexity, such as Recurrent Neural Networks (RNN), Transformers or Graph Convolutional Networks (GCN), typically requiring multiple training stages and more than 2 million parameters. In this paper, we show that, after combining with a series of standard practices, such as applying Discrete Cosine Transform (DCT), predicting residual displacement of joints and optimizing velocity as an auxiliary loss, a light-weight network based on multi-layer perceptrons (MLPs) with only 0.14 million parameters can surpass the state-of-the-art performance. An exhaustive evaluation on the Human3.6M, AMASS, and 3DPW datasets shows that our method, named siMLPe, consistently outperforms all other approaches. We hope that our simple method could serve as a strong baseline for the community and allow re-thinking of the human motion prediction problem. The code is publicly available at https://github.com/dulucas/siMLPe.
1.3 Introduction
Traditional approaches:
- Hidden Markov models
- Gaussian process latent variable models

Deep learning methods:
- RNN-based
- GCN-based
- Transformer-based
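Contributions of this paper:
- A very simple model achieves strong results using only fully connected layers
- A new model (siMLPe) is proposed, built from three components:
    - fully connected layers
    - layer normalization
    - transpose operation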
3.1 Network architecture
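Figure 3-1: Network architecture
DCT: Discrete Cosine Transform
The discrete cosine transform is widely used to compress image information; here it is applied to the pose sequence along the time axis.
The DCT matrix D is:
Figure 3-2: DCT matrix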
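The figure with the matrix itself is not reproduced in these notes. As a minimal sketch, assuming D is the standard orthonormal DCT-II matrix (an assumption; the paper's exact normalization may differ), it could be built and applied along the time axis like this. The name `dct_matrix` and the sizes `T` and `C` are illustrative only.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of size n x n (assumed form of D)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    t = np.arange(n).reshape(1, -1)   # time index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    d[0, :] /= np.sqrt(2.0)           # rescale the first row so that D is orthonormal
    return d

T, C = 50, 66                         # illustrative: 50 frames, 66 pose values per frame
D = dct_matrix(T)
x = np.random.default_rng(0).normal(size=(T, C))
coeffs = D @ x                        # DCT along the time axis
x_rec = D.T @ coeffs                  # IDCT: for an orthonormal D the inverse is the transpose
```

With this normalization the inverse transform needs no separate matrix, which is why the IDCT step can reuse `D`.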
IDCT: Inverse Discrete Cosine Transform
FC: fully connected layer
Trans: transpose operation
LN: layer normalization
Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer Normalization. In arXiv Preprint, 2016.
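A rough sketch of how FC, Trans and LN might combine into one block. The ordering here is an assumption, not taken from these notes: the transpose lets the FC layer mix information along the time axis, followed by layer normalization and a skip connection. The weights are random placeholders, and the learnable scale and shift of LN are omitted.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize over the last axis to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def fc(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Fully connected layer applied to the last axis."""
    return x @ w + b

rng = np.random.default_rng(0)
T, C = 50, 66                                # illustrative sizes: frames x pose values
x = rng.normal(size=(T, C))

w_t = rng.normal(scale=0.02, size=(T, T))    # placeholder weights mixing the time axis
b_t = np.zeros(T)

z = x.T                                      # Trans: (T, C) -> (C, T), so FC acts on time
z = z + layer_norm(fc(z, w_t, b_t))          # FC + LN with a skip connection (assumed ordering)
y = z.T                                      # Trans back to (T, C)
```

Without the transpose, the same FC layer would only mix the pose values within a single frame and never see other time steps.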
Figure 3-3: Mathematical formulation
Figure 3-4: Loss function
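The loss figure is not reproduced here. The abstract mentions velocity as an auxiliary loss, so a minimal sketch could look like the following, assuming the total loss is the sum of a position term and a velocity term, each a mean per-joint Euclidean distance. The exact form and weighting in the paper may differ, and `motion_loss` is a hypothetical name.

```python
import numpy as np

def motion_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """pred, target: arrays of shape (frames, joints, 3)."""
    # Position term: mean Euclidean distance per joint and frame.
    l_pos = np.linalg.norm(pred - target, axis=-1).mean()
    # Velocity term: the same distance on frame-to-frame differences.
    v_pred = np.diff(pred, axis=0)
    v_target = np.diff(target, axis=0)
    l_vel = np.linalg.norm(v_pred - v_target, axis=-1).mean()
    return float(l_pos + l_vel)

rng = np.random.default_rng(0)
target = rng.normal(size=(25, 22, 3))             # illustrative: 25 frames, 22 joints
loss_same = motion_loss(target, target)           # identical motion
loss_shift = motion_loss(target + 1.0, target)    # constant offset on every coordinate
```

A constant offset is penalized only by the position term, because the frame-to-frame velocities of the two sequences are the same.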
Optimization: Adam optimizer
Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. In arXiv Preprint, 2014.
4. Experiments
4.1 Dataset and evaluation metric
Human3.6M dataset
Catalin Ionescu, Dragos Papava, Vlad Olaru, and Cristian Sminchisescu. Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013.
AMASS dataset
Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, and Michael J. Black. AMASS: Archive of Motion Capture as Surface Shapes. In International Conference on Computer Vision, 2019.
3DPW dataset
Timo von Marcard, Roberto Henschel, Michael J. Black, Bodo Rosenhahn, and Gerard Pons-Moll. Recovering Accurate 3D Human Pose in the Wild Using IMUs and a Moving Camera. In European Conference on Computer Vision, 2018.
4.2 Results
Figure 4-1: Results on Human3.6M
Figure 4-2: Results for various poses
Figure 4-3
Figure 4-4: Average error
4.3 Ablation study
- Number of MLP blocks
Figure 4-5: Comparison of different numbers of layers
- Network architecture
Figure 4-6: Importance of each architectural component
transpose operation and layer normalization
- Data augmentation
Figure 4-7: Data augmentation results
Only front-back flip is used.
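These notes do not define front-back flip. One reading, assumed here, is that a training clip is reversed in time with some probability. The function name, the probability `p`, and the array layout are illustrative choices, not details from the paper.

```python
import numpy as np

def front_back_flip(motion: np.ndarray, rng: np.random.Generator, p: float = 0.5) -> np.ndarray:
    """Reverse the time axis (axis 0) of a clip with probability p."""
    if rng.random() < p:
        return motion[::-1].copy()
    return motion

rng = np.random.default_rng(0)
clip = np.arange(12, dtype=float).reshape(4, 3)   # 4 frames, 3 values per frame
flipped = front_back_flip(clip, rng, p=1.0)       # p=1.0 forces the flip for illustration
```

The input and the target come from the same clip, so the flip has to be applied before the clip is split into observed and future frames.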
- Loss:
Figure 4-8: Importance of the velocity loss
- Learning residual displacement: compares performance when using different residual structures
Figure 4-9: Residual structures
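A minimal sketch of one possible residual structure, assuming the network predicts a displacement relative to the last observed pose. Which structures the figure actually compares is not reproduced here, and `offsets` stands in for a real network output.

```python
import numpy as np

def predict_with_residual(history: np.ndarray, offsets: np.ndarray) -> np.ndarray:
    """history: (T_in, C) observed poses; offsets: (T_out, C) predicted displacements.

    Returns future poses as last observed pose + predicted displacement.
    """
    last_pose = history[-1:]          # shape (1, C), broadcast over T_out
    return last_pose + offsets

history = np.ones((10, 6))            # illustrative: 10 observed frames
offsets = np.zeros((5, 6))            # stand-in for a network output
future = predict_with_residual(history, offsets)
```

With zero offsets the prediction simply repeats the last observed pose, so the network only has to learn the deviation from that static baseline.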
5. Summary