Reference:
https://builtin.com/artificial-intelligence/transformer-neural-network
1. Advantages over RNNs
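- Mitigates the vanishing gradient problem: the multi-headed self-attention layer connects every position to every other position directly, so gradients do not have to flow back through a long chain of recurrent steps;
- The whole input sequence can be passed in at once and processed in parallel, rather than one token at a time, so that GPUs can be used effectively.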
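
The two points above can be illustrated with a minimal NumPy sketch. It is not taken from the referenced article: the function names `rnn_forward` and `self_attention`, the weight matrices, and the toy sizes are all assumptions chosen for illustration, and the attention shown is single-head for brevity. The RNN has to loop over time steps because each hidden state depends on the previous one, while self-attention handles all positions with a few matrix multiplications.

```python
import numpy as np

def rnn_forward(x, W_x, W_h):
    """Vanilla RNN (illustrative): step t needs the hidden state from step t-1,
    so the time loop cannot be parallelized across positions."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x:  # strictly sequential over the sequence
        h = np.tanh(W_x @ x_t + W_h @ h)
        states.append(h)
    return np.stack(states)

def self_attention(x, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (illustrative):
    all positions are processed at once with matrix multiplications."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # each row sums to 1
    return weights @ v, weights

# Toy data; sizes and random weights are arbitrary assumptions.
rng = np.random.default_rng(0)
seq_len, d = 6, 4
x = rng.normal(size=(seq_len, d))
W_x, W_h = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

rnn_out = rnn_forward(x, W_x, W_h)
attn_out, attn_weights = self_attention(x, W_q, W_k, W_v)
print(rnn_out.shape, attn_out.shape, attn_weights.shape)
```

Each row of `attn_weights` links one position to every position in the sequence in a single step, which is the direct connection the first bullet refers to. In the RNN, information from the first token reaches the last only after passing through every intermediate `tanh` step.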