Recurrent Neural Networks/Attention is All you Need

This page collects notes on Vaswani et al. (2017), "Attention Is All You Need."

Readings

  1. arXiv.org: Attention Is All You Need

Key Questions

  1. What is an encoder-decoder recurrent neural network (RNN)?
  2. What improvements does the attention mechanism provide over the plain RNN encoder-decoder?
  3. How does the model architecture in Vaswani et al. 2017 differ from the existing applications of attention in natural language processing?
  4. What is the difference between self-attention and regular attention, and what are the benefits of the former compared with the latter?
  5. What is the difference between multi-head attention and regular attention, and what are the benefits of the former compared with the latter?