Recurrent Neural Networks/Attention is All you Need
This is a page for notes on Vaswani et al. (2017), "Attention Is All You Need".
Readings
Key Questions
- What is an encoder-decoder recurrent neural network (RNN)?
- What improvements does the attention mechanism provide over an RNN encoder-decoder?
- How does the model architecture in Vaswani et al. 2017 differ from existing applications of attention in natural language processing?
- What is the difference between self-attention and regular attention, and what are the benefits of the former compared to the latter?
- What is the difference between multi-head attention and regular attention, and what are the benefits of the former compared to the latter?
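As a study aid for the last two questions, the sketch below implements the scaled dot-product attention formula from the paper, softmax(QK^T / sqrt(d_k))V, and uses it for multi-head self-attention. In self-attention the queries, keys and values are all projections of the same sequence `x`. In the paper's encoder-decoder attention, by contrast, the queries come from the decoder and the keys and values come from the encoder output. This is a minimal, hedged sketch, not the paper's implementation. It uses pure Python lists instead of a tensor library and randomly initialised, untrained projection matrices. It also omits masking, biases, dropout and batching. All helper names (`matmul`, `random_matrix`, `multi_head_self_attention`, etc.) are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); matrices are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    # Subtract the row maximum for numerical stability before exponentiating.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(q, k, v):
    """Compute softmax(Q K^T / sqrt(d_k)) V; also return the attention weights."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # n_queries x n_keys
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

def random_matrix(rows, cols, rng):
    # Hypothetical stand-in for learned weights: untrained Gaussian entries.
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def multi_head_self_attention(x, num_heads, d_model, rng):
    """Self-attention: queries, keys and values are all projected from the same x.
    Each head has its own projections and attends in a d_model // num_heads subspace."""
    d_head = d_model // num_heads
    head_outputs = []
    for _ in range(num_heads):
        w_q = random_matrix(d_model, d_head, rng)
        w_k = random_matrix(d_model, d_head, rng)
        w_v = random_matrix(d_model, d_head, rng)
        out, _ = scaled_dot_product_attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        head_outputs.append(out)
    # Concatenate the heads position by position, then project back to d_model.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    w_o = random_matrix(num_heads * d_head, d_model, rng)
    return matmul(concat, w_o)

rng = random.Random(0)
x = random_matrix(4, 8, rng)  # 4 positions, d_model = 8
y = multi_head_self_attention(x, num_heads=2, d_model=8, rng=rng)
print(len(y), len(y[0]))  # prints "4 8"
```

Each output row has the same width as the input row. Each head, however, computes its attention weights in its own lower-dimensional subspace, so different heads can attend to different positions. A single head with full-width projections would produce only one weighting per query.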