Archives for Primer


Primer’s improvements can be attributed to two simple modifications: squaring ReLU activations and adding a depthwise convolution layer after each Q, K, and V projection in self-attention.
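To make the two modifications concrete, here is a minimal PyTorch-style sketch (not taken from the post): a squared-ReLU activation and a depthwise convolution applied after a Q/K/V-style linear projection. The kernel width of 3 and causal left-padding are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def squared_relu(x):
    # Modification 1: ReLU followed by squaring.
    return F.relu(x) ** 2


class DepthwiseConvProjection(nn.Module):
    """Modification 2 (sketch): a linear projection followed by a depthwise
    convolution over the sequence dimension. Kernel width 3 and causal
    padding are assumptions, not details from the text above."""

    def __init__(self, d_model, kernel_size=3):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)
        # groups=d_model makes the convolution depthwise: each channel
        # is filtered independently.
        self.dwconv = nn.Conv1d(
            d_model, d_model, kernel_size, groups=d_model, padding=0
        )
        self.kernel_size = kernel_size

    def forward(self, x):  # x: (batch, seq_len, d_model)
        x = self.proj(x)
        x = x.transpose(1, 2)                    # (batch, d_model, seq_len)
        x = F.pad(x, (self.kernel_size - 1, 0))  # causal left-padding
        x = self.dwconv(x)
        return x.transpose(1, 2)                 # back to (batch, seq_len, d_model)


# Usage sketch: one such module per Q, K, and V projection in self-attention.
q_proj = DepthwiseConvProjection(d_model=512)
x = torch.randn(2, 16, 512)
q = q_proj(x)            # (2, 16, 512)
h = squared_relu(x)      # squared ReLU applied elementwise
```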

