Relative Positional Encoding for Transformers with Linear Complexity

  • Antoine Liutkus
  • Ondřej Cífka
  • Shih-Lun Wu
  • Umut Şimşekli
  • Yi-Hsuan Yang
  • Gaël Richard
Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. Meanwhile, relative positional encoding (RPE) was proposed as beneficial for classical Transformers; it consists of exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what such methods avoid. In this paper, we bridge this gap and present Stochastic Positional Encoding (SPE), a way to generate positional encoding that can be used as a replacement for the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is a connection between positional encoding and the cross-covariance structure of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
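
The following NumPy snippet is a minimal illustrative sketch of the cross-covariance idea mentioned in the abstract, not the paper's exact construction: two correlated Gaussian processes are obtained by filtering a shared white-noise signal with two filters (the names phi_q, phi_k, filter_rows, and all parameter values are assumptions for illustration). Their empirical cross-covariance depends only on the lag between positions, i.e. it is approximately Toeplitz, which is the structure relative positional encoding relies on.

```python
import numpy as np

rng = np.random.default_rng(0)

num_pos = 8          # sequence length (illustrative)
filter_len = 16      # length of the hypothetical filters phi_q, phi_k
num_samples = 20000  # Monte Carlo draws used to estimate the cross-covariance

# Hypothetical fixed filters; in a model they could be learned parameters.
phi_q = rng.standard_normal(filter_len) / np.sqrt(filter_len)
phi_k = rng.standard_normal(filter_len) / np.sqrt(filter_len)

# Shared white noise, one realization per Monte Carlo draw.
z = rng.standard_normal((num_samples, num_pos + filter_len - 1))

def filter_rows(noise, phi):
    """Filter each noise realization with phi, keeping num_pos outputs."""
    return np.stack([np.convolve(row, phi, mode="valid") for row in noise])

q_bar = filter_rows(z, phi_q)  # shape (num_samples, num_pos)
k_bar = filter_rows(z, phi_k)  # shape (num_samples, num_pos)

# Empirical cross-covariance E[q_bar[m] * k_bar[n]]. Because q_bar and k_bar
# are time-invariant filterings of the *same* white noise, the entries depend
# (approximately) only on the lag n - m, so the matrix is close to Toeplitz.
cross_cov = q_bar.T @ k_bar / num_samples
print(np.round(cross_cov, 2))
```

Running the script prints a matrix whose values are roughly constant along each diagonal, confirming the lag-only dependence under these assumptions.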

Original language: English
Title of host publication: Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Publisher: ML Research Press
Pages: 7067-7079
Number of pages: 13
ISBN (Electronic): 9781713845065
Publication status: Published - 1 Jan 2021
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: 18 Jul 2021 - 24 Jul 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139
ISSN (Electronic): 2640-3498

Conference

Conference: 38th International Conference on Machine Learning, ICML 2021
City: Virtual, Online
Period: 18/07/21 - 24/07/21
