arxiv:2107.10932

FNetAR: Mixing Tokens with Autoregressive Fourier Transforms

Published on Jul 22, 2021
Authors:

Abstract

In this note we examine the autoregressive generalization of the FNet algorithm, in which the self-attention layers of the standard Transformer architecture are substituted with a trivial sparse-uniform sampling procedure based on Fourier transforms. Using the Wikitext-103 benchmark, we demonstrate that FNetAR retains state-of-the-art performance (25.8 ppl) on the task of causal language modeling compared to a Transformer-XL baseline (24.2 ppl) with only half the number of self-attention layers, thus providing further evidence for the superfluity of deep neural networks with heavily compounded attention mechanisms. The autoregressive Fourier transform could likely be used for parameter reduction on most Transformer-based time-series prediction models.
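
The abstract describes replacing self-attention with a causal, Fourier-transform-based token-mixing step. As a rough illustration only (not the authors' exact sparse-uniform sampling procedure), the sketch below applies FNet-style 2D Fourier mixing independently to each causal prefix, so the representation at position t never depends on tokens beyond t. The class name and the prefix-looping strategy are hypothetical, and the loop is deliberately naive to make the causality constraint explicit.

```python
import torch
import torch.nn as nn


class CausalFourierMixing(nn.Module):
    """Hypothetical autoregressive FNet-style token mixing (illustrative sketch only).

    For each position t, tokens 0..t are mixed with a 2D Fourier transform
    (over the sequence and hidden dimensions), keeping the real part, and the
    row corresponding to position t is read off. This enforces causality but
    is not the procedure described in the paper.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden), real-valued
        batch, seq_len, hidden = x.shape
        out = torch.empty_like(x)
        for t in range(seq_len):
            # Restrict mixing to the causal prefix x[:, :t+1, :]
            prefix = x[:, : t + 1, :]
            mixed = torch.fft.fft2(prefix, dim=(-2, -1)).real
            out[:, t, :] = mixed[:, t, :]
        return out


# Minimal usage check: output at position t depends only on positions <= t.
layer = CausalFourierMixing()
x = torch.randn(2, 16, 64)
y = layer(x)  # shape (2, 16, 64)
```

A practical implementation would presumably avoid the per-prefix loop, for example by expressing the sequence mixing as multiplication with a causally masked DFT-derived matrix, but that optimization is beyond the scope of this sketch.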
