FourierFormer: Transformer Meets Generalized Fourier Integral Theorem

Authors: Tan Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley Osher, Nhat Ho

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we numerically justify the advantage of FourierFormer over the baseline dot-product transformer on large-scale tasks: language modeling on WikiText-103 [46] (Section 4.1), image classification on ImageNet [22, 67] (Section 4.2), time series classification on the UEA benchmark [5] (Section 4.3), reinforcement learning on the D4RL benchmark [29] (Section 4.4), and machine translation on IWSLT'14 De-En [10] (Section 4.5). We theoretically prove that our proposed Fourier integral kernels can efficiently approximate any key and query distributions.
Researcher Affiliation | Academia | Tan M. Nguyen, Department of Mathematics, University of California, Los Angeles, tanmnguyen89@ucla.edu; Minh Pham, Department of Mathematics, University of California, Los Angeles, minhrose@ucla.edu; Tam Nguyen, Department of ECE, Rice University, nguyenminhtam9520@gmail.com; Khai Nguyen, Department of Statistics and Data Sciences, University of Texas at Austin, khainb@utexas.edu; Stanley J. Osher, Department of Mathematics, University of California, Los Angeles, sjo@math.ucla.edu; Nhat Ho, Department of Statistics and Data Sciences, University of Texas at Austin, minhnhat@utexas.edu
Pseudocode | No | The paper describes the steps of self-attention but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our PyTorch code with documentation can be found at https://github.com/minhtannguyen/FourierFormer_NeurIPS.
Open Datasets | Yes | Language modeling on WikiText-103 [46] (Section 4.1), image classification on ImageNet [22, 67] (Section 4.2), time series classification on the UEA benchmark [5] (Section 4.3), reinforcement learning on the D4RL benchmark [29] (Section 4.4), and machine translation on IWSLT'14 De-En [10] (Section 4.5).
Dataset Splits | Yes | We report the validation and test perplexity (PPL) of FourierFormer versus the baseline transformer with the dot-product attention in Table 1.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or the version of CUDA used.
Experiment Setup | Yes | In all experiments, we made the constant R in Fourier attention (see equation (16)) a learnable scalar and chose the function φ(x) = x⁴ (see Remark 2).
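
To make the reported setup concrete, below is a minimal PyTorch sketch of a Fourier-integral-style attention head with a learnable scalar R and φ(x) = x⁴. The product-of-sinc kernel form, tensor shapes, and class name are illustrative assumptions, not the authors' method; the paper's exact kernel is defined in equation (16), and the released repository contains the reference implementation.

```python
# Illustrative sketch only: a rough reconstruction of a Fourier-integral-style
# attention head with a learnable cutoff R and nonlinearity phi(x) = x**4.
# The kernel form below is an assumption; see equation (16) in the paper and
# https://github.com/minhtannguyen/FourierFormer_NeurIPS for the exact definition.
import math
import torch
import torch.nn as nn


class FourierAttentionSketch(nn.Module):
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        # R is a learnable scalar, as stated in the experiment setup.
        self.R = nn.Parameter(torch.tensor(1.0))
        self.eps = eps  # avoids division by zero in the sinc-like kernel

    @staticmethod
    def phi(x: torch.Tensor) -> torch.Tensor:
        # phi(x) = x^4, the choice reported in the paper (Remark 2).
        return x.pow(4)

    def forward(self, q, k, v):
        # q, k, v: (batch, seq_len, head_dim)
        diff = q.unsqueeze(2) - k.unsqueeze(1)       # (batch, Lq, Lk, d)
        u = self.phi(diff)                           # nonlinearity on differences
        # Assumed product-of-sinc kernel over feature dimensions.
        kernel = torch.sin(self.R * u) / (math.pi * u + self.eps)
        scores = kernel.prod(dim=-1)                 # (batch, Lq, Lk)
        # Normalize so the weights over keys sum to one.
        weights = scores / (scores.sum(dim=-1, keepdim=True) + self.eps)
        return weights @ v                           # (batch, Lq, head_dim)


if __name__ == "__main__":
    attn = FourierAttentionSketch()
    q, k, v = (torch.randn(2, 5, 8) for _ in range(3))
    print(attn(q, k, v).shape)  # torch.Size([2, 5, 8])
```

Normalizing the kernel scores over keys mirrors how a nonparametric density estimator turns kernel evaluations into weights; whether the released code normalizes in exactly this way is not confirmed here.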