FourierFormer: Transformer Meets Generalized Fourier Integral Theorem

Authors: Tan Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley Osher, Nhat Ho

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we numerically justify the advantage of FourierFormer over the baseline dot-product transformer on large-scale tasks: language modeling on WikiText-103 [46] (Section 4.1), image classification on ImageNet [22, 67] (Section 4.2), time series classification on the UEA benchmark [5] (Section 4.3), reinforcement learning on the D4RL benchmark [29] (Section 4.4), and machine translation on IWSLT'14 De-En [10] (Section 4.5). We theoretically prove that our proposed Fourier integral kernels can efficiently approximate any key and query distributions.
Researcher Affiliation | Academia | Tan M. Nguyen, Department of Mathematics, University of California, Los Angeles, tanmnguyen89@ucla.edu; Minh Pham, Department of Mathematics, University of California, Los Angeles, minhrose@ucla.edu; Tam Nguyen, Department of ECE, Rice University, nguyenminhtam9520@gmail.com; Khai Nguyen, Department of Statistics and Data Sciences, University of Texas at Austin, khainb@utexas.edu; Stanley J. Osher, Department of Mathematics, University of California, Los Angeles, sjo@math.ucla.edu; Nhat Ho, Department of Statistics and Data Sciences, University of Texas at Austin, minhnhat@utexas.edu
Pseudocode | No | The paper describes the steps of self-attention but does not provide structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our PyTorch code with documentation can be found at https://github.com/minhtannguyen/FourierFormer_NeurIPS.
Open Datasets | Yes | Language modeling on WikiText-103 [46] (Section 4.1), image classification on ImageNet [22, 67] (Section 4.2), time series classification on the UEA benchmark [5] (Section 4.3), reinforcement learning on the D4RL benchmark [29] (Section 4.4), and machine translation on IWSLT'14 De-En [10] (Section 4.5).
Dataset Splits | Yes | We report the validation and test perplexity (PPL) of FourierFormer versus the baseline transformer with the dot-product attention in Table 1.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or the version of CUDA used.
Experiment Setup | Yes | In all experiments, we made the constant R in Fourier attention (see equation (16)) a learnable scalar and chose the function φ(x) = x⁴ (see Remark 2).
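
To make the reported setup concrete, below is a minimal PyTorch sketch of a Fourier-integral-style attention head with a learnable scalar R and φ(x) = x⁴. The product-of-sinc kernel form, tensor shapes, and class name are illustrative assumptions, not the authors' method; the paper's exact kernel is defined in equation (16), and the released repository contains the reference implementation.

```python
# Illustrative sketch only: a rough reconstruction of a Fourier-integral-style
# attention head with a learnable cutoff R and nonlinearity phi(x) = x**4.
# The kernel form below is an assumption; see equation (16) in the paper and
# https://github.com/minhtannguyen/FourierFormer_NeurIPS for the exact definition.
import math
import torch
import torch.nn as nn


class FourierAttentionSketch(nn.Module):
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        # R is a learnable scalar, as stated in the experiment setup.
        self.R = nn.Parameter(torch.tensor(1.0))
        self.eps = eps  # avoids division by zero in the sinc-like kernel

    @staticmethod
    def phi(x: torch.Tensor) -> torch.Tensor:
        # phi(x) = x^4, the choice reported in the paper (Remark 2).
        return x.pow(4)

    def forward(self, q, k, v):
        # q, k, v: (batch, seq_len, head_dim)
        diff = q.unsqueeze(2) - k.unsqueeze(1)       # (batch, Lq, Lk, d)
        u = self.phi(diff)                           # nonlinearity on differences
        # Assumed product-of-sinc kernel over feature dimensions.
        kernel = torch.sin(self.R * u) / (math.pi * u + self.eps)
        scores = kernel.prod(dim=-1)                 # (batch, Lq, Lk)
        # Normalize so the weights over keys sum to one.
        weights = scores / (scores.sum(dim=-1, keepdim=True) + self.eps)
        return weights @ v                           # (batch, Lq, head_dim)


if __name__ == "__main__":
    attn = FourierAttentionSketch()
    q, k, v = (torch.randn(2, 5, 8) for _ in range(3))
    print(attn(q, k, v).shape)  # torch.Size([2, 5, 8])
```

Normalizing the kernel scores over keys mirrors how a nonparametric density estimator turns kernel evaluations into weights; whether the released code normalizes in exactly this way is not confirmed here.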