FourierFormer: Transformer Meets Generalized Fourier Integral Theorem
Authors: Tan Nguyen, Minh Pham, Tam Nguyen, Khai Nguyen, Stanley Osher, Nhat Ho
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we numerically justify the advantage of FourierFormer over the baseline dot-product transformer on large-scale tasks: language modeling on WikiText-103 [46] (Section 4.1), image classification on ImageNet [22, 67] (Section 4.2), time series classification on the UEA benchmark [5] (Section 4.3), reinforcement learning on the D4RL benchmark [29] (Section 4.4), and machine translation on IWSLT'14 De-En [10] (Section 4.5). We theoretically prove that our proposed Fourier integral kernels can efficiently approximate any key and query distributions. |
| Researcher Affiliation | Academia | Tan M. Nguyen, Department of Mathematics, University of California, Los Angeles (tanmnguyen89@ucla.edu); Minh Pham, Department of Mathematics, University of California, Los Angeles (minhrose@ucla.edu); Tam Nguyen, Department of ECE, Rice University (nguyenminhtam9520@gmail.com); Khai Nguyen, Department of Statistics and Data Sciences, University of Texas at Austin (khainb@utexas.edu); Stanley J. Osher, Department of Mathematics, University of California, Los Angeles (sjo@math.ucla.edu); Nhat Ho, Department of Statistics and Data Sciences, University of Texas at Austin (minhnhat@utexas.edu) |
| Pseudocode | No | The paper describes the steps of self-attention in prose but does not provide structured pseudocode or algorithm blocks. (A hedged illustrative sketch of the baseline self-attention steps is given after this table.) |
| Open Source Code | Yes | Our PyTorch code with documentation can be found at https://github.com/minhtannguyen/FourierFormer_NeurIPS. |
| Open Datasets | Yes | language modeling on WikiText-103 [46] (Section 4.1), image classification on ImageNet [22, 67] (Section 4.2), time series classification on the UEA benchmark [5] (Section 4.3), reinforcement learning on the D4RL benchmark [29] (Section 4.4), and machine translation on IWSLT'14 De-En [10] (Section 4.5). |
| Dataset Splits | Yes | We report the validation and test perplexity (PPL) of FourierFormer versus the baseline transformer with the dot-product attention in Table 1. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments were provided in the paper. |
| Software Dependencies | No | The paper mentions PyTorch but does not specify its version number or the version of CUDA used. |
| Experiment Setup | Yes | In all experiments, we make the constant R in Fourier attention (see equation (16)) a learnable scalar and choose the function φ(x) = x^4 (see Remark 2). (A hedged sketch of a Fourier-integral-style attention score follows this table.) |
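For reference, here is a minimal PyTorch sketch of the baseline softmax dot-product self-attention that FourierFormer is compared against; the function and weight names are ours, and the snippet is illustrative rather than taken from the paper.

```python
import torch
import torch.nn.functional as F

def dot_product_self_attention(x, w_q, w_k, w_v):
    """Baseline single-head softmax dot-product self-attention (illustrative).

    x: (batch, seq_len, d_model); w_q, w_k, w_v: (d_model, d_head).
    """
    q = x @ w_q                                               # queries (batch, seq_len, d_head)
    k = x @ w_k                                               # keys    (batch, seq_len, d_head)
    v = x @ w_v                                               # values  (batch, seq_len, d_head)
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5   # scaled dot products
    attn = F.softmax(scores, dim=-1)                          # normalize over keys
    return attn @ v                                           # weighted sum of values
```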
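The "Experiment Setup" row refers to equation (16), where the dot-product score is replaced by a kernel built from the generalized Fourier integral theorem, with a learnable scalar R and φ(x) = x^4. As a point of orientation only, the sketch below computes scores with the classical product-of-sinc kernel ∏_d sin(R(q_d − k_d)) / (π(q_d − k_d)); it is not the paper's exact generalized kernel (which involves φ), and the released repository should be treated as authoritative.

```python
import math
import torch

def fourier_integral_scores(q, k, R=1.0):
    """Illustrative Fourier-integral-style attention scores (not the paper's exact eq. (16)).

    q: (batch, n, d); k: (batch, m, d). Returns (batch, n, m) scores given by the
    classical sinc-product kernel prod_d sin(R*(q_d - k_d)) / (pi*(q_d - k_d)).
    """
    diff = q.unsqueeze(2) - k.unsqueeze(1)        # pairwise differences (batch, n, m, d)
    # sin(R*diff) / (pi*diff), written via torch.sinc for numerical safety at diff = 0
    sinc_vals = (R / math.pi) * torch.sinc(R * diff / math.pi)
    return sinc_vals.prod(dim=-1)                 # product over feature dimensions
```

These scores would then be normalized over the keys to form attention weights, playing the role of the softmax step in the baseline sketch above.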