Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding
Authors: Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments based on several public benchmark tasks show that our learnable Fourier feature representation for multi-dimensional positional encoding outperforms existing methods by both improving the accuracy and allowing faster convergence. We evaluate our approach on a range of benchmark tasks using Transformer-based models in comparison with several existing positional encoding methods. |
| Researcher Affiliation | Collaboration | Yang Li (Google Research, Mountain View, CA, liyang@google.com); Si Si (Google Research, Mountain View, CA, sisidaisy@google.com); Gang Li (Google Research, Mountain View, CA, leebird@google.com); Cho-Jui Hsieh (UCLA, Los Angeles, CA, chohsieh@cs.ucla.edu); Samy Bengio (Google Research, Mountain View, CA, bengio@gmail.com) |
| Pseudocode | Yes | Algorithm 1: Compute the Fourier feature positional encoding of a multi-dimensional position. Input: a tensor X of shape [N, G, M] representing N positions, each of shape [G, M], i.e., G positional groups with M-dimensional positional values per group. Output: PE_X of shape [N, D], where D is the depth of the positional encoding. Hyperparameters: the Fourier feature dimension \|F\|, the hidden layer dimension \|H\|, the positional encoding dimension D, and γ. Initialization: initialize learnable weights W_r ∈ R^(\|F\|/2 × M) by sampling from N(0, γ⁻²); initialize learnable weights W_1 ∈ R^(\|F\| × \|H\|), B_1 ∈ R^\|H\|, W_2 ∈ R^(\|H\| × D/G), and B_2 ∈ R^(D/G). Step 1: F ← (1/√\|F\|) [cos(X W_r^T); sin(X W_r^T)] (Eq. 2). Step 2: Y ← GeLU(F W_1 + B_1) W_2 + B_2 (Eq. 6). Step 3: PE_X ← reshape Y into shape [N, D]. Step 4: return PE_X. (A code sketch of this algorithm is given after the table.) |
| Open Source Code | No | The paper describes implementing models on top of existing public codebases (e.g., Trax for Reformer, the DETR codebase, and the public widget captioning codebase), but it does not provide a link to, or statement about releasing, source code for the proposed Learnable Fourier Features method. |
| Open Datasets | Yes | ImageNet 64x64 dataset [4] |
| Dataset Splits | Yes | on the COCO 2017 object detection dataset [22] that has 118k images for training and 5k for validation. |
| Hardware Specification | Yes | The training for each Reformer model is parallelized across 32 TPU v2 cores |
| Software Dependencies | No | The paper mentions 'Trax' and 'JAX' as software used for implementation but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | All our experiments used a 6-layer, 8-head-attention Reformer, with d_model = 1024, d_ff = 4096, and n_heads = 8. |
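
The pseudocode row above quotes the paper's Algorithm 1. As a concrete reading of that algorithm, the following is a minimal JAX sketch of the learnable Fourier feature positional encoding: random Fourier features per positional group followed by a per-group GeLU MLP and a reshape to [N, D]. The shapes and the cos/sin-plus-MLP structure come from the quoted algorithm; the function names, the MLP weight initialization scales, and the N(0, γ⁻²) sampling of W_r are our assumptions, not the authors' released implementation (the Open Source Code row notes none was found).

```python
# Minimal sketch of Algorithm 1 (learnable Fourier feature positional encoding).
# Names and MLP initialization scales are illustrative assumptions.
import jax
import jax.numpy as jnp


def init_params(key, M, F_dim, H_dim, D, G, gamma):
    """Initialize the learnable weights described in Algorithm 1."""
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        # W_r in R^{|F|/2 x M}; assumed to be drawn from N(0, gamma^-2)
        "Wr": jax.random.normal(k1, (F_dim // 2, M)) / gamma,
        # Per-group two-layer MLP (Eq. 6); init scales are generic assumptions
        "W1": jax.random.normal(k2, (F_dim, H_dim)) / jnp.sqrt(F_dim),
        "B1": jnp.zeros((H_dim,)),
        "W2": jax.random.normal(k3, (H_dim, D // G)) / jnp.sqrt(H_dim),
        "B2": jnp.zeros((D // G,)),
    }


def fourier_positional_encoding(params, X, D):
    """Map positions X of shape [N, G, M] to encodings PE_X of shape [N, D]."""
    F_dim = params["W1"].shape[0]
    # Eq. 2: F = (1/sqrt(|F|)) [cos(X Wr^T); sin(X Wr^T)], shape [N, G, |F|]
    proj = jnp.einsum("ngm,fm->ngf", X, params["Wr"])
    F = jnp.concatenate([jnp.cos(proj), jnp.sin(proj)], axis=-1) / jnp.sqrt(F_dim)
    # Eq. 6: Y = GeLU(F W1 + B1) W2 + B2, shape [N, G, D/G]
    Y = jax.nn.gelu(F @ params["W1"] + params["B1"]) @ params["W2"] + params["B2"]
    # Step 3: reshape the G groups into a single D-dimensional encoding per position
    return Y.reshape(X.shape[0], D)
```

For example, 2-D pixel positions would use M = 2 with a single positional group (G = 1), while the specific values of \|F\|, \|H\|, D, and γ are task-dependent hyperparameters chosen per experiment in the paper.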