Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding

Authors: Yang Li, Si Si, Gang Li, Cho-Jui Hsieh, Samy Bengio

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments based on several public benchmark tasks show that our learnable Fourier feature representation for multi-dimensional positional encoding outperforms existing methods by both improving the accuracy and allowing faster convergence." "We evaluate our approach on a range of benchmark tasks using Transformer-based models in comparison with several existing positional encoding methods."
Researcher Affiliation | Collaboration | Yang Li (Google Research, Mountain View, CA; liyang@google.com); Si Si (Google Research, Mountain View, CA; sisidaisy@google.com); Gang Li (Google Research, Mountain View, CA; leebird@google.com); Cho-Jui Hsieh (UCLA, Los Angeles, CA; chohsieh@cs.ucla.edu); Samy Bengio (Google Research, Mountain View, CA; bengio@gmail.com)
Pseudocode | Yes | Algorithm 1: Compute the Fourier feature positional encoding of a multi-dimensional position.
Input: a tensor X of shape [N, G, M] representing N positions, where each position has shape [G, M]: G positional groups, each with M-dimensional positional values.
Output: PE_X of shape [N, D], where D is the depth of the positional encoding.
Hyperparameters: the Fourier feature dimension |F|, the hidden layer dimension |H|, the positional encoding dimension D, and γ.
Initialization: initialize learnable weights W_r ∈ R^(|F|/2 × M) by sampling from N(0, γ^−2); initialize learnable weights W_1 ∈ R^(|F| × |H|), B_1 ∈ R^|H|, W_2 ∈ R^(|H| × D/G), and B_2 ∈ R^(D/G).
1: F ← (1/√|F|) [cos(X W_r^T); sin(X W_r^T)] (Eq. 2)
2: Y ← GeLU(F W_1 + B_1) W_2 + B_2 (Eq. 6)
3: PE_X ← reshape Y into shape [N, D]
4: return PE_X
Open Source Code | No | The paper implements its models on top of existing public codebases (e.g., Trax for Reformer, the DETR codebase, and the public widget captioning codebase) but does not provide a link to, or statement about releasing, source code for the proposed learnable Fourier features.
Open Datasets | Yes | ImageNet 64×64 dataset [4]
Dataset Splits | Yes | "on the COCO 2017 object detection dataset [22] that has 118k images for training and 5k for validation."
Hardware Specification | Yes | "The training for each Reformer model is parallelized across 32 TPU v2 cores."
Software Dependencies | No | The paper mentions Trax and JAX as the software used for implementation but does not provide version numbers for these or any other dependencies.
Experiment Setup | Yes | "All our experiments used a 6-layer, 8-head-attention Reformer, with d_model = 1024, d_ff = 4096, and n_heads = 8."
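Algorithm 1 above can be sketched in plain NumPy. This is a minimal illustrative sketch, not the authors' implementation (which the review notes is not released); the dimensions, γ value, and initialization scales in the toy usage are assumptions for demonstration only.

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def fourier_positional_encoding(X, Wr, W1, B1, W2, B2):
    """Fourier-feature positional encoding of multi-dimensional positions.

    X:  [N, G, M] positions
    Wr: [|F|/2, M], W1: [|F|, |H|], B1: [|H|], W2: [|H|, D/G], B2: [D/G]
    Returns PE_X of shape [N, D] with D = G * W2.shape[1].
    """
    proj = X @ Wr.T                                    # [N, G, |F|/2]
    f_dim = 2 * Wr.shape[0]                            # |F|
    F = np.concatenate([np.cos(proj), np.sin(proj)], axis=-1) / np.sqrt(f_dim)  # Eq. 2
    Y = gelu(F @ W1 + B1) @ W2 + B2                    # Eq. 6 -> [N, G, D/G]
    return Y.reshape(X.shape[0], -1)                   # [N, D]

# Toy usage; these sizes and gamma are illustrative, not from the paper.
rng = np.random.default_rng(0)
N, G, M, f_dim, h_dim, d = 4, 2, 2, 64, 32, 128
gamma = 10.0
Wr = rng.normal(0.0, 1.0 / gamma, size=(f_dim // 2, M))  # entries ~ N(0, gamma^-2)
W1 = rng.normal(0.0, 0.02, size=(f_dim, h_dim))
B1 = np.zeros(h_dim)
W2 = rng.normal(0.0, 0.02, size=(h_dim, d // G))
B2 = np.zeros(d // G)
X = rng.uniform(size=(N, G, M))                        # normalized positions
pe = fourier_positional_encoding(X, Wr, W1, B1, W2, B2)
print(pe.shape)  # (4, 128)
```

The encoding is computed per positional group and the G group encodings are concatenated by the final reshape, matching the [N, D] output shape stated in the algorithm.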