Learning to Encode Position for Transformer with Continuous Dynamical Model

Authors: Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our new position layers on a variety of neural machine translation and language understanding tasks, the experimental results show consistent improvements over the baselines."
Researcher Affiliation | Collaboration | (1) Department of Computer Science, University of California, Los Angeles, CA, USA; (2) Amazon.com; (3) Department of Computer Science, University of Texas at Austin, TX, USA.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | "Our experimental codes will be made publicly available."
Open Datasets | Yes | "We include the following three additive position encoders: ... Data-driven FLOATER: ... Pre-defined sinusoidal position encoder: ... Length-fixed position embedding: ... BLEU scores on WMT14 En-De and En-Fr datasets with both Transformer-base and Transformer-large models described in (Vaswani et al., 2017). ... GLUE (Wang et al., 2018), RACE (Lai et al., 2017) and SQuAD (Rajpurkar et al., 2016)." (A sketch of these encoders appears below the table.)
Dataset Splits | Yes | "Single models on dev, w/o data augmentation"
Hardware Specification | No | The paper mentions "better parallelization using modern hardware" but does not provide specific details about the GPU or CPU models, memory, or cloud resources used for experiments.
Software Dependencies | No | "All our codes to perform experiments in this paper are based on the Transformer implementations in the fairseq (Ott et al., 2019) package." No specific version number for fairseq is provided.
Experiment Setup | Yes | "In this paper, we download the same pre-trained RoBERTa model from the official repository as our pretrained Transformer model for all NLP tasks discussed in this section. ... We keep the hyperparameters, such as batch size and learning rate, to also be the same." (A loading sketch appears below the table.)
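
The Open Datasets row lists three additive position encoders but, being a quotation, does not show their form. The PyTorch sketch below (PyTorch is the framework underlying fairseq) illustrates the pre-defined sinusoidal encoder and a toy continuous-dynamics encoder in the spirit of FLOATER. The class name `ContinuousPositionEncoder`, the `delta_t`/`euler_steps` arguments, and the fixed-step Euler integration are illustrative assumptions, not the authors' implementation, which solves the dynamics with proper ODE solvers.

```python
import math

import torch
import torch.nn as nn


def sinusoidal_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Pre-defined sinusoidal encoding from Vaswani et al. (2017); d_model assumed even."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )                                                                    # (d_model / 2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe                                                            # (max_len, d_model)


class ContinuousPositionEncoder(nn.Module):
    """Toy FLOATER-style encoder: the position signal p(t) follows a learned
    vector field dp/dt = h(t, p(t)) and is read out at t_i = i * delta_t.
    The integral is approximated with fixed-step Euler updates here, purely
    for illustration; the paper uses ODE solvers."""

    def __init__(self, d_model: int, delta_t: float = 0.1, euler_steps: int = 4):
        super().__init__()
        self.delta_t = delta_t
        self.euler_steps = euler_steps
        # h(t, p): maps (current time, current position signal) -> dp/dt.
        self.dynamics = nn.Sequential(
            nn.Linear(d_model + 1, d_model), nn.Tanh(), nn.Linear(d_model, d_model)
        )
        self.p0 = nn.Parameter(torch.zeros(d_model))  # learnable initial condition p(0)

    def forward(self, seq_len: int) -> torch.Tensor:
        step = self.delta_t / self.euler_steps
        p, t = self.p0, 0.0
        outputs = []
        for _ in range(seq_len):
            for _ in range(self.euler_steps):         # integrate from t_i to t_{i+1}
                inp = torch.cat([p, p.new_tensor([t])])
                p = p + step * self.dynamics(inp)
                t += step
            outputs.append(p)
        return torch.stack(outputs)                   # (seq_len, d_model)
```

The length-fixed learned embedding in the same list corresponds to an ordinary `nn.Embedding(max_len, d_model)` table indexed by position; all three are additive encoders, i.e. their output is added to the token representations.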
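
The Experiment Setup row quotes the paper as starting from the official pre-trained RoBERTa checkpoint, with all experiments built on fairseq. The snippet below is a minimal sketch of how such a checkpoint is commonly loaded through fairseq's RoBERTa hub interface; the local path is a placeholder, and the snippet says nothing about the paper's actual fine-tuning hyperparameters or how the FLOATER position layers are attached.

```python
from fairseq.models.roberta import RobertaModel

# Placeholder path: point it at the extracted roberta.base / roberta.large
# archive downloaded from the official fairseq repository.
roberta = RobertaModel.from_pretrained(
    "/path/to/roberta.large",          # directory containing model.pt and the dictionary files
    checkpoint_file="model.pt",
)
roberta.eval()                          # disable dropout before extracting features

tokens = roberta.encode("Hello world!")        # BPE-encode a sentence to token ids
features = roberta.extract_features(tokens)    # (1, seq_len, hidden_dim) final-layer states
print(features.shape)
```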