Learning to Encode Position for Transformer with Continuous Dynamical Model
Authors: Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our new position layers on a variety of neural machine translation and language understanding tasks; the experimental results show consistent improvements over the baselines. |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science, University of California, Los Angeles, CA, USA; 2 Amazon.com; 3 Department of Computer Science, University of Texas at Austin, TX, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our experimental codes will be made publicly available. |
| Open Datasets | Yes | We include the following three additive position encoders: ... Data-driven FLOATER: ... Pre-defined sinusoidal position encoder: ... Length-fixed position embedding: ... BLEU scores on WMT14 En-De and En-Fr datasets with both Transformer-base and Transformer-large models described in (Vaswani et al., 2017). ... GLUE (Wang et al., 2018), RACE (Lai et al., 2017) and SQuAD (Rajpurkar et al., 2016). |
| Dataset Splits | Yes | Single models on dev, w/o data augmentation |
| Hardware Specification | No | The paper mentions 'better parallelization using modern hardware' but does not provide specific details about the GPU or CPU models, memory, or cloud resources used for experiments. |
| Software Dependencies | No | All our codes to perform experiments in this paper are based on the Transformer implementations in the fairseq (Ott et al., 2019) package. No specific version number for fairseq is provided. |
| Experiment Setup | Yes | In this paper, we download the same pre-trained RoBERTa model from the official repository as our pretrained Transformer model for all NLP tasks discussed in this section. ... We keep the hyperparameters, such as batch size and learning rate, to also be the same. |
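The "Open Datasets" row above contrasts the paper's data-driven FLOATER layer with the pre-defined sinusoidal position encoder baseline. The sketch below is a minimal NumPy implementation of that standard sinusoidal encoding from Vaswani et al. (2017) only; it is not the paper's FLOATER dynamical model, and the function name and shapes are illustrative assumptions.

```python
import numpy as np

def sinusoidal_position_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Pre-defined sinusoidal position encoding (Vaswani et al., 2017).

    Returns a (max_len, d_model) array in which even dimensions use sine
    and odd dimensions use cosine at geometrically spaced frequencies.
    Assumes d_model is even, as in the Transformer-base/large configs.
    """
    positions = np.arange(max_len)[:, None]                 # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                # (1, d_model/2)
    angle_rates = 1.0 / np.power(10000.0, dims / d_model)   # per-dimension frequencies
    angles = positions * angle_rates                         # (max_len, d_model/2)

    encoding = np.zeros((max_len, d_model))
    encoding[:, 0::2] = np.sin(angles)
    encoding[:, 1::2] = np.cos(angles)
    return encoding

# Example: encodings for a 512-dimensional Transformer-base model
pe = sinusoidal_position_encoding(max_len=1024, d_model=512)
print(pe.shape)  # (1024, 512)
```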