Novel positional encodings to enable tree-based transformers

Authors: Vighnesh Shiv, Chris Quirk

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated our model in tree-to-tree program translation and sequence-to-tree semantic parsing settings, achieving superior performance over both sequence-to-sequence transformers and state-of-the-art tree-based LSTMs on several datasets. In particular, our results include a 22% absolute increase in accuracy on a JavaScript to CoffeeScript translation dataset.
Researcher Affiliation | Industry | Vighnesh Leonardo Shiv, Microsoft Research, Redmond, WA (vishiv@microsoft.com); Chris Quirk, Microsoft Research, Redmond, WA (chrisq@microsoft.com)
Pseudocode | No | No structured pseudocode or algorithm blocks were found. The paper defines its operations mathematically (e.g., the descend and ascend operations D_i x = e_i^n ; x[:-n] (Eq. 2) and U x = x[n:] ; 0^n (Eq. 3)) but does not present a full algorithm or pseudocode block. A sketch of these operations is given after the table.
Open Source Code | Yes | Implemented in Microsoft ICECAPS: https://github.com/microsoft/icecaps
Open Datasets | Yes | The first set of tasks is For2Lam, a synthetic translation dataset... More details about the data sets can be found at Chen et al. (2018). JOBS (Califf & Mooney, 1999), a job listing database retrieval task. GEO (Tang & Mooney, 2001), a geographical database retrieval task. ATIS (Dahl et al., 1994), a flight booking task.
Dataset Splits | No | For the synthetic translation tasks... The dataset is split into two tasks: one for small programs and one for large programs... Each set of tasks contains 100,000 training examples and 10,000 test examples total. JOBS (Califf & Mooney, 1999)... 500 training examples and 140 evaluation examples. GEO (Tang & Mooney, 2001)... 680 training examples and 200 evaluation examples. ATIS (Dahl et al., 1994)... 4480 training examples and 450 evaluation examples. While training and test/evaluation set sizes are provided, there is no explicit mention of a separate validation dataset split or its size/methodology.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments. It only notes that 'For memory-related reasons, a batch size of 64 was used instead for the tasks with longer program lengths', which implies hardware constraints but does not identify the hardware itself.
Software Dependencies | No | No specific software dependencies with version numbers were provided. The paper only mentions that the model is 'Implemented in Microsoft ICECAPS', without version details for ICECAPS or other libraries.
Experiment Setup | Yes | Unless listed otherwise, we performed all of our experiments with Adam (Kingma & Ba, 2015), a batch size of 128, a dropout rate of 0.1 (Srivastava et al., 2014), and gradient clipping for norms above 10.0. Both models were trained with four layers and d_model = 256. The sequence-transformer was trained with d_ff = 1024 and a positional encoding dimension that matched d_model, in line with the hyperparameters used in the original transformer. The tree-transformer, however, was given a larger positional encoding size of 2048 in exchange for a smaller d_ff of 512. For memory-related reasons, a batch size of 64 was used instead for the tasks with longer program lengths. A configuration sketch collecting these settings is given after the table.
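
The operations quoted in the Pseudocode row behave like stack pushes and pops on a fixed-length vector of stacked one-hot branch choices: D_i prepends the one-hot e_i^n for child i and drops the oldest entry, while U pops the most recent one-hot and zero-pads. Below is a minimal NumPy sketch of that reading; the function names (descend, ascend) and the toy dimensions are illustrative assumptions, not code from the authors' ICECAPS implementation.

```python
import numpy as np

def descend(x: np.ndarray, i: int, n: int) -> np.ndarray:
    """D_i: prepend the one-hot e_i^n for branch i, truncating so len(x) stays fixed."""
    e_i = np.zeros(n)
    e_i[i] = 1.0
    return np.concatenate([e_i, x[:-n]])

def ascend(x: np.ndarray, n: int) -> np.ndarray:
    """U: pop the most recent one-hot (the first n entries) and pad with n zeros."""
    return np.concatenate([x[n:], np.zeros(n)])

# Toy example: branching factor n = 2, room for a path of depth 3.
n, depth = 2, 3
root = np.zeros(n * depth)            # the root's position is the all-zero vector
left = descend(root, 0, n)            # move to child 0 of the root
left_right = descend(left, 1, n)      # then to child 1 of that node
assert np.allclose(ascend(left_right, n), left)  # U undoes the most recent D_i
```

The assert at the end checks the defining property of the pair: ascending undoes the most recent descent.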
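
The Experiment Setup row lists the reported hyperparameters; the sketch below simply collects them into plain Python dictionaries for reference. The names COMMON, SEQ_TRANSFORMER, and TREE_TRANSFORMER are illustrative assumptions and do not correspond to identifiers in the ICECAPS codebase.

```python
# Hedged summary of the reported hyperparameters; the dict names and keys are
# assumptions for illustration, not the authors' configuration format.
COMMON = {
    "optimizer": "Adam",        # Kingma & Ba (2015)
    "batch_size": 128,          # reduced to 64 for the longer-program tasks (memory)
    "dropout_rate": 0.1,        # Srivastava et al. (2014)
    "grad_clip_norm": 10.0,     # clip gradients with norm above 10.0
    "num_layers": 4,
    "d_model": 256,
}

# Sequence-transformer baseline: positional encoding dimension matches d_model.
SEQ_TRANSFORMER = {**COMMON, "d_ff": 1024, "pos_encoding_dim": 256}

# Tree-transformer: larger positional encoding (2048) traded for a smaller d_ff (512).
TREE_TRANSFORMER = {**COMMON, "d_ff": 512, "pos_encoding_dim": 2048}
```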