Global Relational Models of Source Code

Authors: Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, David Bieber

Venue: ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | By studying a popular, non-trivial program repair task, variable-misuse identification, we explore the relative merits of traditional and hybrid model families for code representation. Starting with a graph-based model that already improves upon the prior state-of-the-art for this task by 20%, we show that our proposed hybrid models improve an additional 10–15%, while training both faster and using fewer parameters.
Researcher Affiliation | Industry | Vincent J. Hellendoorn, Petros Maniatis, Rishabh Singh, Charles Sutton, David Bieber (Google Research); {vhellendoorn,maniatis,rising,charlessutton,dbieber}@google.com
Pseudocode | No | The paper describes model architectures and mathematical formulations (e.g., formulas for GGNN and Transformer attention) but does not include any explicitly labeled pseudocode or algorithm blocks. (The standard formulations these refer to are sketched below the table.)
Open Source Code | Yes | We release a public implementation of the GREAT model based on Tensorflow, as well as the program graphs for all samples in our training and evaluation datasets whose license permits us to redistribute these at: https://doi.org/10.5281/zenodo.3668323, which tracks the latest release of our Github repository at: https://github.com/VHellendoorn/ICLR20-Great.
Open Datasets | Yes | Synthetic Dataset: We used the ETH Py150 dataset (Raychev et al., 2016), which is based on GitHub Python code, and already partitioned into train and test splits (100K and 50K files, respectively).
Dataset Splits | Yes | We further split the 100K train files into 90K train and 10K validation examples and applied a deduplication step on that dataset (Allamanis, 2018). [...] yielding ca. 2M total training and 755K test samples. (A minimal split sketch is given after the table.)
Hardware Specification | Yes | Hardware: all our models were trained on a single Tesla P100 GPU on 25 million samples, which required between 40 and 250 hours for our various models.
Software Dependencies | No | The paper mentions software like 'Tensorflow' and 'Tensor2Tensor (Vaswani et al., 2018)' but does not provide specific version numbers for these or other libraries used for implementation.
Experiment Setup | Yes | We train most of our models with batch sizes of {12.5K, 25K, 50K} tokens, with the exception of the Transformer architectures; due to the quadratic nature of the attention computation, 25K tokens was too large for these models, so we additionally trained these with 6.25K-token batches. Learning rates were varied in {1e-3, 4e-4, 1e-4, 4e-5, 1e-5} using an Adam optimizer, where we omitted the first option for our GGNN models and the last for our RNNs due to poor performance. Sub-tokens were embedded using 128-dimensional embeddings. (A hedged configuration sketch follows at the end of this section.)
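
The formulas for GGNN and Transformer attention mentioned in the Pseudocode row are not reproduced in this report. The block below restates the standard formulations from the literature that the paper builds on (the GGNN node update of Li et al. and scaled dot-product attention); it is a generic sketch, not the paper's exact notation.

```latex
% Standard GGNN node update (Li et al., 2016): messages from neighbors are
% aggregated along typed edges, then a GRU updates each node state.
m_v^{(t)} = \sum_{(u, v, e) \in E} W_e \, h_u^{(t)}, \qquad
h_v^{(t+1)} = \mathrm{GRU}\!\left(h_v^{(t)},\, m_v^{(t)}\right)

% Standard scaled dot-product attention (Vaswani et al., 2017), the building
% block of the Transformer and of relationally augmented attention models.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```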
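
The Dataset Splits row describes re-splitting the 100K ETH Py150 training files into 90K train and 10K validation files. Below is a minimal sketch of such a split, assuming a hypothetical manifest file name and a fixed random seed; the deduplication step (Allamanis, 2018) is not shown, and this is not the authors' released pipeline.

```python
import random

# Hypothetical manifest listing the 100K ETH Py150 training file paths
# (the actual file name is an assumption for illustration).
with open("py150_train_files.txt") as f:
    train_files = [line.strip() for line in f if line.strip()]

# Shuffle deterministically, then carve out 10K files for validation,
# mirroring the 90K/10K split described in the paper.
random.Random(42).shuffle(train_files)
valid_files = train_files[:10_000]
train_files = train_files[10_000:]

print(len(train_files), "train files,", len(valid_files), "validation files")
```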
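
The Experiment Setup row lists the hyperparameter grid. The sketch below shows one way those settings could be instantiated with TensorFlow and Keras; the vocabulary size, the grid loop, and the elided model construction are illustrative assumptions, not the authors' released training code.

```python
import itertools
import tensorflow as tf

# Hyperparameter grid quoted in the Experiment Setup row.
BATCH_SIZES_TOKENS = [12_500, 25_000, 50_000]    # 6,250 added only for Transformers
LEARNING_RATES = [1e-3, 4e-4, 1e-4, 4e-5, 1e-5]  # 1e-3 dropped for GGNNs, 1e-5 for RNNs
EMBED_DIM = 128                                  # sub-token embedding size

VOCAB_SIZE = 10_000  # assumed; the sub-token vocabulary size is not quoted above

for batch_tokens, lr in itertools.product(BATCH_SIZES_TOKENS, LEARNING_RATES):
    # 128-dimensional sub-token embedding layer, as described in the paper.
    embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)
    # Adam optimizer with the learning rate from the grid.
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    # ... build the GGNN / Transformer / GREAT model and train on batches
    # containing roughly `batch_tokens` sub-tokens ...
```

Note that the quoted batch sizes are measured in tokens rather than examples, so the number of samples per batch varies with sequence length.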