Global Relational Models of Source Code

Authors: Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, David Bieber

Venue: ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | By studying a popular, non-trivial program repair task, variable-misuse identification, we explore the relative merits of traditional and hybrid model families for code representation. Starting with a graph-based model that already improves upon the prior state-of-the-art for this task by 20%, we show that our proposed hybrid models improve an additional 10–15%, while training both faster and using fewer parameters.
Researcher Affiliation | Industry | Vincent J. Hellendoorn, Petros Maniatis, Rishabh Singh, Charles Sutton, David Bieber (Google Research); {vhellendoorn,maniatis,rising,charlessutton,dbieber}@google.com
Pseudocode | No | The paper describes model architectures and mathematical formulations (e.g., formulas for GGNN and Transformer attention) but does not include any explicitly labeled pseudocode or algorithm blocks. (The standard formulations these refer to are sketched below the table.)
Open Source Code | Yes | We release a public implementation of the GREAT model based on Tensorflow, as well as the program graphs for all samples in our training and evaluation datasets whose license permits us to redistribute these at: https://doi.org/10.5281/zenodo.3668323, which tracks the latest release of our Github repository at: https://github.com/VHellendoorn/ICLR20-Great.
Open Datasets | Yes | Synthetic Dataset: We used the ETH Py150 dataset (Raychev et al., 2016), which is based on GitHub Python code, and already partitioned into train and test splits (100K and 50K files, respectively).
Dataset Splits | Yes | We further split the 100K train files into 90K train and 10K validation examples and applied a deduplication step on that dataset (Allamanis, 2018). [...] yielding ca. 2M total training and 755K test samples. (A minimal split sketch is given after the table.)
Hardware Specification | Yes | Hardware: all our models were trained on a single Tesla P100 GPU on 25 million samples, which required between 40 and 250 hours for our various models.
Software Dependencies | No | The paper mentions software like 'Tensorflow' and 'Tensor2Tensor (Vaswani et al., 2018)' but does not provide specific version numbers for these or other libraries used for implementation.
Experiment Setup | Yes | We train most of our models with batch sizes of {12.5K, 25K, 50K} tokens, with the exception of the Transformer architectures; due to the quadratic nature of the attention computation, 25K tokens was too large for these models, so we additionally trained these with 6.25K-token batches. Learning rates were varied in {1e-3, 4e-4, 1e-4, 4e-5, 1e-5} using an Adam optimizer, where we omitted the first option for our GGNN models and the last for our RNNs due to poor performance. Sub-tokens were embedded using 128-dimensional embeddings. (A hedged configuration sketch follows at the end of this section.)
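
The formulas for GGNN and Transformer attention mentioned in the Pseudocode row are not reproduced in this report. The block below restates the standard formulations from the literature that the paper builds on (the GGNN node update of Li et al. and scaled dot-product attention); it is a generic sketch, not the paper's exact notation.

```latex
% Standard GGNN node update (Li et al., 2016): messages from neighbors are
% aggregated along typed edges, then a GRU updates each node state.
m_v^{(t)} = \sum_{(u, v, e) \in E} W_e \, h_u^{(t)}, \qquad
h_v^{(t+1)} = \mathrm{GRU}\!\left(h_v^{(t)},\, m_v^{(t)}\right)

% Standard scaled dot-product attention (Vaswani et al., 2017), the building
% block of the Transformer and of relationally augmented attention models.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```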
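
The Dataset Splits row describes re-splitting the 100K ETH Py150 training files into 90K train and 10K validation files. Below is a minimal sketch of such a split, assuming a hypothetical manifest file name and a fixed random seed; the deduplication step (Allamanis, 2018) is not shown, and this is not the authors' released pipeline.

```python
import random

# Hypothetical manifest listing the 100K ETH Py150 training file paths
# (the actual file name is an assumption for illustration).
with open("py150_train_files.txt") as f:
    train_files = [line.strip() for line in f if line.strip()]

# Shuffle deterministically, then carve out 10K files for validation,
# mirroring the 90K/10K split described in the paper.
random.Random(42).shuffle(train_files)
valid_files = train_files[:10_000]
train_files = train_files[10_000:]

print(len(train_files), "train files,", len(valid_files), "validation files")
```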
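
The Experiment Setup row lists the hyperparameter grid. The sketch below shows one way those settings could be instantiated with TensorFlow and Keras; the vocabulary size, the grid loop, and the elided model construction are illustrative assumptions, not the authors' released training code.

```python
import itertools
import tensorflow as tf

# Hyperparameter grid quoted in the Experiment Setup row.
BATCH_SIZES_TOKENS = [12_500, 25_000, 50_000]    # 6,250 added only for Transformers
LEARNING_RATES = [1e-3, 4e-4, 1e-4, 4e-5, 1e-5]  # 1e-3 dropped for GGNNs, 1e-5 for RNNs
EMBED_DIM = 128                                  # sub-token embedding size

VOCAB_SIZE = 10_000  # assumed; the sub-token vocabulary size is not quoted above

for batch_tokens, lr in itertools.product(BATCH_SIZES_TOKENS, LEARNING_RATES):
    # 128-dimensional sub-token embedding layer, as described in the paper.
    embedding = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)
    # Adam optimizer with the learning rate from the grid.
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    # ... build the GGNN / Transformer / GREAT model and train on batches
    # containing roughly `batch_tokens` sub-tokens ...
```

Note that the quoted batch sizes are measured in tokens rather than examples, so the number of samples per batch varies with sequence length.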