Global Relational Models of Source Code
Authors: Vincent J. Hellendoorn, Charles Sutton, Rishabh Singh, Petros Maniatis, David Bieber
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By studying a popular, non-trivial program repair task, variable-misuse identification, we explore the relative merits of traditional and hybrid model families for code representation. Starting with a graph-based model that already improves upon the prior state-of-the-art for this task by 20%, we show that our proposed hybrid models improve an additional 10-15%, while training both faster and using fewer parameters. |
| Researcher Affiliation | Industry | Vincent J. Hellendoorn, Petros Maniatis, Rishabh Singh, Charles Sutton, David Bieber Google Research {vhellendoorn,maniatis,rising,charlessutton,dbieber}@google.com |
| Pseudocode | No | The paper describes model architectures and mathematical formulations (e.g., formulas for GGNN and Transformer attention) but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release a public implementation of the GREAT model based on Tensorflow, as well as the program graphs for all samples in our training and evaluation datasets whose license permits us to redistribute these at: https://doi.org/10.5281/zenodo.3668323, which tracks the latest release of our Github repository at: https://github.com/VHellendoorn/ICLR20-Great. |
| Open Datasets | Yes | Synthetic Dataset: We used the ETH Py150 dataset (Raychev et al., 2016), which is based on GitHub Python code, and already partitioned into train and test splits (100K and 50K files, respectively). |
| Dataset Splits | Yes | We further split the 100K train files into 90K train and 10K validation examples and applied a deduplication step on that dataset (Allamanis, 2018). [...] yielding ca. 2M total training and 755K test samples. (A hedged split sketch follows the table.) |
| Hardware Specification | Yes | Hardware: all our models were trained on a single Tesla P100 GPU on 25 million samples, which required between 40 and 250 hours for our various models. |
| Software Dependencies | No | The paper mentions software like 'Tensorflow' and 'Tensor2Tensor (Vaswani et al., 2018)' but does not provide specific version numbers for these or other libraries used for implementation. |
| Experiment Setup | Yes | We train most of our models with batch sizes of {12.5K, 25K, 50K} tokens, with the exception of the Transformer architectures; due to the quadratic nature of the attention computation, 25K tokens was too large for these models, so we additionally trained these with 6.25K-token batches. Learning rates were varied in {1e-3, 4e-4, 1e-4, 4e-5, 1e-5} using an Adam optimizer, where we omitted the first option for our GGNN models and the last for our RNNs due to poor performance. Sub-tokens were embedded using 128-dimensional embeddings. (A hedged configuration sketch follows the table.) |
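The Dataset Splits row describes carving 10K validation files out of the 100K ETH Py150 training files and deduplicating the result (Allamanis, 2018). The sketch below is a minimal illustration of such a split, not the paper's pipeline: exact content hashing stands in for the cited near-duplicate detection, the ordering of deduplication and splitting is assumed, and the directory path, seed, and function name are hypothetical.

```python
# Minimal sketch of a 90K/10K train/validation split with simplified deduplication.
# Exact-content hashing is a stand-in for the near-duplicate detection of
# Allamanis (2018); paths, seed, and names are illustrative assumptions.
import hashlib
import random
from pathlib import Path

def split_and_dedup(train_dir, validation_size=10_000, seed=0):
    """Return (train_files, validation_files) from a directory of Python files."""
    files = sorted(Path(train_dir).rglob("*.py"))

    # Drop files whose contents are byte-for-byte identical (simplified dedup).
    seen, unique_files = set(), []
    for path in files:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique_files.append(path)

    # Shuffle deterministically, then carve off the validation set.
    random.Random(seed).shuffle(unique_files)
    validation = unique_files[:validation_size]
    train = unique_files[validation_size:]
    return train, validation

# Example usage (hypothetical path):
# train_files, valid_files = split_and_dedup("py150/train")
```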
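The Experiment Setup row quotes a sweep over token-level batch sizes and learning rates with an Adam optimizer and 128-dimensional sub-token embeddings. The sketch below enumerates that grid in TensorFlow under the quoted values; the function name, model-family keys, vocabulary size, and the exact batch subset used for Transformers are assumptions for illustration, not taken from the released GREAT code.

```python
# Minimal sketch of the hyperparameter grid quoted in the Experiment Setup row.
# Model-family keys, the vocabulary size, and the Transformer batch subset are
# illustrative assumptions.
import itertools
import tensorflow as tf

TOKEN_BATCH_SIZES = [12_500, 25_000, 50_000]       # tokens per batch, most models
LEARNING_RATES = [1e-3, 4e-4, 1e-4, 4e-5, 1e-5]    # swept with Adam
EMBEDDING_DIM = 128                                # sub-token embedding size

def hyperparameter_grid(model_family):
    """Yield (tokens_per_batch, learning_rate) pairs for one model family."""
    batch_sizes = list(TOKEN_BATCH_SIZES)
    learning_rates = list(LEARNING_RATES)
    if model_family == "transformer":
        # 25K-token batches were too large for quadratic attention, so a
        # 6.25K option is used instead (one reading of the quoted setup).
        batch_sizes = [6_250, 12_500]
    elif model_family == "ggnn":
        learning_rates.remove(1e-3)   # omitted for GGNNs (poor performance)
    elif model_family == "rnn":
        learning_rates.remove(1e-5)   # omitted for RNNs (poor performance)
    return itertools.product(batch_sizes, learning_rates)

# Shared pieces: 128-dimensional sub-token embeddings and an Adam optimizer.
subtoken_embedding = tf.keras.layers.Embedding(
    input_dim=25_000,                # vocabulary size: placeholder assumption
    output_dim=EMBEDDING_DIM)

for tokens_per_batch, lr in hyperparameter_grid("ggnn"):
    optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
    print(f"GGNN run: {tokens_per_batch} tokens/batch, lr={lr}")
```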