PLUR: A Unifying, Graph-Based View of Program Learning, Understanding, and Repair

Authors: Zimin Chen, Vincent J. Hellendoorn, Pascal Lamblin, Petros Maniatis, Pierre-Antoine Manzagol, Daniel Tarlow, Subhodeep Moitra

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our main result is to show that 16 recently published tasks of different shapes can be cast in this form, based on which a single model architecture achieves near or above state-of-the-art results on nearly all tasks, outperforming custom models like code2seq and alternative generic models like Transformers. This unification further enables multitask learning and a series of cross-cutting experiments about the importance of different modeling choices for code understanding and repair tasks. In the experiments we ask the following research questions: RQ1: How does the general PLUR approach compare to approaches like GREAT [Hellendoorn et al., 2020], Hoppity [Dinella et al., 2020], and code2seq [Alon et al., 2018]? To evaluate this, we compare the PLUR family of models to the approaches and metrics used by the original papers across the 16 tasks. (A hypothetical sketch of this graph-to-output formulation appears after the table.)
Researcher Affiliation | Collaboration | Zimin Chen (KTH Royal Institute of Technology, Stockholm, Sweden; zimin@kth.se); Vincent J. Hellendoorn (Carnegie Mellon University, Pittsburgh, USA; vhellend@cs.cmu.edu); Petros Maniatis (Google Research, Mountain View, USA; maniatis@google.com); Pascal Lamblin, Pierre-Antoine Manzagol, Daniel Tarlow, Subhodeep Moitra (Google Research, Montreal, Canada; {lamblinp,manzagop,dtarlow,smoitra}@google.com). Work done during an internship at Google; work done during a visiting-faculty appointment at Google.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The full framework, called PLUR, is easily extensible to more tasks, and will be open-sourced (https://github.com/google-research/plur). Open-source code for the full PLUR framework will be available under an Apache 2 license at https://github.com/google-research/plur.
Open Datasets | Yes | We now briefly describe how we brought 16 tasks and datasets into PLUR, introduced in 9 papers in the recent ML4Code literature and available under public-domain licenses. For example, VarMisuse [Hellendoorn et al., 2020] and Hoppity [Dinella et al., 2020] are graph-based program repair tasks but with differing output representations. CuBERT [Kanade et al., 2020a] released a benchmark of six tasks for the purpose of evaluating BERT-style pre-trained code embeddings.
Dataset Splits | No | The paper mentions using 'validation data' and 'validation examples' for model selection and hyperparameter tuning, but it defers specific dataset sizes and split details to an appendix, which is not provided.
Hardware Specification | Yes | We trained each model variant (GREAT2TOCOPO, TRANSFORMER2TOCOPO, and GGNN2TOCOPO) on each of the tasks using 8-core TPU-v2s for acceleration. (A generic sketch of attaching such a TPU is given after the table.)
Software Dependencies | No | The paper mentions using Transformer models and components such as GREAT and TOCOPO, but it does not specify version numbers for any software, libraries, or dependencies.
Experiment Setup | No | The paper states, 'For each task and model variant, we perform a grid search over hyperparameters and minor implementation variations (see Appendix).' However, the specific hyperparameter values and detailed training configurations are deferred to the Appendix, which is not provided in the text. (A generic grid-search sketch follows the table.)
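
To make the graph-to-output formulation quoted under Research Type more concrete, here is a minimal, hypothetical Python sketch of how a single task instance might be encoded as a graph input paired with a TOCOPO (token / copy / pointer) output. The names GraphExample and TocopoTarget, the field layout, and the edge types are illustrative assumptions, not the actual PLUR API.

```python
# Hypothetical encoding of one PLUR-style example: a graph over program
# elements plus a TOCOPO output sequence. Names and fields are illustrative.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GraphExample:
    node_texts: List[str]              # tokenized node labels (e.g. tokens / AST nodes)
    edges: List[Tuple[int, int, str]]  # (source index, target index, edge type)

@dataclass
class TocopoTarget:
    # Each output step is a vocabulary token, a copy of an input node's text,
    # or a pointer to an input node.
    steps: List[Tuple[str, object]]    # e.g. ("token", "return") or ("pointer", 5)

# A variable-misuse repair cast into this shape for `def f(x, y): return x`:
# point at the erroneous usage, then point at the correct variable.
example = GraphExample(
    node_texts=["def", "f", "x", "y", "return", "x"],
    edges=[(0, 1, "next_token"), (4, 5, "next_token"), (2, 5, "occurrence_of")],
)
target = TocopoTarget(steps=[("pointer", 5), ("pointer", 3)])
```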
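For the Hardware Specification row, the following is a minimal sketch of how an 8-core TPU (such as a TPU-v2) can be attached with TensorFlow's distribution strategy. This is a generic TF 2.x pattern assuming a Colab-style local TPU address, not code from the PLUR repository.

```python
# Generic TF 2.x TPU setup sketch; PLUR's actual training entry points may differ.
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")  # local/Colab-style TPU
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
print("TPU cores:", strategy.num_replicas_in_sync)  # 8 on a TPU-v2

with strategy.scope():
    # Build the model here; variables created in this scope are replicated
    # across all 8 cores and gradients are aggregated automatically.
    pass
```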
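For the Experiment Setup row, the sketch below shows the generic shape of a hyperparameter grid search like the one the paper describes; the hyperparameter names and values are placeholders, since the actual grids are deferred to the paper's appendix.

```python
# Generic grid-search sketch; hyperparameters and values are placeholders.
import itertools

grid = {
    "learning_rate": [1e-4, 3e-4],
    "num_layers": [4, 8],
    "hidden_dim": [256, 512],
}

def train_and_validate(config):
    # Stand-in for training one model variant and returning a validation metric.
    return 0.0

best_config, best_score = None, float("-inf")
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    score = train_and_validate(config)
    if score > best_score:
        best_config, best_score = config, score
print("Best config:", best_config, "score:", best_score)
```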