Learning Structural Edits via Incremental Tree Transformations

Authors: Ziyu Yao, Frank F. Xu, Pengcheng Yin, Huan Sun, Graham Neubig

ICLR 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our proposed editor on two source code edit datasets, where results show that, with the proposed edit encoder, our editor significantly improves accuracy over previous approaches that generate the edited program directly in one pass.
Researcher Affiliation Academia Ziyu Yao (The Ohio State University, yao.470@osu.edu); Frank F. Xu and Pengcheng Yin (Carnegie Mellon University, {fangzhex,pcyin}@cs.cmu.edu); Huan Sun (The Ohio State University, sun.397@osu.edu); Graham Neubig (Carnegie Mellon University, gneubig@cs.cmu.edu)
Pseudocode Yes Algorithm 1 DAggerSampling...Algorithm 2 PostRefineSampling...Algorithm 3 TreeShortestDist
Open Source Code Yes Our source code is available at https://github.com/neulab/incremental_tree_edit.
Open Datasets Yes We test our methods on two source code edit datasets introduced by Yin et al. (2019), also largely following their experimental setting.
Dataset Splits Yes The GitHub Edits (GHE) dataset contains (C-, C+) pairs and their surrounding context collected from the commit logs of 54 GitHub C# projects. The dataset is split into train/dev/test sets of 91,372 / 10,176 / 10,176 samples.
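The reported split sizes can be sanity-checked with a short calculation; the dictionary below is a minimal sketch using only the numbers stated above, not the dataset's actual loading code:

```python
# Split sizes for the GitHub Edits (GHE) dataset as reported in the paper.
splits = {"train": 91_372, "dev": 10_176, "test": 10_176}

# Total sample count and the fraction each split represents.
total = sum(splits.values())
fractions = {name: round(size / total, 3) for name, size in splits.items()}

print(total)      # 111724
print(fractions)  # roughly 0.818 / 0.091 / 0.091
```

This shows the split is roughly 82% train with equal-sized dev and test sets.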
Hardware Specification No The paper describes the computational models and experimental setups in terms of software and data but does not provide specific details about the hardware used for running the experiments (e.g., CPU, GPU models, or cloud computing instances).
Software Dependencies No The paper mentions various software components and frameworks used (e.g., LSTM, GGNN, ASDL), but it does not specify exact version numbers for these or other software dependencies.
Experiment Setup Yes For the encoder of our neural editor, the dimension of word embedding and the tree node representation is set to 128. The dimension of the bidirectional LSTM encoder for encoding input code tokens and contexts is set to 64. The hidden state for tracking tree history is set to 256 dimensions. On the decoder side, the dimensions of the operator embedding, the field embedding, the production rule embedding, and the hidden vector in value prediction are set to 32, 32, 128 and 256, respectively. ... we train our Graph2Edit for 30 epochs on the GitHub Edits training set...
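The reported hyperparameters can be collected into one place for reference. The sketch below groups them into plain dictionaries; the key names are illustrative and need not match the authors' released code:

```python
# Hyperparameters exactly as reported in the paper's experiment setup.
# Key names are hypothetical; only the numeric values come from the paper.
ENCODER_CONFIG = {
    "word_embed_dim": 128,          # word embedding dimension
    "node_repr_dim": 128,           # tree node representation dimension
    "bilstm_hidden_dim": 64,        # BiLSTM for input code tokens and contexts
    "tree_history_hidden_dim": 256, # hidden state tracking tree edit history
}

DECODER_CONFIG = {
    "operator_embed_dim": 32,        # edit operator embedding
    "field_embed_dim": 32,           # AST field embedding
    "production_rule_embed_dim": 128, # grammar production rule embedding
    "value_pred_hidden_dim": 256,    # hidden vector for value prediction
}

TRAIN_CONFIG = {
    "epochs": 30,                   # training epochs on GitHub Edits
    "dataset": "GitHub Edits",
}
```

Having the dimensions in a single config like this makes it easier to compare against the released code at https://github.com/neulab/incremental_tree_edit when reproducing the results.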