Code Completion by Modeling Flattened Abstract Syntax Trees as Graphs

Authors: Yanlin Wang, Hui Li

AAAI 2021, pp. 14015-14023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on benchmark data for evaluating code completion. Results show that CCAG has superior performance than state-of-the-art approaches and it is able to provide intelligent code completion.
Researcher Affiliation | Collaboration | Yanlin Wang (Microsoft Research Asia) and Hui Li (School of Informatics, Xiamen University); yanlwang@microsoft.com, hui@xmu.edu.cn
Pseudocode | No | The paper describes algorithms using text and mathematical equations but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a repository link or an explicit statement that the source code for its methodology has been released.
Open Datasets | Yes | We choose two benchmark datasets, JavaScript (JS) and Python (PY), used in previous studies (Li et al. 2018; Liu et al. 2020). The datasets are available at https://www.sri.inf.ethz.ch/research/plml
Dataset Splits | No | The paper states 'We use the official train/test split, i.e., 100,000 in the training set and 50,000 in the test set.' but does not describe a separate validation split or how one would be derived if used (see the data-loading sketch after the table).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions 'Adam' as the optimizer but does not provide specific version numbers for any software components, libraries, or programming languages used in the implementation.
Experiment Setup | Yes | For a fair comparison, we use 128 as the embedding size, hidden size and batch size for all methods. All methods are optimized with Adam (Kingma and Ba 2015) using an initial learning rate of 0.001. For Vanilla LSTM, Parent LSTM and Pointer Mixture Net, the learning rate is multiplied by 0.6 after each epoch, the gradient norm is clipped to 5, and the size of context windows is set to 50 as suggested by Li et al. (2018). For Transformer-based methods, we search the settings of heads and layer number in 1-6 and 1-8, respectively. Then, their best results are reported. By default, we use 2 ASTGabs and 4 heads in CCAG and its variants, but we also report the impacts of varying these hyper-parameters in Sec. 4.2. (A training-configuration sketch based on these settings follows the table.)
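
The stated 100,000/50,000 train/test split can be exercised with a short loader. The following is a minimal sketch, assuming the JSON-lines layout and the file names (python100k_train.json, python50k_eval.json) used by the benchmark archive at https://www.sri.inf.ethz.ch/research/plml; the held-out validation slice at the end is purely an assumption, since the paper reports no validation split.

import json

def load_asts(path):
    # Each line of the benchmark file is one program serialized as a JSON list of AST nodes.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Official split quoted in the table: 100,000 training programs, 50,000 test programs.
train_asts = load_asts("python100k_train.json")  # file name assumed from the benchmark archive
test_asts = load_asts("python50k_eval.json")     # file name assumed from the benchmark archive

# The paper reports no validation split; holding out a slice of the training set
# is one common (assumed) way to obtain one for hyper-parameter tuning.
val_asts = train_asts[-5000:]
train_asts = train_asts[:-5000]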
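
The optimization settings quoted in the Experiment Setup row translate directly into a training configuration. The sketch below uses PyTorch with a placeholder CCAGModel (a stand-in, not the paper's ASTGab-based architecture); only the sizes, the Adam settings, the per-epoch 0.6 learning-rate decay, and the gradient clipping come from the quoted text, and the decay and clipping are stated there for the LSTM baselines rather than for CCAG itself.

from torch import nn, optim

EMBED_SIZE = HIDDEN_SIZE = BATCH_SIZE = 128   # "128 as the embedding size, hidden size and batch size"

class CCAGModel(nn.Module):
    # Placeholder next-node predictor; NOT the paper's flattened-AST graph architecture.
    def __init__(self, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, EMBED_SIZE)
        self.encoder = nn.LSTM(EMBED_SIZE, HIDDEN_SIZE, batch_first=True)
        self.out = nn.Linear(HIDDEN_SIZE, vocab_size)

    def forward(self, node_ids):
        hidden, _ = self.encoder(self.embed(node_ids))
        return self.out(hidden)

model = CCAGModel(vocab_size=50_000)                    # vocabulary size is illustrative
optimizer = optim.Adam(model.parameters(), lr=1e-3)     # Adam, initial learning rate 0.001
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.6)  # lr *= 0.6 after each epoch
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader):
    # `loader` is assumed to yield (input_ids, target_ids) batches of size 128.
    for node_ids, targets in loader:
        optimizer.zero_grad()
        logits = model(node_ids)
        loss = criterion(logits.view(-1, logits.size(-1)), targets.view(-1))
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), 5.0)  # "gradient norm is clipped to 5"
        optimizer.step()
    scheduler.step()  # apply the per-epoch learning-rate decay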