TreeGen: A Tree-Based Transformer Architecture for Code Generation

Authors: Zeyu Sun, Qihao Zhu, Yingfei Xiong, Yican Sun, Lili Mou, Lu Zhang

AAAI 2020, pp. 8984-8991

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated TreeGen on a Python benchmark, HearthStone, and two semantic parsing benchmarks, ATIS and GEO. TreeGen outperformed the previous state-of-the-art approach by 4.5 percentage points on HearthStone, and achieved the best accuracy among neural-network-based approaches on ATIS (89.1%) and GEO (89.6%). We also conducted an ablation test to better understand each component of our model.
Researcher Affiliation | Academia | Zeyu Sun, Qihao Zhu, Yingfei Xiong, Yican Sun, Lili Mou, Lu Zhang. Key Laboratory of High Confidence Software Technologies (Peking University), MoE; Software Institute, Peking University, 100871, P. R. China. {szy, zhuqh, xiongyf, sycpku, zhanglucs}@pku.edu.cn. University of Alberta, Edmonton, AB, Canada. doublepower.mou@gmail.com
Pseudocode | No | The paper describes the architecture and its various components in detail but does not include any explicit pseudocode blocks or algorithms.
Open Source Code | Yes | The code is available at https://github.com/zysszy/TreeGen
Open Datasets | Yes | We followed the train-dev-test split in Ling et al. (2016), and the statistics are listed in Table 2.
Dataset Splits | Yes | We followed the train-dev-test split in Ling et al. (2016), and the statistics are listed in Table 2.
Hardware Specification | Yes | It takes 18s for an epoch on a single Nvidia Titan XP.
Software Dependencies | No | The paper mentions Adafactor (Shazeer and Stern 2018) as the optimizer but does not specify versions for other key software components or libraries (e.g., Python, TensorFlow, PyTorch).
Experiment Setup | Yes | For neural networks, we set the number of NL reader layers N_d = 6, and N_1 = N_2 = 5 for the AST reader as well as the decoder. The size of all embeddings is 256. The hidden sizes were all set to 256, except in the fully-connected layers, whose first layer was 1024-dimensional. We applied dropout after each layer (including attention layers, gating mechanism layers, convolutional layers, and fully-connected layers), with a drop rate of 0.15. The model is optimized by Adafactor (Shazeer and Stern 2018) with default parameters.
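
The hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration object. The sketch below is not the authors' code; it is a minimal PyTorch rendering of the stated values (layer counts, 256-dimensional embeddings and hidden states, a 1024-dimensional first fully-connected layer, dropout of 0.15). Names such as TreeGenConfig and two_layer_fc, and the choice of activation, are illustrative assumptions.

```python
# Minimal sketch of the reported TreeGen hyperparameters (not the authors' code).
# TreeGenConfig and two_layer_fc are hypothetical names used for illustration.
from dataclasses import dataclass

import torch.nn as nn


@dataclass
class TreeGenConfig:
    nl_reader_layers: int = 6      # N_d = 6
    ast_reader_layers: int = 5     # N_1 = 5
    decoder_layers: int = 5        # N_2 = 5
    embedding_size: int = 256      # all embeddings are 256-dimensional
    hidden_size: int = 256         # hidden sizes are 256 ...
    fc_inner_size: int = 1024      # ... except the first fully-connected layer (1024)
    dropout: float = 0.15          # dropout applied after each layer


def two_layer_fc(cfg: TreeGenConfig) -> nn.Sequential:
    """Two-layer fully-connected block matching the quoted sizes (256 -> 1024 -> 256)."""
    return nn.Sequential(
        nn.Linear(cfg.hidden_size, cfg.fc_inner_size),
        nn.GELU(),                 # activation choice is an assumption, not from the paper
        nn.Dropout(cfg.dropout),
        nn.Linear(cfg.fc_inner_size, cfg.hidden_size),
        nn.Dropout(cfg.dropout),
    )


if __name__ == "__main__":
    cfg = TreeGenConfig()
    print(two_layer_fc(cfg))
    # The paper optimizes with Adafactor (Shazeer and Stern 2018) using default
    # parameters; an off-the-shelf Adafactor implementation (e.g., the one in the
    # Hugging Face `transformers` package) could stand in for it here.
```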