Integrating Tree Path in Transformer for Code Representation
Authors: Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and ablation studies on code summarization across four different languages demonstrate the effectiveness of our approaches. |
| Researcher Affiliation | Academia | Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin; Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; Institute of Software, EECS, Peking University, Beijing, China. {phan, lige, wwhjacob, zhaoyunfei, zhijin}@pku.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | We release our code at https://github.com/AwdHanPeng/TPTrans. |
| Open Datasets | Yes | Datasets: To show the effectiveness of our approaches across different source code languages, we experiment on four datasets introduced in the CodeSearchNet (CSN) Challenge [Husain et al., 2019]: Python, Javascript, Go, and Ruby. |
| Dataset Splits | Yes | Table 1 (dataset statistics, samples per partition): CSN-Python: 412,178 train / 23,107 valid / 22,176 test; CSN-Javascript: 123,889 / 8,253 / 6,483; CSN-Ruby: 48,791 / 2,209 / 2,279; CSN-Go: 317,832 / 14,242 / 14,291. (Restated as a Python dict in the first sketch below.) |
| Hardware Specification | Yes | In our experiment, we use 4 Tesla V100 GPUs for training. |
| Software Dependencies | No | The paper mentions optimizers (Adam) and neural network components (Bi-GRU), and a parser (Tree-Sitter), but does not provide specific version numbers for the programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other key software libraries used for implementation. |
| Experiment Setup | Yes | Hyperparameters: We denote the number of layers for the encoder and decoder as LE and LD, the hidden size as D, the feed-forward dimension as DFF, and the number of heads as H. We primarily report results on two model sizes for both TPTrans and TPTrans-α: base (LD=1, DFF=2048) and large (LD=3, DFF=4096). For both the base and large settings, we set LE=3, D=1024, and H=8. We add a pointer network [Vinyals et al., 2015] to the decoder, the same as our baselines. The base model setting keeps the same size as the other baselines for fair comparison. The embedding dimensions of the word and path node are 512 and 64, and we apply a linear layer to project the word embedding to the hidden size of the Transformer. We use a one-layer Bi-GRU, set its hidden size to 64, and concatenate the final states of the forward and backward directions as output with the size of the single head dimension. We set the batch size and dropout rate to 128 and 0.2 and employ label smoothing of 0.1. All models and baselines are trained from randomly initialized parameters. As the optimizer, we use Adam [Kingma and Ba, 2014] with a learning rate and weight decay of 1e-4. (See the configuration sketch below.) |
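The split sizes in the Dataset Splits row come directly from the paper's Table 1. The short sketch below is a hypothetical helper, not part of the released repository; it only restates those numbers as a Python dict and prints each partition's share, which can be handy when re-running the CSN preprocessing.

```python
# Sketch only: split sizes copied from the paper's Table 1; the SPLITS name and
# this script are illustrative and do not appear in the authors' code.
SPLITS = {
    "CSN-Python":     {"train": 412_178, "valid": 23_107, "test": 22_176},
    "CSN-Javascript": {"train": 123_889, "valid": 8_253,  "test": 6_483},
    "CSN-Ruby":       {"train": 48_791,  "valid": 2_209,  "test": 2_279},
    "CSN-Go":         {"train": 317_832, "valid": 14_242, "test": 14_291},
}

for name, parts in SPLITS.items():
    total = sum(parts.values())
    ratios = ", ".join(f"{split}: {count / total:.1%}" for split, count in parts.items())
    print(f"{name} ({total:,} samples) -> {ratios}")
```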
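As a reading aid, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. This is a hedged sketch: the class and field names below are invented for illustration and are not the argument names or flags used in the released TPTrans code.

```python
from dataclasses import dataclass

# Hypothetical configuration mirroring the reported hyperparameters.
@dataclass
class TPTransConfig:
    num_encoder_layers: int = 3        # L_E = 3 for both settings
    num_decoder_layers: int = 1        # L_D: 1 (base) or 3 (large)
    hidden_size: int = 1024            # D
    feed_forward_dim: int = 2048       # D_FF: 2048 (base) or 4096 (large)
    num_heads: int = 8                 # H
    word_embedding_dim: int = 512      # projected to hidden_size by a linear layer
    path_node_embedding_dim: int = 64
    path_gru_hidden: int = 64          # one-layer Bi-GRU over path nodes
    batch_size: int = 128
    dropout: float = 0.2
    label_smoothing: float = 0.1
    learning_rate: float = 1e-4        # Adam
    weight_decay: float = 1e-4

BASE = TPTransConfig()
LARGE = TPTransConfig(num_decoder_layers=3, feed_forward_dim=4096)
```

Both settings share the encoder depth, hidden size, and head count; only the decoder depth and feed-forward dimension differ between base and large, which is why a single dataclass with two overrides captures the reported setup.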