Integrating Tree Path in Transformer for Code Representation
Authors: Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and ablation studies on code summarization across four different languages demonstrate the effectiveness of our approaches. |
| Researcher Affiliation | Academia | Han Peng, Ge Li, Wenhan Wang, Yunfei Zhao, Zhi Jin; Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; Institute of Software, EECS, Peking University, Beijing, China. {phan, lige, wwhjacob, zhaoyunfei, zhijin}@pku.edu.cn |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks clearly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | We release our code at https://github.com/AwdHanPeng/TPTrans. |
| Open Datasets | Yes | Datasets: To show the effectiveness of our approaches across different source code languages, we experiment on four datasets introduced in the CodeSearchNet (CSN) Challenge [Husain et al., 2019]: Python, Javascript, Go, and Ruby. |
| Dataset Splits | Yes | Table 1 (dataset statistics, samples per partition): CSN-Python: 412,178 train / 23,107 valid / 22,176 test; CSN-Javascript: 123,889 / 8,253 / 6,483; CSN-Ruby: 48,791 / 2,209 / 2,279; CSN-Go: 317,832 / 14,242 / 14,291. (Restated as a Python dict in the first sketch below.) |
| Hardware Specification | Yes | In our experiment, we use 4 Tesla V100 GPUs for training. |
| Software Dependencies | No | The paper mentions optimizers (Adam) and neural network components (Bi-GRU), and a parser (Tree-Sitter), but does not provide specific version numbers for the programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other key software libraries used for implementation. |
| Experiment Setup | Yes | Hyperparameters: We denote the number of layers for the encoder and decoder as LE and LD, the hidden size as D, the feed-forward dimension as DFF, and the number of heads as H. We primarily report results on two model sizes for both TPTrans and TPTrans-α: base (LD=1, DFF=2048) and large (LD=3, DFF=4096). For both the base and large settings, we set LE=3, D=1024, and H=8. We add a pointer network [Vinyals et al., 2015] to the decoder, the same as our baselines. The base model setting keeps the same size as the other baselines for fair comparison. The embedding dimensions of the word and path node are 512 and 64, and we apply a linear layer to project the word embedding to the hidden size of the Transformer. We use a one-layer Bi-GRU, set its hidden size to 64, and concatenate the final states of the forward and backward directions as output with the size of the single head dimension. We set the batch size and dropout rate to 128 and 0.2 and employ label smoothing of 0.1. All models and baselines are trained from randomly initialized parameters. As the optimizer, we use Adam [Kingma and Ba, 2014] with a learning rate and weight decay of 1e-4. (See the configuration sketch below.) |
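The split sizes in the Dataset Splits row come directly from the paper's Table 1. The short sketch below is a hypothetical helper, not part of the released repository; it only restates those numbers as a Python dict and prints each partition's share, which can be handy when re-running the CSN preprocessing.

```python
# Sketch only: split sizes copied from the paper's Table 1; the SPLITS name and
# this script are illustrative and do not appear in the authors' code.
SPLITS = {
    "CSN-Python":     {"train": 412_178, "valid": 23_107, "test": 22_176},
    "CSN-Javascript": {"train": 123_889, "valid": 8_253,  "test": 6_483},
    "CSN-Ruby":       {"train": 48_791,  "valid": 2_209,  "test": 2_279},
    "CSN-Go":         {"train": 317_832, "valid": 14_242, "test": 14_291},
}

for name, parts in SPLITS.items():
    total = sum(parts.values())
    ratios = ", ".join(f"{split}: {count / total:.1%}" for split, count in parts.items())
    print(f"{name} ({total:,} samples) -> {ratios}")
```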
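As a reading aid, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. This is a hedged sketch: the class and field names below are invented for illustration and are not the argument names or flags used in the released TPTrans code.

```python
from dataclasses import dataclass

# Hypothetical configuration mirroring the reported hyperparameters.
@dataclass
class TPTransConfig:
    num_encoder_layers: int = 3        # L_E = 3 for both settings
    num_decoder_layers: int = 1        # L_D: 1 (base) or 3 (large)
    hidden_size: int = 1024            # D
    feed_forward_dim: int = 2048       # D_FF: 2048 (base) or 4096 (large)
    num_heads: int = 8                 # H
    word_embedding_dim: int = 512      # projected to hidden_size by a linear layer
    path_node_embedding_dim: int = 64
    path_gru_hidden: int = 64          # one-layer Bi-GRU over path nodes
    batch_size: int = 128
    dropout: float = 0.2
    label_smoothing: float = 0.1
    learning_rate: float = 1e-4        # Adam
    weight_decay: float = 1e-4

BASE = TPTransConfig()
LARGE = TPTransConfig(num_decoder_layers=3, feed_forward_dim=4096)
```

Both settings share the encoder depth, hidden size, and head count; only the decoder depth and feed-forward dimension differ between base and large, which is why a single dataclass with two overrides captures the reported setup.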