Pyramid Attention For Source Code Summarization

Authors: Lei Chai, Ming Li

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluated it on two source code summarization benchmarks where it surpasses prior works and achieves new state-of-the-art results. Ablation studies are conducted to show the efficiency of the proposed method."
Researcher Affiliation | Academia | "Lei Chai and Ming Li, National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China. {chail, lim}@lamda.nju.edu.cn"
Pseudocode | No | The paper describes its methods in text and diagrams, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | Yes | "Our code and data are available at https://github.com/leichainju/pa-former."
Open Datasets | Yes | "To demonstrate the effectiveness of the proposed method, we conduct experiments on two widely-used and well-developed Java datasets: EMSE-DeepCom [11], which is collected from GitHub Java repositories (https://github.com/xing-hu/EMSE-DeepCom), and FunCom [14], which has 2 million Java method-comment pairs (http://leclair.tech/data/funcom/)."
Dataset Splits | No | Table 1 provides '#train' and '#test' statistics for the datasets, but there is no explicit mention of a validation split with specific numbers or percentages.
Hardware Specification | Yes | "All models are trained using NVIDIA Tesla A100 GPUs with a batch size of 64."
Software Dependencies | No | The paper mentions using the Tree-sitter tool and a PyTorch-based framework, but it does not provide specific version numbers for these or any other software dependencies. (A parsing sketch follows the table.)
Experiment Setup | Yes | "For fair comparisons, all the Transformer-based models use the default Transformer configuration with embedding dimension as 512, feed-forward dimension as 2048, head number as 8, and layer number for encoder/decoder as 6, and all RNN-based models use a hidden dimension of 512... All models are trained using NVIDIA Tesla A100 GPUs with a batch size of 64. We train all baselines including our models using the AdamW optimizer with a multi-step learning rate scheduler, and set the initial learning rate to 0.0002 and 0.003 for Transformer-based and RNN-based models, respectively." (A configuration sketch follows the table.)
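The Software Dependencies row notes that the paper parses source code with Tree-sitter but pins no version. The sketch below is not the authors' pipeline; it is a minimal illustration of parsing a Java method into a syntax tree with the tree_sitter Python bindings, assuming the pre-0.22 API and a local clone of the tree-sitter-java grammar. The paths and the print_tree helper are illustrative assumptions.

```python
# Minimal sketch: parsing Java source into a Tree-sitter AST.
# Assumptions: tree_sitter Python bindings (pre-0.22 API) and the
# tree-sitter-java grammar cloned at vendor/tree-sitter-java; paths and the
# print_tree helper are illustrative, not the authors' actual pipeline.
from tree_sitter import Language, Parser

# One-time step: compile the Java grammar into a shared library.
Language.build_library("build/languages.so", ["vendor/tree-sitter-java"])
JAVA = Language("build/languages.so", "java")

parser = Parser()
parser.set_language(JAVA)

code = b"""
class Example {
    int add(int a, int b) { return a + b; }
}
"""

tree = parser.parse(code)


def print_tree(node, source, depth=0):
    """Recursively print node types with a prefix of their source text."""
    snippet = source[node.start_byte:node.end_byte].decode("utf8")
    print("  " * depth + f"{node.type}: {snippet[:40]!r}")
    for child in node.children:
        print_tree(child, source, depth + 1)


print_tree(tree.root_node, code)
```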
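The Experiment Setup row quotes the Transformer hyperparameters and optimization settings. The following is a hedged PyTorch sketch of that configuration, using a vanilla nn.Transformer as a stand-in for the PA-former architecture; the vocabulary size and the multi-step scheduler milestones are placeholders, since the quoted excerpt does not specify them.

```python
# Hedged sketch of the reported training configuration, not the authors' code.
# Assumptions: a vanilla nn.Transformer stands in for the PA-former model;
# VOCAB_SIZE and the scheduler milestones are placeholders not given in the excerpt.
import torch
import torch.nn as nn

VOCAB_SIZE = 50_000     # placeholder: not reported in the quoted setup
EMBED_DIM = 512         # "embedding dimension as 512"
FFN_DIM = 2048          # "feed-forward dimension as 2048"
NUM_HEADS = 8           # "head number as 8"
NUM_LAYERS = 6          # "layer number for encoder/decoder as 6"
BATCH_SIZE = 64         # "batch size of 64"
LR_TRANSFORMER = 2e-4   # "initial learning rate ... 0.0002 for Transformer-based"

embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
model = nn.Transformer(
    d_model=EMBED_DIM,
    nhead=NUM_HEADS,
    num_encoder_layers=NUM_LAYERS,
    num_decoder_layers=NUM_LAYERS,
    dim_feedforward=FFN_DIM,
    batch_first=True,
)

optimizer = torch.optim.AdamW(
    list(model.parameters()) + list(embedding.parameters()),
    lr=LR_TRANSFORMER,
)
# Multi-step decay; the milestones and gamma below are assumptions.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[20, 40], gamma=0.1
)
```

In an actual training loop, optimizer.step() would run per batch and scheduler.step() once per epoch, which is how PyTorch's MultiStepLR schedule is typically driven.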