Retrieval-Augmented Generation for Code Summarization via Hybrid GNN
Authors: Shangqing Liu, Yu Chen, Xiaofei Xie, Jing Kai Siow, Yang Liu
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the proposed approach, we release a new challenging benchmark, crawled from diversified large-scale open-source C projects (total 95k+ unique functions in the dataset). Our method achieves the state-of-the-art performance, improving existing methods by 1.42, 2.44 and 1.29 in terms of BLEU-4, ROUGE-L and METEOR. |
| Researcher Affiliation | Academia | Shangqing Liu¹, Yu Chen², Xiaofei Xie¹, Jingkai Siow¹, Yang Liu¹; ¹Nanyang Technological University, ²Rensselaer Polytechnic Institute |
| Pseudocode | No | The paper does not contain a dedicated pseudocode block or algorithm section. |
| Open Source Code | Yes | We also release a new code summarization benchmark by crawling data from popular and diversified projects containing 95k+ functions in C programming language and make it public: https://github.com/shangqing-liu/CCSD-benchmark-for-code-summarization |
| Open Datasets | Yes | We are the first to explore neural summarization on C programming language, and make our C Code Summarization Dataset (CCSD) public to benefit academia and industry. |
| Dataset Splits | Yes | Finally, we obtain 84,316 training functions, 4,432 in-domain validation functions, 4,203 in-domain test functions and 2,330 out-of-domain test functions. |
| Hardware Specification | Yes | All experiments are conducted on a DGX server with four Nvidia Tesla V100 GPUs, and each epoch takes about 6 minutes on average. |
| Software Dependencies | No | The paper mentions models like BiLSTM, GRU, and LSTM, but does not provide specific version numbers for software libraries or dependencies used (e.g., PyTorch version, Python version). |
| Experiment Setup | Yes | We embed the most frequent 40,000 words in the training set with 512 dimensions and set the hidden size of the BiLSTM to 256, so the concatenated state size for both directions is 512. Dropout of 0.3 is applied after the word embedding layer and the BiLSTM. We set GNN hops to 1 for the best performance. The optimizer is Adam with an initial learning rate of 0.001. The batch size is set to 64 and the early stopping patience to 10. The beam search width is set to 5 as usual. (A configuration sketch follows this table.) |
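For readers checking these settings, below is a minimal PyTorch sketch that wires up the reported hyperparameters (40,000-word vocabulary, 512-dim embeddings, BiLSTM hidden size 256 per direction, dropout 0.3, Adam at learning rate 0.001, batch size 64). The module names and wiring are assumptions for illustration only: the paper's hybrid GNN and retrieval components are not reproduced here, and this is not the authors' implementation.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the paper's experiment setup.
# Everything else (class names, wiring) is an assumption for illustration.
VOCAB_SIZE = 40_000      # most frequent words in the training set
EMBED_DIM = 512
HIDDEN_DIM = 256         # per direction; concatenated BiLSTM state is 512
DROPOUT = 0.3
LEARNING_RATE = 1e-3
BATCH_SIZE = 64
BEAM_WIDTH = 5           # reported decoding setting; unused in this encoder-only sketch
GNN_HOPS = 1             # reported GNN setting; the GNN itself is not sketched here

class SequenceEncoder(nn.Module):
    """Token-sequence encoder mirroring the embedding + BiLSTM settings above."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.dropout = nn.Dropout(DROPOUT)          # applied after embedding and BiLSTM
        self.bilstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM,
                              batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        x = self.dropout(self.embed(token_ids))
        out, _ = self.bilstm(x)                     # (batch, seq_len, 2 * HIDDEN_DIM)
        return self.dropout(out)

encoder = SequenceEncoder()
optimizer = torch.optim.Adam(encoder.parameters(), lr=LEARNING_RATE)

# Example: encode a batch of 64 dummy token sequences of length 100.
dummy_ids = torch.randint(0, VOCAB_SIZE, (BATCH_SIZE, 100))
states = encoder(dummy_ids)                         # shape: (64, 100, 512)
```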