Summarizing Source Code with Transferred API Knowledge

Authors: Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, Zhi Jin

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on large-scale real-world industry Java projects indicate that our approach is effective and outperforms the state-of-the-art in code summarization.
Researcher Affiliation | Academia | Xing Hu [1,2], Ge Li [1,2], Xin Xia [3], David Lo [4], Shuai Lu [1,2] and Zhi Jin [1,2]. [1] Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; [2] Institute of Software, EECS, Peking University, Beijing, China; [3] Faculty of Information Technology, Monash University, Australia; [4] School of Information Systems, Singapore Management University, Singapore
Pseudocode | No | The paper includes figures describing model architectures and equations, but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | The data and code are available at https://github.com/xing-hu/TL-CodeSum
Open Datasets | Yes | There are two datasets used in our work... The API sequence summarization dataset contains Java projects from 2009 to 2014... The Java projects used in code summarization task are created from 2015 to 2016... The data and code are available at https://github.com/xing-hu/TL-CodeSum
Dataset Splits | Yes | We split each dataset into training, valid and testing sets in proportion with 8 : 1 : 1 after shuffling the pairs. (A minimal split sketch follows the table.)
Hardware Specification | No | The paper states, 'We use the Tensorflow to train our models on GPUs.' but does not specify exact GPU models, CPU types, or other detailed hardware specifications.
Software Dependencies | No | The paper mentions using 'Tensorflow' and 'Eclipse's JDT compiler' but does not provide specific version numbers for any software components, which is required for reproducibility.
Experiment Setup | Yes | We set the dimensionality of the GRU hidden states, token embeddings, and summary embeddings to 128. The model is trained using the mini-batch stochastic gradient descent algorithm (SGD) and the batch size is set as 32. The maximum lengths of source code and API sequences are 300 and 20. For decoding, we set the beam size to 5 and the maximum summary length to 30 words. (A hedged configuration sketch follows the table.)
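
The 8 : 1 : 1 split reported in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch in Python, assuming the corpus is held in memory as a list of (code, API sequence, summary) pairs; the function name and random seed are illustrative and not taken from the paper or its repository.

import random

def split_pairs(pairs, seed=42):
    """Shuffle (code, API sequence, summary) pairs and split them 8:1:1
    into training / validation / test sets, as stated in the paper.
    The seed and field layout are illustrative assumptions."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    n_train = int(0.8 * n)
    n_valid = int(0.1 * n)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_valid],
            pairs[n_train + n_valid:])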
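
The Experiment Setup row fixes the main hyperparameters. The sketch below, assuming TensorFlow/Keras, collects those reported values and wires them into a single GRU encoder branch. It is only an illustration of the stated settings, not the authors' implementation: the actual TL-CodeSum model uses separate encoders for source-code tokens and API sequences, an attentional decoder, and API-encoder weights transferred from the API summarization task, all available in the linked repository.

import tensorflow as tf

# Hyperparameters reported in the paper; vocabulary sizes and the model
# structure below are illustrative assumptions.
CONFIG = {
    "hidden_size": 128,       # GRU hidden state dimensionality
    "embedding_dim": 128,     # token and summary embedding dimensionality
    "optimizer": "sgd",       # mini-batch stochastic gradient descent
    "batch_size": 32,
    "max_code_len": 300,      # maximum source-code token sequence length
    "max_api_len": 20,        # maximum API sequence length
    "beam_size": 5,           # beam width at decoding time
    "max_summary_len": 30,    # maximum generated summary length (words)
}

def build_encoder(vocab_size, max_len, cfg=CONFIG):
    """One GRU encoder branch; only the dimensions come from the paper."""
    tokens = tf.keras.Input(shape=(max_len,), dtype="int32")
    embedded = tf.keras.layers.Embedding(
        vocab_size, cfg["embedding_dim"], mask_zero=True)(tokens)
    outputs, state = tf.keras.layers.GRU(
        cfg["hidden_size"], return_sequences=True, return_state=True)(embedded)
    return tf.keras.Model(tokens, [outputs, state])

# Illustrative usage (vocabulary sizes are placeholders):
# code_encoder = build_encoder(code_vocab_size, CONFIG["max_code_len"])
# api_encoder  = build_encoder(api_vocab_size, CONFIG["max_api_len"])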