Summarizing Source Code with Transferred API Knowledge

Authors: Xing Hu, Ge Li, Xin Xia, David Lo, Shuai Lu, Zhi Jin

IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on large-scale real-world industry Java projects indicate that our approach is effective and outperforms the state-of-the-art in code summarization.
Researcher Affiliation | Academia | Xing Hu [1,2], Ge Li [1,2], Xin Xia [3], David Lo [4], Shuai Lu [1,2] and Zhi Jin [1,2]. [1] Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; [2] Institute of Software, EECS, Peking University, Beijing, China; [3] Faculty of Information Technology, Monash University, Australia; [4] School of Information Systems, Singapore Management University, Singapore
Pseudocode | No | The paper includes figures describing model architectures and equations, but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | The data and code are available at https://github.com/xing-hu/TL-CodeSum
Open Datasets | Yes | There are two datasets used in our work... The API sequence summarization dataset contains Java projects from 2009 to 2014... The Java projects used in code summarization task are created from 2015 to 2016... The data and code are available at https://github.com/xing-hu/TL-CodeSum
Dataset Splits | Yes | We split each dataset into training, valid and testing sets in proportion with 8 : 1 : 1 after shuffling the pairs. (A minimal split sketch follows the table.)
Hardware Specification | No | The paper states, 'We use the Tensorflow to train our models on GPUs.' but does not specify exact GPU models, CPU types, or other detailed hardware specifications.
Software Dependencies | No | The paper mentions using 'Tensorflow' and 'Eclipse's JDT compiler' but does not provide specific version numbers for any software components, which is required for reproducibility.
Experiment Setup | Yes | We set the dimensionality of the GRU hidden states, token embeddings, and summary embeddings to 128. The model is trained using the mini-batch stochastic gradient descent algorithm (SGD) and the batch size is set as 32. The maximum lengths of source code and API sequences are 300 and 20. For decoding, we set the beam size to 5 and the maximum summary length to 30 words. (A hedged configuration sketch follows the table.)
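
The 8 : 1 : 1 split reported in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch in Python, assuming the corpus is held in memory as a list of (code, API sequence, summary) pairs; the function name and random seed are illustrative and not taken from the paper or its repository.

import random

def split_pairs(pairs, seed=42):
    """Shuffle (code, API sequence, summary) pairs and split them 8:1:1
    into training / validation / test sets, as stated in the paper.
    The seed and field layout are illustrative assumptions."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n = len(pairs)
    n_train = int(0.8 * n)
    n_valid = int(0.1 * n)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_valid],
            pairs[n_train + n_valid:])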
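
The Experiment Setup row fixes the main hyperparameters. The sketch below, assuming TensorFlow/Keras, collects those reported values and wires them into a single GRU encoder branch. It is only an illustration of the stated settings, not the authors' implementation: the actual TL-CodeSum model uses separate encoders for source-code tokens and API sequences, an attentional decoder, and API-encoder weights transferred from the API summarization task, all available in the linked repository.

import tensorflow as tf

# Hyperparameters reported in the paper; vocabulary sizes and the model
# structure below are illustrative assumptions.
CONFIG = {
    "hidden_size": 128,       # GRU hidden state dimensionality
    "embedding_dim": 128,     # token and summary embedding dimensionality
    "optimizer": "sgd",       # mini-batch stochastic gradient descent
    "batch_size": 32,
    "max_code_len": 300,      # maximum source-code token sequence length
    "max_api_len": 20,        # maximum API sequence length
    "beam_size": 5,           # beam width at decoding time
    "max_summary_len": 30,    # maximum generated summary length (words)
}

def build_encoder(vocab_size, max_len, cfg=CONFIG):
    """One GRU encoder branch; only the dimensions come from the paper."""
    tokens = tf.keras.Input(shape=(max_len,), dtype="int32")
    embedded = tf.keras.layers.Embedding(
        vocab_size, cfg["embedding_dim"], mask_zero=True)(tokens)
    outputs, state = tf.keras.layers.GRU(
        cfg["hidden_size"], return_sequences=True, return_state=True)(embedded)
    return tf.keras.Model(tokens, [outputs, state])

# Illustrative usage (vocabulary sizes are placeholders):
# code_encoder = build_encoder(code_vocab_size, CONFIG["max_code_len"])
# api_encoder  = build_encoder(api_vocab_size, CONFIG["max_api_len"])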