Towards Generating Summaries for Lexically Confusing Code through Code Erosion

Authors: Fan Yan, Ming Li

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our approach outperforms the state-of-the-art approaches to generate coherent and reliable summaries for various lexically confusing code." "To evaluate the effectiveness of VECOS, we conduct experiments on a benchmark dataset and compare it with several state-of-the-art approaches for code summarization."
Researcher Affiliation | Academia | "Fan Yan and Ming Li, National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China, {yanf, lim}@lamda.nju.edu.cn"
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement about open-sourcing code or a link to a code repository.
Open Datasets | Yes | "We take the same dataset as [Hu et al., 2018], in which java code snippets are provided in functional granularity with corresponding summaries."
Dataset Splits | Yes | "We randomly split the dataset as 8:1:1 to construct datasets for training, validating, and testing." (A split sketch follows the table.)
Hardware Specification | Yes | "All the experiments are conducted with a machine equipped with one Intel Xeon E5-2620 CPU, 32GB RAM, a Nvidia Titan X graphic card with 12GB graphic storage. The operating system is Ubuntu 16.04."
Software Dependencies | No | The paper mentions "Ubuntu 16.04" as the operating system, but does not specify software dependencies such as programming languages, libraries, or frameworks with version numbers.
Experiment Setup | Yes | "For our generative summarization model ψ of VECOS, both the input size and hidden size of LSTM are set as 256. The size of the latent variable z is 100. Both the internal LSTM of the encoder and the sequential LSTM for the decoder have only one layer. The initial learning rate is 0.01, and it will decay on learning plateaus with a decaying speed of 0.5. We take the sigmoid annealing strategy to adjust the coefficient λkl dynamically from 1.0 to 0.01 [Bowman et al., 2015]. The weight decay factor for L2-regularization is set as 0.00001. The batch size for the training set is set as 100. The training process will stop when the performance of the validation set stops increasing for three epochs."
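
The 8:1:1 split quoted above is reported without implementation details. A minimal Python sketch of one plausible reading, assuming a uniform random shuffle over (code, summary) pairs; the function name and the fixed seed are illustrative, not from the paper:

    import random

    def split_8_1_1(pairs, seed=0):
        """Shuffle (code, summary) pairs and split them 8:1:1 into
        training, validation, and test sets (names and seed are illustrative)."""
        pairs = list(pairs)
        random.Random(seed).shuffle(pairs)
        n_train = int(0.8 * len(pairs))
        n_valid = int(0.1 * len(pairs))
        return (pairs[:n_train],
                pairs[n_train:n_train + n_valid],
                pairs[n_train + n_valid:])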
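
The experiment-setup row pins down most hyperparameters but not the optimizer, the model internals, or the exact annealing curve. Below is a minimal PyTorch sketch of a configuration consistent with the quoted values; the Adam optimizer, the to_latent projection, and the annealing midpoint and steepness are assumptions, and the two LSTMs merely stand in for the unspecified VECOS model ψ:

    import math
    import torch.nn as nn
    from torch.optim import Adam
    from torch.optim.lr_scheduler import ReduceLROnPlateau

    # Values reported in the paper.
    INPUT_SIZE = HIDDEN_SIZE = 256   # LSTM input and hidden size
    LATENT_SIZE = 100                # size of the latent variable z
    BATCH_SIZE = 100
    INIT_LR = 0.01
    PLATEAU_DECAY = 0.5              # learning-rate decay on validation plateaus
    WEIGHT_DECAY = 1e-5              # L2-regularization factor
    PATIENCE = 3                     # stop after 3 epochs without validation improvement

    # Single-layer encoder and decoder LSTMs, as described; the rest of the
    # VECOS architecture is not specified in the paper, so these are stand-ins.
    encoder = nn.LSTM(INPUT_SIZE, HIDDEN_SIZE, num_layers=1, batch_first=True)
    decoder = nn.LSTM(INPUT_SIZE, HIDDEN_SIZE, num_layers=1, batch_first=True)
    to_latent = nn.Linear(HIDDEN_SIZE, 2 * LATENT_SIZE)  # mean and log-variance of z (assumed)

    params = (list(encoder.parameters()) + list(decoder.parameters())
              + list(to_latent.parameters()))
    optimizer = Adam(params, lr=INIT_LR, weight_decay=WEIGHT_DECAY)  # optimizer choice assumed
    scheduler = ReduceLROnPlateau(optimizer, mode="max", factor=PLATEAU_DECAY)

    def kl_coefficient(step, midpoint=5000, steepness=0.002, start=1.0, end=0.01):
        """Sigmoid annealing of lambda_kl from 1.0 to 0.01, as the paper reports
        (citing Bowman et al., 2015); midpoint and steepness are assumptions."""
        x = max(-60.0, min(60.0, steepness * (step - midpoint)))
        gate = 1.0 / (1.0 + math.exp(x))  # moves from ~1 toward 0 as step grows
        return end + (start - end) * gate

Calling scheduler.step(validation_metric) once per epoch halves the learning rate when the validation score plateaus (mode="max"), matching the 0.5 decay the paper reports; PATIENCE would drive the early-stopping loop.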