Commit Message Generation for Source Code Changes
Authors: Shengbin Xu, Yuan Yao, Feng Xu, Tianxiao Gu, Hanghang Tong, Jian Lu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations on real data demonstrate that the proposed approach significantly outperforms the state-of-the-art in terms of accurately generating the commit messages. |
| Researcher Affiliation | Collaboration | Shengbin Xu, Yuan Yao, Feng Xu, and Jian Lu (State Key Laboratory for Novel Software Technology, Nanjing University, China); Tianxiao Gu (Alibaba Group, USA); Hanghang Tong (Arizona State University, USA) |
| Pseudocode | No | The paper describes the model architecture and components but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is publicly available at https://github.com/SoftWisergroup/CoDiSum. |
| Open Datasets | Yes | We use the commonly-used dataset in this area, which was collected by Jiang and McMillan [2017] from the top 1,000 popular Java projects on GitHub and contains 509k diff files and corresponding commit messages. |
| Dataset Splits | Yes | After pre-processing, we obtain 90,661 pairs of (diff, commit message) and randomly choose 75,000 for training, 8,000 for validation, and 7,661 for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU/CPU models or types of machines. |
| Software Dependencies | No | The paper mentions optimizers and network architectures but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | For the encoder part, we set the maximum length of the structure sequence to 200 (i.e., N = 200) and that of the semantics sequence to 5. For the decoder part, the maximum length of the commit message is set to 20. All word embedding dimensionalities are set to 150 (i.e., dx = dz = 150) with random initialization. For all multi-layer RNN structures, we set the layer number to 3. In the Bi-GRUs, the hidden state dimensionality of one direction is set to 128, so the overall hidden state dimensionality is 256 (i.e., dh = 256). We also set the hidden state dimensionality of the decoder GRUs to 256. All the compared methods are set with the same parameters where applicable (e.g., the hidden state of NMT and CopyNet is set to 256). When training, we adopt the categorical cross-entropy loss function and the RMSProp optimizer with batch size 100 and dropout rate 0.1. We stop the training process when the loss is no longer decreasing. |
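The reported setup can be summarized in a small, stdlib-only sketch. The config dict, the `split_dataset` helper, and the random seed below are illustrative assumptions (they do not come from the released CoDiSum code); the numbers are the ones quoted in the table above, including the consistency between the per-direction Bi-GRU width (128) and the overall hidden size (256).

```python
import random

# Hyperparameters as reported in the paper's experiment setup.
CONFIG = {
    "max_structure_len": 200,   # N: max length of the structure sequence
    "max_semantics_len": 5,     # max length of the semantics sequence
    "max_message_len": 20,      # max length of the generated commit message
    "embedding_dim": 150,       # dx = dz, randomly initialized
    "rnn_layers": 3,            # depth of all multi-layer RNNs
    "gru_hidden_per_dir": 128,  # one direction of the Bi-GRU encoder
    "decoder_hidden": 256,      # dh, also the decoder GRU hidden size
    "batch_size": 100,
    "dropout": 0.1,
}

def split_dataset(pairs, n_train=75_000, n_val=8_000, seed=0):
    """Randomly split (diff, message) pairs into train/val/test,
    mirroring the paper's 75,000 / 8,000 / 7,661 split."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    train = pairs[:n_train]
    val = pairs[n_train:n_train + n_val]
    test = pairs[n_train + n_val:]
    return train, val, test

# Sanity checks: the two Bi-GRU directions concatenate to dh = 256,
# and the splits of the 90,661 pre-processed pairs add up exactly.
assert 2 * CONFIG["gru_hidden_per_dir"] == CONFIG["decoder_hidden"]
train, val, test = split_dataset(range(90_661))
print(len(train), len(val), len(test))  # 75000 8000 7661
```

Because the test-set size (7,661) is simply the remainder after the train and validation draws, the three splits partition the pre-processed dataset with nothing left over.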