Deep Just-In-Time Inconsistency Detection Between Comments and Source Code
Authors: Sheena Panthaplackel, Junyi Jessy Li, Milos Gligoric, Raymond J. Mooney
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By evaluating on a large corpus of comment/code pairs spanning various comment types, we show that our model outperforms multiple baselines by significant margins. For extrinsic evaluation, we show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system which can both detect and resolve inconsistent comments based on code changes. |
| Researcher Affiliation | Academia | 1 Department of Computer Science, 2 Department of Linguistics, 3 Department of Electrical and Computer Engineering, The University of Texas at Austin. spantha@cs.utexas.edu, jessy@austin.utexas.edu, gligoric@utexas.edu, mooney@cs.utexas.edu |
| Pseudocode | No | The paper describes the architecture and models in text and figures, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Data and implementation are available at https://github.com/panthap2/deep-jit-inconsistency-detection. |
| Open Datasets | Yes | For training and evaluation, we construct a large corpus of comments paired with code changes in the corresponding methods, encompassing multiple types of method comments and consisting of 40,688 examples that are extracted from 1,518 open-source Java projects.1 ... Data and implementation are available at https://github.com/panthap2/deep-jit-inconsistency-detection. |
| Dataset Splits | Yes | Statistics of our final dataset are shown in Table 1. ... Train Valid Test Total ... @return 15,950 1,790 1,840 19,580 ... Full 32,988 3,756 3,944 40,688 |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions software tools and libraries such as javalang (with a footnote link to its PyPI page, not a version number), BiGRU, GGNNs, GumTree, and CodeBERT, but does not provide specific version numbers for these or other critical software dependencies. |
| Experiment Setup | Yes | We use 2-layer BiGRU encoders (hidden dimension 64). GGNN encoders, rolled out for 8 message-passing steps, also use hidden dimension 64. We initialize comment and code embeddings, of dimension 64, with pretrained ones (Panthaplackel et al. 2020b). Edit embeddings are of dimension 8. Attention modules use 4 attention heads. We use a dropout rate of 0.6. Training ends if the validation F1 does not improve for 10 epochs. A minimal configuration sketch based on these values follows the table. |
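
The sketch below is a minimal, hypothetical PyTorch rendering of the sequence-based setup quoted above (2-layer BiGRU encoders with hidden dimension 64, 64-dimensional token embeddings, 4 attention heads, dropout 0.6). It is not the authors' implementation; class names, pooling, and the classification head are illustrative assumptions, and the 8-dimensional edit embeddings and GGNN variant are omitted for brevity.

```python
# Illustrative sketch only; hyperparameters mirror the Experiment Setup row above.
import torch
import torch.nn as nn


class SeqEncoder(nn.Module):
    """2-layer bidirectional GRU over embedded tokens (hidden dim 64, embed dim 64)."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=64, num_layers=2, dropout=0.6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, num_layers=num_layers,
                          batch_first=True, bidirectional=True, dropout=dropout)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> states: (batch, seq_len, 2 * hidden_dim)
        states, _ = self.gru(self.embed(token_ids))
        return states


class InconsistencyClassifier(nn.Module):
    """Attends from comment states to code states and predicts consistent vs. inconsistent."""

    def __init__(self, comment_vocab, code_vocab, hidden_dim=64, num_heads=4, dropout=0.6):
        super().__init__()
        self.comment_encoder = SeqEncoder(comment_vocab, hidden_dim=hidden_dim, dropout=dropout)
        self.code_encoder = SeqEncoder(code_vocab, hidden_dim=hidden_dim, dropout=dropout)
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads=num_heads,
                                          dropout=dropout, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(2 * hidden_dim, 2),  # binary label: comment consistent or not
        )

    def forward(self, comment_ids, code_ids):
        comment_states = self.comment_encoder(comment_ids)
        code_states = self.code_encoder(code_ids)
        # Multi-head attention from comment tokens (queries) to code tokens (keys/values).
        attended, _ = self.attn(comment_states, code_states, code_states)
        # Mean-pool the attended comment representation before classification.
        return self.classifier(attended.mean(dim=1))


if __name__ == "__main__":
    # Toy forward pass with random token ids, just to show the expected shapes.
    model = InconsistencyClassifier(comment_vocab=1000, code_vocab=1000)
    logits = model(torch.randint(0, 1000, (4, 20)), torch.randint(0, 1000, (4, 50)))
    print(logits.shape)  # torch.Size([4, 2])
```

Training details such as pretrained embedding initialization, edit-action embeddings, and the early-stopping criterion (stop when validation F1 fails to improve for 10 epochs) would sit in the surrounding training loop rather than in this model definition.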