Deep Just-In-Time Inconsistency Detection Between Comments and Source Code
Authors: Sheena Panthaplackel, Junyi Jessy Li, Milos Gligoric, Raymond J. Mooney
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By evaluating on a large corpus of comment/code pairs spanning various comment types, we show that our model outperforms multiple baselines by significant margins. For extrinsic evaluation, we show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system which can both detect and resolve inconsistent comments based on code changes. |
| Researcher Affiliation | Academia | 1 Department of Computer Science, 2 Department of Linguistics, 3 Department of Electrical and Computer Engineering, The University of Texas at Austin. spantha@cs.utexas.edu, jessy@austin.utexas.edu, gligoric@utexas.edu, mooney@cs.utexas.edu |
| Pseudocode | No | The paper describes the architecture and models in text and figures, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Data and implementation are available at https://github.com/panthap2/deep-jit-inconsistency-detection. |
| Open Datasets | Yes | For training and evaluation, we construct a large corpus of comments paired with code changes in the corresponding methods, encompassing multiple types of method comments and consisting of 40,688 examples that are extracted from 1,518 open-source Java projects.1 ... Data and implementation are available at https://github.com/panthap2/deep-jit-inconsistency-detection. |
| Dataset Splits | Yes | Statistics of our final dataset are shown in Table 1. ... Train Valid Test Total ... @return 15,950 1,790 1,840 19,580 ... Full 32,988 3,756 3,944 40,688 |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for experiments, such as GPU/CPU models or memory specifications. |
| Software Dependencies | No | The paper mentions software tools and libraries such as javalang (with a footnote link to its PyPI page, not a version number), BiGRU, GGNNs, GumTree, and CodeBERT, but does not provide specific version numbers for these or other critical software dependencies. |
| Experiment Setup | Yes | We use 2-layer BiGRU encoders (hidden dimension 64). GGNN encoders, rolled out for 8 message-passing steps, also use hidden dimension 64. We initialize comment and code embeddings, of dimension 64, with pretrained ones (Panthaplackel et al. 2020b). Edit embeddings are of dimension 8. Attention modules use 4 attention heads. We use a dropout rate of 0.6. Training ends if the validation F1 does not improve for 10 epochs. A minimal configuration sketch based on these values follows the table. |
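
The sketch below is a minimal, hypothetical PyTorch rendering of the sequence-based setup quoted above (2-layer BiGRU encoders with hidden dimension 64, 64-dimensional token embeddings, 4 attention heads, dropout 0.6). It is not the authors' implementation; class names, pooling, and the classification head are illustrative assumptions, and the 8-dimensional edit embeddings and GGNN variant are omitted for brevity.

```python
# Illustrative sketch only; hyperparameters mirror the Experiment Setup row above.
import torch
import torch.nn as nn


class SeqEncoder(nn.Module):
    """2-layer bidirectional GRU over embedded tokens (hidden dim 64, embed dim 64)."""

    def __init__(self, vocab_size, embed_dim=64, hidden_dim=64, num_layers=2, dropout=0.6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, num_layers=num_layers,
                          batch_first=True, bidirectional=True, dropout=dropout)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> states: (batch, seq_len, 2 * hidden_dim)
        states, _ = self.gru(self.embed(token_ids))
        return states


class InconsistencyClassifier(nn.Module):
    """Attends from comment states to code states and predicts consistent vs. inconsistent."""

    def __init__(self, comment_vocab, code_vocab, hidden_dim=64, num_heads=4, dropout=0.6):
        super().__init__()
        self.comment_encoder = SeqEncoder(comment_vocab, hidden_dim=hidden_dim, dropout=dropout)
        self.code_encoder = SeqEncoder(code_vocab, hidden_dim=hidden_dim, dropout=dropout)
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads=num_heads,
                                          dropout=dropout, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(2 * hidden_dim, 2),  # binary label: comment consistent or not
        )

    def forward(self, comment_ids, code_ids):
        comment_states = self.comment_encoder(comment_ids)
        code_states = self.code_encoder(code_ids)
        # Multi-head attention from comment tokens (queries) to code tokens (keys/values).
        attended, _ = self.attn(comment_states, code_states, code_states)
        # Mean-pool the attended comment representation before classification.
        return self.classifier(attended.mean(dim=1))


if __name__ == "__main__":
    # Toy forward pass with random token ids, just to show the expected shapes.
    model = InconsistencyClassifier(comment_vocab=1000, code_vocab=1000)
    logits = model(torch.randint(0, 1000, (4, 20)), torch.randint(0, 1000, (4, 50)))
    print(logits.shape)  # torch.Size([4, 2])
```

Training details such as pretrained embedding initialization, edit-action embeddings, and the early-stopping criterion (stop when validation F1 fails to improve for 10 epochs) would sit in the surrounding training loop rather than in this model definition.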