Associating Natural Language Comment and Source Code Entities
Authors: Sheena Panthaplackel, Milos Gligoric, Raymond J. Mooney, Junyi Jessy Li
AAAI 2020, pp. 8592–8599
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our systems outperform several baselines learning from the proposed supervision. ... The new task of associating entities in natural language comments with elements in source code, with a manually labeled evaluation dataset for this task; ... Trained on noisy data, the two models outperform baselines by wide margins, with the binary classifier attaining an F1 score of 0.677 and the CRF attaining an F1 score of 0.618, achieving 39.6% and 27.4% improvement from baselines, respectively. We demonstrate the value of noisy supervision by showing improved performances of our models as the size of noisy training data increases. Additionally, through an ablation study, we highlight the utility of the features that are consumed by our models. |
| Researcher Affiliation | Academia | Sheena Panthaplackel,1 Milos Gligoric,2 Raymond J. Mooney,1 Junyi Jessy Li3 1Department of Computer Science 2Department of Electrical and Computer Engineering 3Department of Linguistics The University of Texas at Austin {spantha, mooney}@cs.utexas.edu, gligoric@utexas.edu, jessy@austin.utexas.edu |
| Pseudocode | No | The paper describes the models (binary classifier, CRF) and their components but does not provide any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The full dataset (including the annotated test set) and implementation are available at https://github.com/panthap2/AssociatingNLCommentCodeEntities. |
| Open Datasets | Yes | We construct a dataset by extracting examples from all commits of popular open-source projects on GitHub. ... The full dataset (including the annotated test set) and implementation are available at https://github.com/panthap2/AssociatingNLCommentCodeEntities. |
| Dataset Splits | Yes | Upon filtering, we partition our primary dataset into train, test, and validation sets, shown in Table 1. ... Table 1: Number of examples, total and unique candidate tokens, and average number of candidate tokens per example, for each partition of the dataset. ... Validation 77 2,488 911 32.3 (i.e., the validation split has 77 examples, 2,488 total and 911 unique candidate tokens, and 32.3 candidate tokens per example on average) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as CPU/GPU models, memory, or cloud instances. |
| Software Dependencies | No | The paper mentions software such as spaCy, the javalang library, the difflib library, TensorFlow, and sklearn-crfsuite, but it does not specify version numbers for any of these dependencies. (A minimal sklearn-crfsuite sketch follows the table.) |
| Experiment Setup | Yes | The 4 fully-connected layers have 512, 384, 256, and 128 units. Dropout is applied to each of these with probability 0.2. We terminate training if there is no improvement in the F1 score on the validation set for 5 consecutive epochs (after 10 epochs), and we use the model corresponding to the highest validation F1 score up till that point. (A hedged reconstruction of this setup also follows the table.) |
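
The Experiment Setup row gives enough detail to reconstruct the paper's feed-forward binary classifier in TensorFlow/Keras. Below is a minimal sketch, assuming a ReLU activation, an Adam optimizer, a sigmoid output for the binary decision, and a hypothetical input feature size `FEATURE_DIM`; none of these four choices are stated in the quoted setup, only the layer sizes, dropout rate, and F1-based early stopping are.

```python
import tensorflow as tf
from sklearn.metrics import f1_score

FEATURE_DIM = 300  # hypothetical; the feature dimensionality is not quoted above

def build_classifier():
    # Layer sizes (512/384/256/128) and dropout 0.2 come from the Experiment
    # Setup row; ReLU, Adam, and the sigmoid output are assumptions.
    inputs = tf.keras.Input(shape=(FEATURE_DIM,))
    x = inputs
    for units in (512, 384, 256, 128):
        x = tf.keras.layers.Dense(units, activation="relu")(x)
        x = tf.keras.layers.Dropout(0.2)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

class F1EarlyStopping(tf.keras.callbacks.Callback):
    """Stop training once validation F1 has not improved for `patience`
    consecutive epochs, checked only after `warmup` epochs, and restore
    the weights of the best-scoring epoch, per the quoted setup."""

    def __init__(self, x_val, y_val, patience=5, warmup=10):
        super().__init__()
        self.x_val, self.y_val = x_val, y_val
        self.patience, self.warmup = patience, warmup
        self.best_f1, self.wait, self.best_weights = -1.0, 0, None

    def on_epoch_end(self, epoch, logs=None):
        preds = (self.model.predict(self.x_val, verbose=0) >= 0.5).astype("int32")
        f1 = f1_score(self.y_val, preds.ravel())
        if f1 > self.best_f1:
            self.best_f1, self.wait = f1, 0
            self.best_weights = self.model.get_weights()
        elif epoch + 1 > self.warmup:  # patience only counts after warm-up epochs
            self.wait += 1
            if self.wait >= self.patience:
                self.model.stop_training = True

    def on_train_end(self, logs=None):
        if self.best_weights is not None:
            self.model.set_weights(self.best_weights)
```

Training would then pass the callback, e.g. `model.fit(x_train, y_train, epochs=100, callbacks=[F1EarlyStopping(x_val, y_val)])`.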
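
Similarly, the CRF model mentioned under Research Type and Software Dependencies can be approximated with sklearn-crfsuite, which the paper names without a version. In this sketch the feature names and the `ASSOC`/`O` label scheme are illustrative placeholders, not the paper's actual feature set or labels; only the general framing (sequence labeling over candidate code tokens) is taken from the quoted text.

```python
import sklearn_crfsuite

def token_features(tokens, i):
    """Hypothetical per-token features for the i-th candidate code token."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "has_uppercase": tok != tok.lower(),  # e.g., camelCase identifiers
        "is_first": i == 0,
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<s>",
    }

def to_features(tokens):
    return [token_features(tokens, i) for i in range(len(tokens))]

# Toy example: tag which candidate code tokens are associated with a comment entity.
X_train = [to_features(["return", "maxValue", ";"])]
y_train = [["O", "ASSOC", "O"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict(X_train))  # e.g., [['O', 'ASSOC', 'O']]
```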