Towards Balanced Defect Prediction with Better Information Propagation

Authors: Xianda Zheng, Yuan-Fang Li, Huan Gao, Yuncheng Hua, Guilin Qi

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on real-world benchmark datasets show that DPCAG improves performance compared to state-of-the-art models.
Researcher Affiliation | Collaboration | 1. School of Cyber Science and Engineering, Southeast University, Nanjing, China; 2. Faculty of Information Technology, Monash University, Melbourne, Australia; 3. Microsoft Asia-Pacific Research and Development Group, Suzhou, China; 4. School of Computer Science and Engineering, Southeast University, Nanjing, China; 5. Key Laboratory of Computer Network and Information Integration, Southeast University, Nanjing, China
Pseudocode | Yes | Algorithm 1: training DPCAG classifier
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | We use three datasets from ELFF (Shippey et al. 2016), namely Dr Java, Genoviz and Jmol, to evaluate model performance.
Dataset Splits | Yes | For each dataset, the set of labeled nodes is divided into training, validation and test sets with the ratio of 90-5-5 (see the split sketch below the table).
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU or GPU models, or cloud computing specifications).
Software Dependencies | No | The paper mentions using RMSProp as the optimizer but does not specify its version or any other software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers.
Experiment Setup | Yes | For our model, the learning rate lr is set to 0.006 and the dropout rate p is set to 0.5. The hyperparameter a is set to 25 for all three datasets. The hyperparameter b is set to 1300 in Dr Java and to 2000 in both Genoviz and Jmol. The dimension of the hidden layer is h = 20. We use RMSProp (Tieleman and Hinton 2012) as the optimizer. The confidence threshold for adding nodes to the training set is t = 0.8. In every iteration, the number of epochs for the Expectation step and the Maximization step is set to 200. Hyperparameter values are selected according to results on the validation set (see the configuration sketch below the table).
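
The 90-5-5 split described in the Dataset Splits row can be reproduced with a few lines of array shuffling. The sketch below is an illustration only: the function name, random seed, and use of NumPy are assumptions, since the paper does not release code.

```python
# Hypothetical illustration of the 90-5-5 split of labeled nodes described in
# the "Dataset Splits" row; names and seeding are assumptions, not DPCAG code.
import numpy as np

def split_labeled_nodes(labeled_ids, seed=0):
    """Shuffle labeled node ids and split them into 90% / 5% / 5% subsets."""
    rng = np.random.default_rng(seed)
    ids = np.array(list(labeled_ids))
    rng.shuffle(ids)
    n = len(ids)
    n_train = int(0.90 * n)
    n_val = int(0.05 * n)
    train_ids = ids[:n_train]
    val_ids = ids[n_train:n_train + n_val]
    test_ids = ids[n_train + n_val:]
    return train_ids, val_ids, test_ids

# Example usage on a toy set of 200 labeled nodes.
train_ids, val_ids, test_ids = split_labeled_nodes(range(200))
print(len(train_ids), len(val_ids), len(test_ids))  # 180 10 10
```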
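
The numeric settings in the Experiment Setup row translate directly into a small configuration object. The sketch below is a minimal, hypothetical setup: the placeholder model, the input feature size, and the use of PyTorch are assumptions; only the hyperparameter values and the choice of RMSProp come from the paper.

```python
# Minimal sketch of the reported hyperparameters (assumed PyTorch setup;
# the model below is a placeholder, not the actual DPCAG classifier).
import torch

config = {
    "lr": 0.006,                  # learning rate
    "dropout": 0.5,               # dropout rate p
    "hidden_dim": 20,             # hidden layer dimension h
    "a": 25,                      # hyperparameter a (all three datasets)
    "b": 1300,                    # hyperparameter b (2000 for Genoviz and Jmol)
    "confidence_threshold": 0.8,  # threshold t for adding nodes to the training set
    "epochs_per_step": 200,       # epochs for each Expectation / Maximization step
}

model = torch.nn.Sequential(      # placeholder classifier; input size 64 is assumed
    torch.nn.Linear(64, config["hidden_dim"]),
    torch.nn.ReLU(),
    torch.nn.Dropout(config["dropout"]),
    torch.nn.Linear(config["hidden_dim"], 2),
)
optimizer = torch.optim.RMSprop(model.parameters(), lr=config["lr"])
```

In the training procedure reported by the paper (Algorithm 1), each outer iteration runs 200 epochs for the Expectation step and 200 for the Maximization step, and nodes whose prediction confidence exceeds t = 0.8 are added to the training set; the details of those two objectives are not reproduced here.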