RDPD: Rich Data Helps Poor Data via Imitation

Authors: Shenda Hong, Cao Xiao, Trong Nghia Hoang, Tengfei Ma, Hongyan Li, Jimeng Sun

IJCAI 2019

Reproducibility Variable Result LLM Response
Research Type: Experimental. "We evaluated RDPD on three real-world datasets and showed that its distilled model consistently outperformed all baselines across all datasets, achieving the greatest performance improvement over a model trained only on low-quality data (24.56% on PR-AUC and 12.21% on ROC-AUC) and over a state-of-the-art KD model (5.91% on PR-AUC and 4.44% on ROC-AUC)."
Researcher Affiliation: Collaboration. "Shenda Hong (1,2,5), Cao Xiao (3), Trong Nghia Hoang (4), Tengfei Ma (4), Hongyan Li (1,2) and Jimeng Sun (5). 1: School of Electronics Engineering and Computer Science, Peking University, China; 2: Key Laboratory of Machine Perception (Ministry of Education), Peking University, China; 3: Analytics Center of Excellence, IQVIA, USA; 4: IBM Research, USA; 5: Department of Computational Science and Engineering, Georgia Institute of Technology, USA."
Pseudocode: Yes. "Algorithm 1 RDPD(Xr, Xp, Y, T)"
Open Source Code: Yes. "Our code is publicly available at https://github.com/hsd1503/RDPD."
Open Datasets: Yes. "PAMAP2 Physical Activity Monitoring Data Set (PAMAP2) [Reiss and Stricker, 2012]. The PTB Diagnostic ECG Database (PTBDB) includes 15 channels of ECG signals collected from controls and patients with heart disease [Bousseljot et al., 1995]. The Medical Information Mart for Intensive Care (MIMIC-III) was collected on over 58,000 ICU patients at the Beth Israel Deaconess Medical Center (BIDMC) from June 2001 to October 2012 [Johnson et al., 2016]."
Dataset Splits: Yes. "In our experiment, we chose data of subject 105 for validation, subject 101 for testing, and the others for training. [For PTBDB:] We randomly divided the data into training (80%), validation (10%) and test (10%) sets by subjects. [For MIMIC-III:] We randomly divided the data into training (80%), validation (10%) and test (10%) sets by patients."
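The 80/10/10 split described above is performed at the subject (or patient) level rather than the sample level, which prevents data from one person leaking across sets. A minimal sketch of such a split, assuming a hypothetical `split_by_subject` helper and seed (this is illustrative, not the authors' code):

```python
import random

def split_by_subject(subject_ids, seed=0):
    """Randomly partition unique subjects into 80/10/10
    train/validation/test groups. Splitting at the subject
    level keeps all samples from one subject in a single set."""
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    n = len(subjects)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = set(subjects[:n_train])
    val = set(subjects[n_train:n_train + n_val])
    test = set(subjects[n_train + n_val:])
    return train, val, test
```

Samples are then assigned to a set by looking up which group their subject ID fell into.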
Hardware Specification: Yes. "All models were implemented in PyTorch version 0.5.0 and trained on a system equipped with 64GB RAM, 12 Intel Core i7-6850K 3.60GHz CPUs and an Nvidia GeForce GTX 1080."
Software Dependencies: Yes. "All models were implemented in PyTorch version 0.5.0."
Experiment Setup: Yes. "Models were trained with a mini-batch size of 128 samples for 200 iterations, which was sufficient to achieve the best performance on the classification task. All models were optimized using Adam [Kingma and Ba, 2014] with the learning rate set to 0.001. T is set to 5 for PAMAP2 and PTBDB, and to 2.5 for MIMIC-III."
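The T above is the standard knowledge-distillation temperature: logits are divided by T before the softmax, and a larger T flattens the teacher's distribution so the student can learn from the relative scores of non-target classes. A minimal NumPy sketch of a temperature-softened distillation loss (an illustration of the general KD technique, not the authors' exact objective, which also imitates the teacher's attention):

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax: larger T flattens the
    output distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=5.0):
    """Cross-entropy between teacher and student distributions,
    both softened by T (T=5 for PAMAP2/PTBDB, T=2.5 for MIMIC-III
    in the setup above). The T**2 factor is the usual scaling that
    keeps soft-target gradients comparable to the hard-label term."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    ce = -(p_teacher * np.log(p_student + 1e-12)).sum(axis=-1)
    return (T ** 2) * ce.mean()
```

In a full training loop this term would be combined with the ordinary cross-entropy against the ground-truth labels and minimized with Adam at learning rate 0.001, as stated in the setup.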