Compositional Training for End-to-End Deep AUC Maximization
Authors: Zhuoning Yuan, Zhishuai Guo, Nitesh Chawla, Tianbao Yang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive empirical studies on imbalanced benchmark and medical image datasets, which unanimously verify the effectiveness of the proposed method. Our results show that the compositional training approach dramatically improves both the feature representations and the testing AUC score compared with traditional deep learning approaches, and yields better performance than the two-stage approaches for DAM as well. |
| Researcher Affiliation | Academia | Zhuoning Yuan1, Zhishuai Guo1, Nitesh V. Chawla2, Tianbao Yang1 1Department of Computer Science, University of Iowa 2Computer Science & Engineering, University of Notre Dame {zhuoning-yuan, zhishuai-guo, tianbao-yang}@uiowa.edu, nchawla@nd.edu |
| Pseudocode | Yes | Algorithm 1 Primal-Dual Stochastic Compositional Adaptive (PDSCA) method for solving (6). (A simplified, first-order sketch of the compositional step appears below the table.) |
| Open Source Code | Yes | The proposed method is implemented in our open-sourced library LibAUC (www.libauc.org) and code is available at https://github.com/Optimization-AI/LibAUC. (A hedged usage sketch appears below the table.) |
| Open Datasets | Yes | We conduct experiments on four benchmark datasets and four medical image datasets. The statistics of these datasets are included in Appendix A.1. More training configurations can be found in Appendix A.2. ... CheXpert is a large-scale chest X-ray dataset (Irvin et al., 2019)... The DDSM+ data is a combination of two datasets, namely DDSM and CBIS-DDSM (Lee et al., 2017; Bowyer et al., 1996; Heath et al., 1998)... |
| Dataset Splits | Yes | For all datasets, we use the train/val split to do cross-validation for parameter tuning, except CheXpert as explained below. For the benchmark datasets, we use 19k/1k, 45k/5k, 45k/5k, and 4k/1k training/validation splits on CatvsDog, CIFAR10, CIFAR100, and STL10, respectively. For the Melanoma dataset, we use a 70/10/20 train/val/test split. (This split is sketched in code below the table.) |
| Hardware Specification | Yes | All benchmark datasets are run on an NVIDIA GTX-2080Ti, and the four medical datasets, i.e., CheXpert, Melanoma, DDSM+ and PatchCam, are run on an NVIDIA V100. |
| Software Dependencies | No | The paper mentions using ResNet and DenseNet architectures and refers to an open-sourced library, LibAUC, for the implementation, but it does not specify version numbers for any software dependencies, such as the programming language (e.g., Python), deep learning framework (e.g., TensorFlow, PyTorch), or other libraries. |
| Experiment Setup | Yes | The weight decay is set to 1e-4 for all experiments. For algorithms that maximize AUC, we use a batch size of 128 and train for a total of 100 epochs; we use a step size of 0.1 and decrease it by 10× at 50% and 75% of the total training time (this schedule is sketched in code below the table). We tune the beta parameters of our method in the range [0.1, 0.99] with a grid search and find that good values are around 0.9. For linear-combination methods, we tune the weight c of the two losses in {0.25, 0.5, 0.75}. We tune the number of inner gradient steps for CT in k ∈ {1, 2, 3} with α = 0.1. ... For the medical datasets, we use a batch size of 32, except for PatchCam, which uses 64, an initial learning rate of 0.1, and a weight decay of 1e-5. We train Melanoma for 12 epochs, CheXpert for 2 epochs, DDSM+ for 5 epochs, and PatchCam for 5 epochs. The learning rate is decayed by 10× at 50% and 75% of the total training iterations. |
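The following is a minimal, first-order sketch of the compositional step that Algorithm 1 (PDSCA) solves: the AUC surrogate is evaluated at weights obtained by k inner gradient-descent steps on the cross-entropy loss, roughly min_w L_AUC(w − α∇L_CE(w)). It deliberately omits PDSCA's moving-average estimators, primal-dual variables, and adaptive step sizes, and the pairwise squared-hinge surrogate below is a generic stand-in for the paper's min-max AUC loss; all function and variable names are illustrative, not the authors' implementation.

```python
# Minimal first-order sketch of compositional training (NOT the full PDSCA
# update): evaluate an AUC surrogate at w - alpha * grad L_CE(w).
import copy
import torch
import torch.nn.functional as F

def pairwise_auc_loss(scores, labels, margin=1.0):
    # Generic squared-hinge pairwise surrogate for 1 - AUC (a stand-in for
    # the paper's min-max AUC loss): penalize negative examples scored
    # within `margin` of positive examples.
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = margin - (pos.unsqueeze(1) - neg.unsqueeze(0))
    return torch.clamp(diff, min=0).pow(2).mean()

def compositional_step(model, x, y, alpha=0.1, k=1, lr=0.1):
    # Inner phase: k SGD steps on the cross-entropy loss (the paper tunes
    # k in {1, 2, 3} with alpha = 0.1).
    fast = copy.deepcopy(model)
    for _ in range(k):
        ce = F.binary_cross_entropy_with_logits(fast(x).squeeze(1), y)
        grads = torch.autograd.grad(ce, list(fast.parameters()))
        with torch.no_grad():
            for p, g in zip(fast.parameters(), grads):
                p -= alpha * g
    # Outer phase: gradient of the AUC surrogate at the inner-updated
    # weights, applied back to the original weights. Dropping the Jacobian
    # of the inner update makes this a first-order approximation.
    auc = pairwise_auc_loss(fast(x).squeeze(1), y)
    grads = torch.autograd.grad(auc, list(fast.parameters()))
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p -= lr * g
    return auc.item()

# Toy usage with an imbalanced batch (batch size 128, as in the paper):
model = torch.nn.Linear(10, 1)
x = torch.randn(128, 10)
y = (torch.rand(128) < 0.1).float()
compositional_step(model, x, y, alpha=0.1, k=1)
```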
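And a hedged sketch of training through the released LibAUC library. The class names CompositionalAUCLoss and PDSCA follow the library's documentation for this paper, but the constructor arguments shown are assumptions and may differ across LibAUC versions; check against www.libauc.org before use.

```python
# Hedged sketch of using the authors' LibAUC library (pip install libauc).
# CompositionalAUCLoss / PDSCA and their keyword arguments are assumptions
# based on the library's documentation; verify against the installed version.
import torch
from libauc.losses import CompositionalAUCLoss
from libauc.optimizers import PDSCA

model = torch.nn.Linear(10, 1)  # stand-in for the paper's ResNet/DenseNet
loss_fn = CompositionalAUCLoss()
optimizer = PDSCA(model.parameters(), loss_fn=loss_fn, lr=0.1,
                  beta1=0.9, beta2=0.9,  # paper tunes betas in [0.1, 0.99]; ~0.9 works well
                  weight_decay=1e-4)     # weight decay used for all experiments

x = torch.randn(128, 10)
y = (torch.rand(128, 1) < 0.1).float()   # imbalanced labels
for step in range(100):
    loss = loss_fn(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```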
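The 70/10/20 train/val/test split reported for Melanoma could be produced as below; scikit-learn and the stratification choice are assumptions, since the paper does not describe its splitting tooling.

```python
# Sketch of a 70/10/20 train/val/test split (as used for Melanoma).
# scikit-learn and stratification are assumptions, not stated in the paper.
import numpy as np
from sklearn.model_selection import train_test_split

labels = np.random.randint(0, 2, size=1000)  # placeholder labels
idx = np.arange(len(labels))
train_idx, rest_idx = train_test_split(
    idx, test_size=0.30, stratify=labels, random_state=42)
val_idx, test_idx = train_test_split(
    rest_idx, test_size=2 / 3,               # 20% of total = 2/3 of the 30%
    stratify=labels[rest_idx], random_state=42)
```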
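Finally, the step-size schedule in the experiment setup (start at 0.1, divide by 10 at 50% and 75% of training) maps directly onto a milestone scheduler. Plain SGD stands in here purely to illustrate the schedule; the paper's optimizer is PDSCA.

```python
# The learning-rate schedule from the experiment setup: start at 0.1 and
# decay by 10x at 50% and 75% of the 100 training epochs.
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
total_epochs = 100
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer,
    milestones=[total_epochs // 2, (3 * total_epochs) // 4],  # epochs 50, 75
    gamma=0.1)  # multiply the learning rate by 0.1 at each milestone

for epoch in range(total_epochs):
    # ... one training epoch with batch size 128 ...
    scheduler.step()
```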