Stochastic AUC Maximization with Deep Neural Networks

Authors: Mingrui Liu, Zhuoning Yuan, Yiming Ying, Tianbao Yang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate the effectiveness of the proposed algorithms. We evaluate the proposed algorithms on several large-scale benchmark datasets. The experimental results show that our algorithms achieve superior performance compared with other baselines.
Researcher Affiliation | Academia | Mingrui Liu, Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA, mingrui-liu@uiowa.edu; Zhuoning Yuan, Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA, zhuoning-yuan@uiowa.edu; Yiming Ying, Department of Mathematics and Statistics, SUNY at Albany, Albany, NY 12222, USA, yying@albany.edu; Tianbao Yang, Department of Computer Science, The University of Iowa, Iowa City, IA 52242, USA, tianbao-yang@uiowa.edu
Pseudocode | Yes | Algorithm 1: Proximally Guided Algorithm (PGA) (Rafique et al., 2018); Algorithm 2: Proximal Primal-Dual Stochastic Gradient (PPD-SG); Algorithm 3: Inner Loop of Proximal Primal-Dual AdaGrad (PPD-AdaGrad); Algorithm 4: Update T_+, T_-, p̂, p(1-p), y given data {z_j, ..., z_{j+m-1}}. (A minimal primal-dual update sketch follows the table.)
Open Source Code | No | The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We conduct the comparisons on four benchmark datasets, i.e., Cat&Dog (C2), CIFAR10 (C10), CIFAR100 (C100), and STL10. Cat&Dog is from Kaggle, containing 25,000 images of dogs and cats; we use an 80:20 split to construct the training and testing sets. For CIFAR10/STL10, we label the first 5 classes as the negative ("-") class and the last 5 classes as the positive ("+") class. For CIFAR100, we label the first 50 classes as the negative ("-") class and the last 50 classes as the positive ("+") class.
Dataset Splits | Yes | We use 19k/1k, 45k/5k, 45k/5k, and 4k/1k training/validation splits on C2, C10, C100, and STL10, respectively. (A relabeling and split sketch follows the table.)
Hardware Specification | No | The paper mentions using a "residual network with 20 layers (ResNet-20)" but does not specify any particular hardware details such as GPU models, CPU types, or memory specifications used for training or evaluation.
Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or frameworks used in the experiments (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | We use the stagewise step size strategy as in (He et al., 2016) for SGD, i.e., the step size is divided by 10 at 40K and 60K iterations. For PPD-SG and PPD-AdaGrad, we set T_k = T_0 · 3^k and η_k = η_0 / 3^k, where T_0 and η_0 are tuned on validation data. The value of γ is tuned for PGA and the same value is used for PPD-SG and PPD-AdaGrad. The initial step size is tuned in {0.1, 0.05, 0.01, 0.008, 0.005} and T_0 is tuned in [200, 2000] for each algorithm separately. The batch size is set to 128. For STL10, we use a smaller batch size of 32 due to the limited training data. (A schedule sketch follows the table.)
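
As a companion to the Pseudocode row, the sketch below shows one primal-dual stochastic gradient step on the square-loss min-max AUC surrogate of Ying et al. (2016), which PPD-SG (Algorithm 2) builds on. It is a minimal illustration only: the model, variable names, step size, and positive-class ratio are placeholders, and the stagewise restarts and proximal term of Algorithm 2 are omitted (the paper's own code is not public, per the Open Source Code row).

    import torch
    import torch.nn as nn

    # Placeholder scoring model and scalar auxiliary variables (a, b, alpha).
    model = nn.Linear(32, 1)                 # stand-in for the ResNet-20 used in the paper
    a = torch.zeros(1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    alpha = torch.zeros(1, requires_grad=True)
    eta, p_hat = 0.01, 0.5                   # illustrative step size and positive-class ratio

    def auc_minmax_loss(scores, labels, a, b, alpha, p):
        # Square-loss min-max AUC surrogate (Ying et al., 2016); labels in {0, 1}.
        pos, neg = (labels == 1).float(), (labels == 0).float()
        f = ((1 - p) * (scores - a) ** 2 * pos
             + p * (scores - b) ** 2 * neg
             + 2 * (1 + alpha) * (p * scores * neg - (1 - p) * scores * pos)
             - p * (1 - p) * alpha ** 2)
        return f.mean()

    x, y = torch.randn(128, 32), torch.randint(0, 2, (128,))   # one toy mini-batch
    loss = auc_minmax_loss(model(x).squeeze(-1), y, a, b, alpha, p_hat)
    grads = torch.autograd.grad(loss, list(model.parameters()) + [a, b, alpha])
    with torch.no_grad():
        for v, g in zip(list(model.parameters()) + [a, b], grads[:-1]):
            v -= eta * g              # descent on the primal variables (w, a, b)
        alpha += eta * grads[-1]      # ascent on the dual variable alpha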
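
The Open Datasets and Dataset Splits rows describe how binary labels and validation splits are constructed. The snippet below is a hedged reconstruction for CIFAR10 only, using torchvision; the paper does not state how the 45k/5k split is randomized, so the seed is illustrative.

    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    # Binary relabeling as described above: CIFAR10 classes 0-4 -> negative (0),
    # classes 5-9 -> positive (1).
    cifar = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
    cifar.targets = [0 if t < 5 else 1 for t in cifar.targets]

    # 45k/5k training/validation split for CIFAR10 (seed chosen arbitrarily here).
    train_set, val_set = random_split(cifar, [45000, 5000],
                                      generator=torch.Generator().manual_seed(0))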
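
The Experiment Setup row describes two step-size schedules. The helpers below sketch both under the stated settings; the default T_0 and η_0 values are placeholders within the tuned ranges listed above, not values reported by the paper.

    def ppd_stage_schedule(k, T0=1000, eta0=0.01):
        # PPD-SG / PPD-AdaGrad, stage k: T_k = T0 * 3^k inner iterations
        # with step size eta_k = eta0 / 3^k (T0, eta0 are placeholders).
        return T0 * 3 ** k, eta0 / 3 ** k

    def sgd_step_size(iteration, eta0=0.1):
        # SGD baseline: step size divided by 10 at 40K and again at 60K iterations.
        if iteration >= 60_000:
            return eta0 / 100
        if iteration >= 40_000:
            return eta0 / 10
        return eta0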