Stochastic AUC Maximization with Deep Neural Networks
Authors: Mingrui Liu, Zhuoning Yuan, Yiming Ying, Tianbao Yang
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results demonstrate the effectiveness of the proposed algorithms. We evaluate the proposed algorithms on several large-scale benchmark datasets. The experimental results show that our algorithms achieve superior performance compared with other baselines. |
| Researcher Affiliation | Academia | Mingrui Liu Department of Computer Science The University of Iowa Iowa City, IA, 52242, USA mingrui-liu@uiowa.edu; Zhuoning Yuan Department of Computer Science The University of Iowa Iowa City, IA, 52242, USA zhuoning-yuan@uiowa.edu; Yiming Ying Department of Mathematics and Statistics SUNY at Albany Albany, NY, 12222, USA yying@albany.edu; Tianbao Yang Department of Computer Science The University of Iowa Iowa City, IA, 52242, USA tianbao-yang@uiowa.edu |
| Pseudocode | Yes | Algorithm 1 Proximally Guided Algorithm (PGA) (Rafique et al., 2018); Algorithm 2 Proximal Primal-Dual Stochastic Gradient (PPD-SG); Algorithm 3 Inner Loop of Proximal Primal-Dual AdaGrad (PPD-AdaGrad); Algorithm 4 Update T_+, T_-, p̂, p(1−p), y given data {z_j, …, z_{j+m−1}} |
| Open Source Code | No | The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We conduct the comparisons on four benchmark datasets, i.e., Cat&Dog (C2), CIFAR10 (C10), CIFAR100 (C100), STL10. Cat&Dog is from Kaggle containing 25,000 images of dogs and cats and we choose an 80:20 split to construct the training and testing sets. For CIFAR10/STL10, we label the first 5 classes as negative ("-") class and the last 5 classes as positive ("+") class. For CIFAR100, we label the first 50 classes as negative ("-") class and the last 50 classes as positive ("+") class. |
| Dataset Splits | Yes | We use 19k/1k, 45k/5k, 45k/5k, 4k/1k training/validation splits on C2, C10, C100, and STL10, respectively. |
| Hardware Specification | No | The paper mentions using a "residual network with 20 layers (ResNet-20)" but does not specify any particular hardware details such as GPU models, CPU types, or memory specifications used for training or evaluation. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or frameworks used in the experiments (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | We use the stagewise step size strategy as in (He et al., 2016) for SGD, i.e., the step size is decreased by a factor of 10 at 40K and 60K iterations. For PPD-SG and PPD-AdaGrad, we set T_s = T_0·3^k, η_s = η_0/3^k. T_0 and η_0 are tuned on validation data. The value of γ is tuned for PGA and the same value is used for PPD-SG and PPD-AdaGrad. The initial step size is tuned in [0.1, 0.05, 0.01, 0.008, 0.005] and T_0 is tuned in [200, 2000] for each algorithm separately. The batch size is set to 128. For STL10, we use a smaller batch size of 32 due to the limited training data. |
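
The Open Datasets and Dataset Splits rows describe a binary relabeling of standard multi-class benchmarks (first half of the classes negative, second half positive) plus fixed training/validation splits. The sketch below shows one plausible way to reproduce that preprocessing for CIFAR10 with torchvision; the paper does not describe its data pipeline, so the library calls, the ±1 label encoding, and the split seed are assumptions.

```python
# Hedged sketch (not from the paper) of the binary relabeling and split
# reported in the table: first half of the classes -> negative (-1),
# second half -> positive (+1), then a 45k/5k train/validation split.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

def binarize(dataset, num_classes):
    """Map original class indices to +/-1 AUC labels (assumed encoding)."""
    dataset.targets = [-1 if t < num_classes // 2 else 1 for t in dataset.targets]
    return dataset

to_tensor = transforms.ToTensor()
c10 = binarize(datasets.CIFAR10("data", train=True, download=True,
                                transform=to_tensor), num_classes=10)
# 45k/5k training/validation split for CIFAR10, per the Dataset Splits row;
# the seed is an arbitrary choice for reproducibility of the sketch.
train_set, val_set = random_split(c10, [45_000, 5_000],
                                  generator=torch.Generator().manual_seed(0))
```

CIFAR100 would use the same idea with `num_classes=100` (first 50 classes negative); torchvision's STL10 stores its labels in `.labels` rather than `.targets`, so the helper needs a small adaptation there.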
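The Pseudocode and Experiment Setup rows outline the stagewise Proximal Primal-Dual Stochastic Gradient (PPD-SG) method with inner-loop length T_s = T_0·3^k and step size η_s = η_0/3^k. Below is a minimal PyTorch sketch of such a stagewise primal-descent/dual-ascent loop applied to the min-max AUC formulation of Ying et al. (2016); it is not the authors' implementation, and the last-iterate restart (the paper restarts from an averaged inner iterate), the γ default, and the ±1 label convention are simplifying assumptions.

```python
# Minimal sketch (assumptions noted inline) of a stagewise proximal
# primal-dual SGD loop for the min-max AUC objective.
import torch

def ppd_sg(model, loader, p_pos, gamma=500.0, eta0=0.01, T0=200, n_stages=5):
    """Stagewise primal-dual SGD: T_s = T0 * 3**s, eta_s = eta0 / 3**s.
    gamma, eta0, T0 are tuned in the paper; defaults here are placeholders."""
    a = torch.zeros(1, requires_grad=True)      # surrogate for E[h(x) | y=+1]
    b = torch.zeros(1, requires_grad=True)      # surrogate for E[h(x) | y=-1]
    alpha = torch.zeros(1, requires_grad=True)  # dual variable
    primal = list(model.parameters()) + [a, b]
    ref = [q.detach().clone() for q in primal]  # proximal reference point
    data = iter(loader)
    for s in range(n_stages):
        T_s, eta_s = T0 * 3 ** s, eta0 / 3 ** s
        for _ in range(T_s):
            try:
                x, y = next(data)               # labels assumed to be +/-1
            except StopIteration:
                data = iter(loader)
                x, y = next(data)
            h = model(x).squeeze()
            pos, neg = (y == 1).float(), (y == -1).float()
            # Stochastic estimate of the saddle objective F(w, a, b, alpha; x, y)
            F = ((1 - p_pos) * ((h - a) ** 2 * pos).mean()
                 + p_pos * ((h - b) ** 2 * neg).mean()
                 + 2 * (1 + alpha) * (p_pos * (h * neg).mean()
                                      - (1 - p_pos) * (h * pos).mean())
                 - p_pos * (1 - p_pos) * alpha ** 2)
            # Proximal term keeps the primal iterate near the stage reference
            prox = sum(((q - r) ** 2).sum() for q, r in zip(primal, ref))
            loss = F + 0.5 * gamma * prox
            grads = torch.autograd.grad(loss, primal + [alpha])
            with torch.no_grad():
                for q, g in zip(primal, grads[:-1]):
                    q -= eta_s * g              # primal descent step
                alpha += eta_s * grads[-1]      # dual ascent step
        # Restart the next stage from the current iterate (the paper uses an
        # average of the inner iterates; last iterate is used here for brevity).
        ref = [q.detach().clone() for q in primal]
    return model
```

Per the Experiment Setup row, η_0 would be selected from [0.1, 0.05, 0.01, 0.008, 0.005] and T_0 from [200, 2000] on the validation split, with batch size 128 (32 for STL10).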
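For the SGD baseline, the Experiment Setup row quotes the stagewise schedule of He et al. (2016): cut the step size by a factor of 10 at 40K and 60K iterations. A minimal sketch using PyTorch's MultiStepLR follows; the placeholder model, the initial step size of 0.1 (one value from the tuning grid), and the total iteration count are illustrative assumptions.

```python
# Hedged sketch of the baseline SGD step-size schedule quoted above:
# divide the learning rate by 10 at 40K and 60K iterations.
import torch
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(3 * 32 * 32, 1)      # stand-in for the ResNet-20 in the paper
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = MultiStepLR(optimizer, milestones=[40_000, 60_000], gamma=0.1)

for step in range(70_000):
    # ... forward/backward pass on a mini-batch of 128 would go here ...
    optimizer.step()       # no-op without gradients in this sketch
    scheduler.step()       # milestones are counted in iterations, not epochs
    if step + 1 in (40_000, 60_000):
        print(step + 1, scheduler.get_last_lr())   # -> 0.01, then 0.001
```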