Min-Max Multi-objective Bilevel Optimization with Applications in Robust Machine Learning

Authors: Alex Gu, Songtao Lu, Parikshit Ram, Tsui-Wei Weng

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on robust representation learning and robust hyperparameter optimization showcase (i) the advantages of considering the min-max multi-objective setup, and (ii) convergence properties of the proposed MORBiT.
Researcher Affiliation | Collaboration | Alex Gu, Songtao Lu, Parikshit Ram, Tsui-Wei Weng* — MIT CSAIL, IBM Research, *UCSD — gua@mit.edu, {songtao, parikshit.ram}@ibm.com, *lweng@ucsd.edu
Pseudocode | Yes | Algorithm 1: MORBiT with learning rates α, β and γ for x, y, λ respectively
Open Source Code | Yes | Our code is at https://github.com/minimario/MORBiT.
Open Datasets | Yes | We first consider a multi-task setup with n = 10 binary classification tasks from the Fashion-MNIST dataset (Xiao et al., 2017). ... We also consider a bilevel extension of the robust meta-learning application (Collins et al., 2020) for a sinusoid regression task, a common meta-learning application introduced by Finn et al. (2017) ... We generate n = 16 binary classification tasks from the Letter dataset (Frey & Slate, 1991)
Dataset Splits | Yes | each of the 16 learning tasks (and hence, objective pairs) has a training set size of around 900 samples (for the LL loss), with 300 samples each for the UL loss and for computing the generalization loss.
Hardware Specification | Yes | We perform our experiments in Python 3.7.10 and PyTorch 1.8.1 with Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz.
Software Dependencies | Yes | We perform our experiments in Python 3.7.10 and PyTorch 1.8.1
Experiment Setup | Yes | We use PyTorch (Paszke et al., 2019), and implementation details are in Appendix C. All results are aggregated over 10 trials. ... For the Task-Robust version of the algorithm, we use α = 0.007, β = 0.005, γ = 0.003. For the standard version of the algorithm, we use α = 0.007, β = 0.011, γ = 0.003. ... For our data, we had x ∈ R^{784×100} and y ∈ R^{100×2}. We used step sizes α = 0.01, β = 0.01, and γ = 0.3. We used batch sizes of 8 and 128 to compute g_i for each inner step and f_i for each outer iteration, respectively. In addition, we included ℓ2-regularization of y with regularization penalty 0.0005. We used vanilla SGD with a learning rate scheduler (ReduceLROnPlateau), invoked every 100 outer iterations, with patience of 10. Each optimization was executed for 10000 outer iterations. ... In this application, we use learning rates α = 0.0001, β = 0.001, γ = 0.001 and 20000 outer iterations. We use a batch size of 8 for both the inner and outer steps for each i ∈ [16] for the initial experiment in Figure 2a. The optimizer was vanilla SGD with a learning rate scheduler (ReduceLROnPlateau), invoked every 100 outer iterations, with patience of 30.
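The setup quoted above uses three step sizes α, β, γ for the upper-level variable x, the lower-level variables y_i, and the simplex weights λ, matching the single-loop structure of Algorithm 1. The following is a minimal sketch of that update pattern on an assumed toy problem: the quadratic objectives, dimensions, and the omission of MORBiT's implicit-gradient correction are our simplifications for illustration, not the paper's actual losses or code.

```python
def project_simplex(v):
    """Euclidean projection onto the probability simplex {lam >= 0, sum(lam) = 1}."""
    u = sorted(v, reverse=True)
    css = 0.0
    rho, rho_css = 1, u[0]
    for j, uj in enumerate(u, start=1):
        css += uj
        if uj + (1.0 - css) / j > 0:
            rho, rho_css = j, css
    theta = (1.0 - rho_css) / rho
    return [max(vi + theta, 0.0) for vi in v]

def morbit_toy(alpha=0.01, beta=0.01, gamma=0.3, iters=2000):
    # Toy objectives (assumed, for illustration only):
    #   LL: y_i*(x) minimizes (y_i - x)^2;  UL: f_i(x, y_i) = (x - c_i)^2 + y_i^2
    c = [0.0, 1.0, 2.0]          # per-task targets
    n = len(c)
    x = 5.0                      # upper-level variable
    y = [0.0] * n                # lower-level variables, one per task
    lam = [1.0 / n] * n          # simplex weights over tasks
    for _ in range(iters):
        # Lower-level descent on each y_i with step size beta
        y = [yi - beta * 2.0 * (yi - x) for yi in y]
        # Upper-level descent on x with step size alpha, weighted by lambda
        # (the implicit gradient through y_i*(x) is omitted in this sketch)
        gx = sum(li * 2.0 * (x - ci) for li, ci in zip(lam, c))
        x -= alpha * gx
        # Ascent on lambda with step size gamma, then projection onto the simplex
        f = [(x - ci) ** 2 + yi ** 2 for ci, yi in zip(c, y)]
        lam = project_simplex([li + gamma * fi for li, fi in zip(lam, f)])
    return x, y, lam
```

The projection step after the λ-ascent keeps the task weights on the probability simplex, so the outer max concentrates weight on the currently worst-performing tasks; the three independent step sizes mirror the (α, β, γ) triples reported in the rows above.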