First-Order Minimax Bilevel Optimization
Authors: Yifan Yang, Zhaofeng Si, Siwei Lyu, Kaiyi Ji
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our methods in two applications: the recently proposed multi-task deep AUC maximization and a novel rank-based robust meta-learning. Our methods consistently improve over existing methods, with better performance across various datasets. |
| Researcher Affiliation | Academia | Yifan Yang, Zhaofeng Si, Siwei Lyu and Kaiyi Ji, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, {yyang99, zhaofeng, siweilyu, kaiyiji}@buffalo.edu |
| Pseudocode | Yes | Algorithm 1 Fully First-Order Single-Loop Method (FOSL) and Algorithm 2 Memory-Efficient Cold-Start (MemCS); a hedged sketch of a single-loop update in this style appears below the table. |
| Open Source Code | Yes | We provide the code as the supplementary material. |
| Open Datasets | Yes | CIFAR100 [29], CelebA [40], CheXpert [26], OGBG-MolPCBA [25], MiniImageNet [53], and TieredImageNet [47]. |
| Dataset Splits | Yes | The 100 classes are distributed among training, validation, and testing sets with a ratio of 64:16:20, respectively (illustrated by the class-split sketch below the table). |
| Hardware Specification | Yes | All experimental runs are performed using a single NVIDIA RTX 6000 GPU. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer', 'ResNet18 architecture', 'DenseNet121 model', 'Graph Isomorphism Network (GIN)', and 'CNN4', but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | Regarding hyperparameters, we set the total training epochs to 2000 for the CIFAR100 and 100 for the OGBG-MolPCBA datasets, adjust it to 40 for CelebA, and reduce it to 6 for CheXpert. The learning rate for the optimal approximator v is uniformly set to η_v = 0.1 across all experiments, with η_w = η_u = η_v/λ to maintain gradient-magnitude consistency between u and v (collected in the config sketch below the table). |
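
For the "Pseudocode" row: the paper's Algorithm 1 (FOSL) is a fully first-order single-loop method. Below is a minimal sketch of what a generic fully first-order single-loop bilevel step of this style (in the spirit of penalty-based methods such as F2SA) can look like, wired to the learning-rate coupling η_w = η_u = η_v/λ quoted in the experiment setup. The toy objectives behind `grad_f`/`grad_g`, the constants, and the function names are all hypothetical; this is not the paper's exact FOSL, which additionally handles a minimax upper level.

```python
import numpy as np

# Hedged sketch: a generic fully first-order single-loop bilevel step
# (F2SA-style), NOT the paper's exact FOSL algorithm. The toy quadratic
# objectives f (upper) and g (lower) are hypothetical placeholders.

def grad_f(w, y):
    # Gradients of a toy upper objective f(w, y) = (w - 1)^2 + (y - 2)^2.
    return 2 * (w - 1.0), 2 * (y - 2.0)

def grad_g(w, y):
    # Gradients of a toy lower objective g(w, y) = (w - y)^2.
    return 2 * (w - y), -2 * (w - y)

def fosl_style_step(w, u, v, lam, eta_v=0.1):
    # Learning-rate coupling from the reported setup: eta_w = eta_u = eta_v / lam.
    eta_w = eta_u = eta_v / lam

    # v tracks the lower-level minimizer of g(w, .).
    _, gy_v = grad_g(w, v)
    v = v - eta_v * gy_v

    # u tracks the minimizer of the penalized objective f(w, .) + lam * g(w, .).
    fy_u = grad_f(w, u)[1]
    gy_u = grad_g(w, u)[1]
    u = u - eta_u * (fy_u + lam * gy_u)

    # w descends a fully first-order surrogate of the hypergradient:
    # grad_w f(w, u) + lam * (grad_w g(w, u) - grad_w g(w, v)).
    fx_u = grad_f(w, u)[0]
    gx_u = grad_g(w, u)[0]
    gx_v = grad_g(w, v)[0]
    w = w - eta_w * (fx_u + lam * (gx_u - gx_v))
    return w, u, v

w, u, v = 0.0, 0.0, 0.0
for _ in range(2000):
    w, u, v = fosl_style_step(w, u, v, lam=10.0)
print(w, u, v)
```

Note that scaling η_u and η_w by 1/λ offsets the λ-scaled gradient in the u update, which matches the paper's stated rationale of keeping gradient magnitudes consistent between u and v.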
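
For the "Dataset Splits" row: the 64:16:20 ratio partitions the 100 classes themselves (rather than the images), as is standard in meta-learning benchmarks. A minimal sketch, assuming a seeded random class assignment; the seed and the assignment scheme are assumptions, not taken from the paper.

```python
import numpy as np

# Hedged sketch: class-level 64/16/20 split of CIFAR100's 100 classes.
# The seed and shuffling scheme are assumptions; the paper does not
# specify how classes are assigned to each split.
rng = np.random.default_rng(0)
classes = rng.permutation(100)

train_classes = classes[:64]     # 64 classes for (meta-)training
val_classes   = classes[64:80]   # 16 classes for validation
test_classes  = classes[80:]     # 20 classes for testing

assert len(train_classes) + len(val_classes) + len(test_classes) == 100
```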
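
For the "Experiment Setup" row: the quoted hyperparameters, collected into one place. The config layout is an assumption, and the penalty parameter λ is left symbolic because its value is not given in this excerpt.

```python
# Hedged sketch: per-dataset hyperparameters quoted above, gathered into
# a single config. The structure is an assumption for illustration only.
EPOCHS = {
    "CIFAR100": 2000,
    "OGBG-MolPCBA": 100,
    "CelebA": 40,
    "CheXpert": 6,
}

def learning_rates(lam, eta_v=0.1):
    """eta_w = eta_u = eta_v / lam, keeping gradient magnitudes of u and v consistent."""
    return {"eta_v": eta_v, "eta_w": eta_v / lam, "eta_u": eta_v / lam}
```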