A Framework for Bilevel Optimization on Riemannian Manifolds
Authors: Andi Han, Bamdev Mishra, Pratik Kumar Jawanpuria, Akiko Takeda
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The efficacy of the proposed framework is demonstrated through several applications. [Section 4, Experiments] This section explores various applications of bilevel optimization problems on manifolds. All the experiments are implemented based on Geoopt [44] and the code is available at https://github.com/andyjm3/rhgd. |
| Researcher Affiliation | Collaboration | RIKEN AIP; Microsoft, India; University of Tokyo |
| Pseudocode | Yes | Algorithm 1 (Riemannian hypergradient descent, RHGD) and Algorithm 2 (Riemannian stochastic bilevel optimization with Hessian inverse). An illustrative sketch of the RHGD loop is given after the table. |
| Open Source Code | Yes | All the experiments are implemented based on Geoopt [44] and the code is available at https://github.com/andyjm3/rhgd. |
| Open Datasets | Yes | We consider 5-way 5-shot meta-learning on the MiniImageNet dataset [59]. We also consider the Caltech-Office dataset [20] and the ETH-80 image set [46]. |
| Dataset Splits | Yes | In particular, we partition the set into a training set D_tr and a validation set D_val. Here we sample 5 examples from each class to form the training set and use the rest as the validation set. |
| Hardware Specification | Yes | All the experiments are conducted on a single NVIDIA RTX 4060 GPU. |
| Software Dependencies | No | All the experiments are implemented based on Geoopt [44]. Geoopt is named as a software dependency, but without a version number; no other specific software with version numbers is mentioned. |
| Experiment Setup | Yes | We set ν = 0.01 and fix η_x = η_y = 0.5. We compare the three proposed strategies for approximating the hypergradient: we select γ = 1.0 and T_ns = 50 for the Neumann series (NS), and set the maximum number of conjugate gradient (CG) iterations T_cg to 50, breaking once the residual reaches a tolerance of 10^-10. We set the number of outer iterations (epochs) K to 200. Figure 1 compares RHGD with the different approximation strategies, implemented with S = 20 or 50 inner iterations. See the sketches after the table for an illustrative RHGD loop and Neumann-series approximation. |
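
For orientation, below is a minimal sketch in the spirit of Algorithm 1 (RHGD) on a toy problem. The manifold choice (a sphere), the objectives `f_upper`/`g_lower`, and the helper names `inner_solve`/`hypergradient` are illustrative assumptions, not the paper's released code. The sketch keeps the Hessian and cross-derivative terms Euclidean and only converts the final descent directions with Geoopt's `egrad2rgrad`, whereas Algorithm 1 works with Riemannian quantities throughout; the exact Hessian solve also stands in for the NS/CG/AD approximations compared in the experiments.

```python
import torch
import geoopt

torch.manual_seed(0)
sphere = geoopt.Sphere()          # illustrative manifold choice for both levels
A = torch.randn(10, 10)
A = A @ A.T                       # PSD matrix so the lower-level problem is well behaved

def f_upper(x, y):                # toy upper-level objective f(x, y)
    return ((x - y) ** 2).sum()

def g_lower(x, y):                # toy lower-level objective g(x, y)
    return 0.5 * (y @ A @ y) + ((x - y) ** 2).sum()

def inner_solve(x, y, steps, lr):
    """S steps of Riemannian gradient descent on g(x, .) to approximate y*(x)."""
    for _ in range(steps):
        y = y.detach().requires_grad_(True)
        egrad = torch.autograd.grad(g_lower(x, y), y)[0]
        y = sphere.retr(y, -lr * sphere.egrad2rgrad(y, egrad))  # retraction step
    return y.detach()

def hypergradient(x, y):
    """Euclidean hypergradient grad_x f - grad_xy^2 g [Hess_y g]^{-1} grad_y f.

    The small toy problem allows an exact Hessian solve; the paper instead
    approximates this term with Neumann series, conjugate gradient, or
    truncated backpropagation (see the Neumann-series sketch below)."""
    x = x.detach().requires_grad_(True)
    y = y.detach().requires_grad_(True)
    fx, fy = torch.autograd.grad(f_upper(x, y), (x, y))
    H = torch.autograd.functional.hessian(lambda yy: g_lower(x.detach(), yy), y)
    v = torch.linalg.solve(H, fy)                              # [Hess_y g]^{-1} grad_y f
    gy = torch.autograd.grad(g_lower(x, y), y, create_graph=True)[0]
    cross = torch.autograd.grad(gy @ v, x)[0]                  # grad_xy^2 g applied to v
    return fx - cross

x = sphere.projx(torch.randn(10))
y = sphere.projx(torch.randn(10))
for k in range(200):                                           # K outer iterations
    y = inner_solve(x, y, steps=20, lr=0.5)                    # S = 20 inner steps
    egrad = hypergradient(x, y)
    x = sphere.retr(x, -0.5 * sphere.egrad2rgrad(x, egrad)).detach()
```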
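The Neumann-series (NS) strategy from the reported setup (γ = 1.0, T_ns = 50) replaces the explicit Hessian solve above with Hessian-vector products from autograd. Below is a minimal Euclidean sketch; the function name `neumann_inverse_hvp` and the toy quadratic are illustrative assumptions, and on a manifold the Riemannian Hessian with tangent-space projections would take the place of the plain autograd Hessian-vector product.

```python
import torch

def neumann_inverse_hvp(g_of_y, y, b, gamma=1.0, T_ns=50):
    """Approximate [Hess_y g]^{-1} b by the truncated Neumann series
    gamma * sum_{t=0}^{T_ns} (I - gamma * H)^t b, using only Hessian-vector
    products from autograd (no explicit Hessian is formed)."""
    y = y.detach().requires_grad_(True)
    grad_y = torch.autograd.grad(g_of_y(y), y, create_graph=True)[0]
    p = b.detach().clone()        # current series term (I - gamma * H)^t b
    acc = b.detach().clone()      # running sum of the series
    for _ in range(T_ns):
        hvp = torch.autograd.grad(grad_y, y, grad_outputs=p, retain_graph=True)[0]
        p = p - gamma * hvp
        acc = acc + p
    return gamma * acc

# Usage on a toy quadratic g(y) = 0.5 * y^T H y, whose Hessian is H.
H = torch.eye(5) * 0.5            # eigenvalues < 1/gamma, so the series converges
b = torch.randn(5)
y0 = torch.zeros(5)
v = neumann_inverse_hvp(lambda y: 0.5 * (y @ H @ y), y0, b)
print(torch.allclose(v, torch.linalg.solve(H, b), atol=1e-4))   # True: matches the exact solve
```

The conjugate-gradient (CG) strategy mentioned in the same setup would instead solve the linear system [Hess_y g] v = grad_y f with CG iterations, capped at T_cg = 50 and stopped once the residual falls below the 10^-10 tolerance.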