A Framework for Bilevel Optimization on Riemannian Manifolds

Authors: Andi Han, Bamdev Mishra, Pratik Kumar Jawanpuria, Akiko Takeda

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The efficacy of the proposed framework is demonstrated through several applications. From Section 4 (Experiments): This section explores various applications of bilevel optimization problems on manifolds. All the experiments are implemented based on Geoopt [44] and the codes are available at https://github.com/andyjm3/rhgd.
Researcher Affiliation | Collaboration | 1) RIKEN AIP, 2) Microsoft, India, 3) University of Tokyo
Pseudocode | Yes | Algorithm 1 (Riemannian hypergradient descent, RHGD) and Algorithm 2 (Riemannian stochastic bilevel optimization with Hessian inverse).
Open Source Code | Yes | All the experiments are implemented based on Geoopt [44] and the codes are available at https://github.com/andyjm3/rhgd.
Open Datasets | Yes | We consider 5-way 5-shot meta-learning over the mini-ImageNet dataset [59]. We consider the Caltech-Office dataset [20] and the ETH-80 image set [46].
Dataset Splits | Yes | In particular, we partition the set into a training set D_tr and a validation set D_val. Here we sample 5 samples from each class to form the training set and use the rest as the validation set.
Hardware Specification | Yes | All the experiments are conducted on a single NVIDIA RTX 4060 GPU.
Software Dependencies | No | All the experiments are implemented based on Geoopt [44]. This mentions a software package, but no version number; no other software with a version number is mentioned.
Experiment Setup | Yes | We set ν = 0.01 and fix η_x = η_y = 0.5. We compare the three proposed strategies for approximating the hypergradient, selecting γ = 1.0 and T_ns = 50 for the Neumann series (NS) and setting the maximum number of iterations T_cg for conjugate gradient (CG) to 50, breaking once the residual reaches a tolerance of 10^-10. We set the number of outer iterations (epochs) K to 200. Figure 1 compares RHGD with different approximation strategies implemented with S = 20 or S = 50 inner iterations.
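The per-class split quoted in the Dataset Splits row (5 samples per class for training, the rest for validation) can be sketched as follows. This is an illustrative helper, not the authors' code: the function name, seed handling, and toy label array are assumptions.

```python
import numpy as np

def split_per_class(labels, n_train_per_class=5, seed=0):
    """Sample n_train_per_class indices per class for D_tr; the rest form D_val."""
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)   # all indices of class c
        rng.shuffle(idx)
        train_idx.extend(idx[:n_train_per_class])
        val_idx.extend(idx[n_train_per_class:])
    return np.array(train_idx), np.array(val_idx)

# Toy usage: 3 classes with 10 samples each -> 15 train / 15 validation indices.
labels = np.repeat(np.arange(3), 10)
tr, va = split_per_class(labels)
```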
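The overall structure of Algorithm 1 (RHGD), as described in the Pseudocode and Experiment Setup rows, can be sketched on a toy problem: S inner gradient steps on the lower-level objective, a Neumann-series approximation of the inverse-Hessian-vector product, then one outer hypergradient step. This is a minimal Euclidean special case (R^n with the identity retraction) under assumed quadratic objectives g(x, y) = 0.5 y'Hy - y'Ax and f(y) = 0.5||y - b||^2; the matrices and the outer step size here are illustrative for the toy problem, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
dx, dy = 4, 3
A = rng.standard_normal((dy, dx))
H = 2.0 * np.eye(dy)                 # Hessian of the lower-level problem (strongly convex)
b = rng.standard_normal(dy)

def grad_g_y(x, y):                  # lower-level gradient in y
    return H @ y - A @ x

def grad_f_y(y):                     # upper-level gradient in y
    return y - b

def neumann_inverse_hvp(v, gamma=0.4, T=50):
    """Approximate H^{-1} v via the truncated series gamma * sum_t (I - gamma*H)^t v."""
    out, term = v.copy(), v.copy()
    for _ in range(T):
        term = term - gamma * (H @ term)
        out += term
    return gamma * out

def phi(x):                          # exact upper-level value f(y*(x)), for monitoring
    return 0.5 * np.sum((np.linalg.solve(H, A @ x) - b) ** 2)

x, y = np.zeros(dx), np.zeros(dy)
eta_x, eta_y, S, K = 0.1, 0.5, 20, 200
for _ in range(K):
    for _ in range(S):               # inner (Riemannian) gradient descent; retraction = identity
        y = y - eta_y * grad_g_y(x, y)
    q = neumann_inverse_hvp(grad_f_y(y))   # [grad^2_yy g]^{-1} grad_y f
    hypergrad = A.T @ q              # -grad^2_xy g applied to q; here grad^2_xy g = -A
    x = x - eta_x * hypergrad        # outer (Riemannian) gradient step on x
```

With these quadratics the hypergradient reduces to the exact gradient of phi(x) = 0.5||H^{-1}Ax - b||^2, so the outer loop is plain gradient descent on phi and its value decreases monotonically.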
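The two hypergradient approximation strategies named in the Experiment Setup row, truncated Neumann series (NS) and conjugate gradient (CG) with a residual-tolerance break, can be compared on a stand-in SPD Hessian. The matrix, its conditioning, and the step size gamma (scaled by the spectral norm here, rather than the paper's gamma = 1.0) are assumptions for this toy check, not the paper's Riemannian Hessian.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
M = rng.standard_normal((n, n))
Hess = M @ M.T / n + np.eye(n)       # SPD stand-in for the lower-level Hessian
v = rng.standard_normal(n)
hvp = lambda u: Hess @ u             # Hessian-vector product oracle

def neumann(hvp, v, gamma, T):
    """gamma * sum_{t=0}^{T} (I - gamma*H)^t v -> H^{-1} v when gamma < 1/lambda_max."""
    out, term = v.copy(), v.copy()
    for _ in range(T):
        term = term - gamma * hvp(term)
        out += term
    return gamma * out

def conjugate_gradient(hvp, v, T, tol=1e-10):
    """Solve H q = v by CG, breaking once the residual norm reaches tol."""
    q = np.zeros_like(v)
    r = v.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(T):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        q += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return q

exact = np.linalg.solve(Hess, v)
q_ns = neumann(hvp, v, gamma=1.0 / np.linalg.norm(Hess, 2), T=50)
q_cg = conjugate_gradient(hvp, v, T=50)
```

Both routines only need Hessian-vector products, which is what makes them usable when the Hessian is never formed explicitly; CG typically reaches the tolerance in far fewer than T_cg iterations on well-conditioned problems.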