Contextual Stochastic Bilevel Optimization
Authors: Yifan Hu, Jie Wang, Yao Xie, Andreas Krause, Daniel Kuhn
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper states that 'Numerical experiments further validate our theoretical results' and contains Section 4, Applications and Numerical Experiments. Figures 1, 2, and 3 report test error and logistic loss values, which are empirical results. |
| Researcher Affiliation | Academia | Yifan Hu (EPFL & ETH Zürich, Switzerland); Jie Wang (Gatech, United States); Yao Xie (Gatech, United States); Andreas Krause (ETH Zürich, Switzerland); Daniel Kuhn (EPFL, Switzerland) |
| Pseudocode | Yes | Algorithm 1 Epoch SGD; Algorithm 2 SGD Framework; Algorithm 3 RT-MLMC Gradient Estimator for Conditional Bilevel Optimization; Algorithm 4 Hessian Vector Implementation for Computing bc(x, y; ξ). A hedged Hessian-vector sketch appears below the table. |
| Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | The experiment is examined on Tiny ImageNet [Mnmoustafa, 2017] by pre-processing it using the pre-trained ResNet-18 network [He et al., 2016] to extract linear features. Since the network has learned a rich set of hierarchical features from the ImageNet dataset [Deng et al., 2009], it typically extracts useful features for other image datasets. A hedged feature-extraction sketch appears below the table. |
| Dataset Splits | No | The paper mentions that '90% of the images from each class are assigned to the training set, while the remaining 10% belong to the testing set' for the meta-learning experiment, but it does not specify a distinct validation set or its split proportion. A hedged per-class split sketch appears below the table. |
| Hardware Specification | No | The paper mentions training models and evaluating computation time but does not specify any particular hardware used for running the experiments (e.g., specific GPU or CPU models, memory). |
| Software Dependencies | No | The paper mentions 'pytorch' once in the context of initialization ('torch.nn.init.uniform_ in pytorch') but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We fine-tune the stepsize for all approaches using the following strategy: we pick a fixed breakpoint denoted as t0 and adjust the stepsize accordingly. Specifically, for the t-th outer iteration when t ≤ t0, the stepsize is set as 1/t, while for iterations beyond t0, the stepsize is set as 1/t. ... we set the hyper-parameter λ = 2. ... we specify the hyper-parameter β = 5 to balance the trade-off between loss function approximation and smoothness. ... The hyper-parameter λ for the WDRO-SI or WDRO formulation has been fine-tuned via grid search from the set {1, 10, 50, 100, 150} for optimal performance. ... Here we take the number of gradient updates at the inner level m from {1, 4, 8, 12}. A hedged stepsize-schedule sketch appears below the table. |
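
Since the paper releases no source code, the listing below is only a minimal sketch of one building block referenced under "Pseudocode": Algorithm 4 relies on Hessian-vector products, and a standard way to obtain them in PyTorch is double backpropagation with `torch.autograd.grad`. The function name and argument layout are our assumptions, not the authors' implementation.

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Hessian-vector product H @ vec via double backpropagation.

    loss   : scalar tensor (e.g. an inner-level objective)
    params : list of tensors with requires_grad=True
    vec    : list of tensors shaped like params
    """
    # First backward pass: keep the graph so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Inner product <grad, vec>; its gradient w.r.t. params is H @ vec.
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)
```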
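
The feature-extraction step quoted under "Open Datasets" could be reproduced along the following lines. This is a sketch assuming standard torchvision components and a resize to 224×224; the paper only specifies the pre-trained ResNet-18 backbone.

```python
import torch
import torchvision
from torchvision import transforms

# Pre-trained ResNet-18; replacing the final classifier with the identity
# turns the network into a 512-dimensional feature extractor.
weights = torchvision.models.ResNet18_Weights.IMAGENET1K_V1
model = torchvision.models.resnet18(weights=weights)
model.fc = torch.nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(224),  # assumption: Tiny ImageNet images upscaled to 224x224
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(batch):
    # batch: preprocessed images of shape (B, 3, 224, 224)
    return model(batch)
```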
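
The 90/10 per-class split quoted under "Dataset Splits" can be expressed as below; the helper name, the use of NumPy, and the random seed are our assumptions.

```python
import numpy as np

def per_class_split(labels, train_frac=0.9, seed=0):
    """Assign 90% of each class's indices to training, the rest to testing."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        cut = int(round(train_frac * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)
```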
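
Finally, the breakpoint stepsize strategy quoted under "Experiment Setup" could look like the sketch below. The extracted text gives the same rate 1/t on both sides of the breakpoint, so holding the rate fixed at 1/t0 before the breakpoint is purely our assumption; the hyper-parameter grids are taken verbatim from the quoted text.

```python
def stepsize(t, t0, eta0=1.0):
    """Breakpoint schedule: one possible reading of the paper's strategy.

    Assumption: the rate is held at eta0 / t0 up to the breakpoint t0 and
    decays as eta0 / t afterwards; the exact pre-breakpoint rate is unclear
    from the extracted text.
    """
    return eta0 / t0 if t <= t0 else eta0 / t


# Hyper-parameter grids quoted in the paper: lambda for the WDRO(-SI)
# formulation and the number m of inner-level gradient updates.
LAMBDA_GRID = [1, 10, 50, 100, 150]
INNER_UPDATES_GRID = [1, 4, 8, 12]
```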