Contextual Stochastic Bilevel Optimization

Authors: Yifan Hu, Jie Wang, Yao Xie, Andreas Krause, Daniel Kuhn

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states that 'Numerical experiments further validate our theoretical results' and contains Section 4, Applications and Numerical Experiments; Figures 1, 2, and 3 report test error and logistic loss values, which are empirical results.
Researcher Affiliation | Academia | Yifan Hu (EPFL & ETH Zürich, Switzerland), Jie Wang (Georgia Tech, United States), Yao Xie (Georgia Tech, United States), Andreas Krause (ETH Zürich, Switzerland), Daniel Kuhn (EPFL, Switzerland).
Pseudocode | Yes | Algorithm 1 (Epoch SGD), Algorithm 2 (SGD Framework), Algorithm 3 (RT-MLMC Gradient Estimator for Conditional Bilevel Optimization), and Algorithm 4 (Hessian-Vector Implementation for computing Hessian-vector products of c(x, y; ξ)). A minimal Hessian-vector product sketch follows the table.
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | The experiments use Tiny ImageNet [Mnmoustafa, 2017], pre-processed with a pre-trained ResNet-18 network [He et al., 2016] to extract linear features; since the network has learned a rich set of hierarchical features from the ImageNet dataset [Deng et al., 2009], it typically extracts useful features for other image datasets. A feature-extraction sketch follows the table.
Dataset Splits | No | For meta-learning, the paper states that '90% of the images from each class are assigned to the training set, while the remaining 10% belong to the testing set,' but it does not specify a distinct validation set or its split proportion. A per-class split sketch follows the table.
Hardware Specification | No | The paper mentions training models and evaluating computation time but does not specify the hardware used for the experiments (e.g., GPU or CPU models, memory).
Software Dependencies | No | The paper mentions PyTorch once, in the context of initialization ('torch.nn.init.uniform_ in pytorch'), but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | The stepsize for all approaches is fine-tuned with a fixed breakpoint t0: for the t-th outer iteration with t ≤ t0 the stepsize is set as 1/t, while for iterations beyond t0 the stepsize is set as 1/t. ... The hyper-parameter λ is set to 2. ... The hyper-parameter β is set to 5 to balance the trade-off between loss-function approximation and smoothness. ... The hyper-parameter λ for the WDRO-SI or WDRO formulation is fine-tuned via grid search over the set {1, 10, 50, 100, 150}. ... The number of gradient updates at the inner level, m, is taken from {1, 4, 8, 12}. A hyper-parameter configuration sketch follows the table.
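
The Hessian-vector implementation named in Algorithm 4 is not reproduced in the extracted text above; the block below is a minimal sketch of how such a Hessian-vector product can be computed with PyTorch double backward. The function hessian_vector_product and the bilinear test function are illustrative placeholders, not the authors' code.

```python
import torch

def hessian_vector_product(c, x, y, v):
    """Sketch: apply the mixed second derivative of a scalar function c(x, y)
    to a vector v, without ever forming the Hessian explicitly."""
    # First-order gradient w.r.t. y, keeping the graph for a second backward pass.
    grad_y = torch.autograd.grad(c(x, y), y, create_graph=True)[0]
    # Differentiating <grad_y, v> w.r.t. x yields the Hessian-vector product.
    return torch.autograd.grad(torch.dot(grad_y.flatten(), v.flatten()), x)[0]

# Illustrative check with a bilinear coupling c(x, y) = x^T A y, whose mixed
# Hessian is A, so the product should equal A @ v.
A = torch.randn(3, 3)
x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
v = torch.randn(3)
print(hessian_vector_product(lambda x, y: x @ A @ y, x, y, v))
print(A @ v)
```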
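The Open Datasets row describes extracting linear features from Tiny ImageNet with a pre-trained ResNet-18. The paper's preprocessing script is not available, so the following is a sketch of one standard way to do this with torchvision; the local path tiny-imagenet-200/train is a hypothetical ImageFolder layout.

```python
import torch
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing; ResNet-18 expects 224x224 inputs.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Hypothetical local copy of Tiny ImageNet in ImageFolder layout.
dataset = datasets.ImageFolder("tiny-imagenet-200/train", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=False)

# Pre-trained ResNet-18 with its classification head replaced by the identity,
# so a forward pass returns the 512-dimensional penultimate features.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

features, labels = [], []
with torch.no_grad():
    for images, targets in loader:
        features.append(backbone(images))
        labels.append(targets)
features = torch.cat(features)  # shape: (num_images, 512)
labels = torch.cat(labels)
```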
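The 90%/10% per-class split quoted in the Dataset Splits row can be reproduced with a stratified split; the sketch below uses scikit-learn on randomly generated stand-in features, since the paper's own split code is not available.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-ins: 512-dimensional features for 10 classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 512))
labels = rng.integers(0, 10, size=1000)

# stratify=labels keeps the 90%/10% ratio within every class, matching the
# per-class split quoted above; no separate validation set is carved out.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.10, stratify=labels, random_state=0
)
print(X_train.shape, X_test.shape)
```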
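The Experiment Setup row lists concrete hyper-parameter values and grids; the sketch below collects them in one place. The grids copy the values quoted from the paper, while the stepsize function only illustrates the breakpoint idea: the exact pre- and post-breakpoint rates are not specified above, so the 1/t0-then-1/t decay is an assumption.

```python
from itertools import product

# Values quoted from the paper.
LAMBDA_DEFAULT = 2                    # λ used in the main formulation
BETA = 5                              # β trading off loss approximation and smoothness
LAMBDA_GRID = [1, 10, 50, 100, 150]   # grid-searched λ for WDRO-SI / WDRO
INNER_UPDATES = [1, 4, 8, 12]         # number of inner-level gradient updates m


def stepsize(t, t0=100):
    """Breakpoint-style stepsize schedule (assumed form).

    The report only says the stepsize changes at a fixed breakpoint t0;
    the constant 1/t0 rate before it and the 1/t decay after it are assumptions.
    """
    return 1.0 / t0 if t <= t0 else 1.0 / t


# Enumerate the grid-search combinations swept over in the experiments.
for lam, m in product(LAMBDA_GRID, INNER_UPDATES):
    print(f"lambda={lam}, inner updates m={m}, stepsize(1)={stepsize(1):.4f}")
```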