Contextual Stochastic Bilevel Optimization

Authors: Yifan Hu, Jie Wang, Yao Xie, Andreas Krause, Daniel Kuhn

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states that 'Numerical experiments further validate our theoretical results' and contains Section 4, Applications and Numerical Experiments; Figures 1, 2, and 3 report test error and logistic loss values, which are empirical results.
Researcher Affiliation | Academia | Yifan Hu (EPFL & ETH Zürich, Switzerland), Jie Wang (Georgia Tech, United States), Yao Xie (Georgia Tech, United States), Andreas Krause (ETH Zürich, Switzerland), Daniel Kuhn (EPFL, Switzerland).
Pseudocode | Yes | Algorithm 1 (Epoch SGD), Algorithm 2 (SGD Framework), Algorithm 3 (RT-MLMC Gradient Estimator for Conditional Bilevel Optimization), and Algorithm 4 (Hessian-Vector Implementation for computing Hessian-vector products of c(x, y; ξ)). A minimal Hessian-vector product sketch follows the table.
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | The experiments use Tiny ImageNet [Mnmoustafa, 2017], pre-processed with a pre-trained ResNet-18 network [He et al., 2016] to extract linear features; since the network has learned a rich set of hierarchical features from the ImageNet dataset [Deng et al., 2009], it typically extracts useful features for other image datasets. A feature-extraction sketch follows the table.
Dataset Splits | No | For meta-learning, the paper states that '90% of the images from each class are assigned to the training set, while the remaining 10% belong to the testing set,' but it does not specify a distinct validation set or its split proportion. A per-class split sketch follows the table.
Hardware Specification | No | The paper mentions training models and evaluating computation time but does not specify the hardware used for the experiments (e.g., GPU or CPU models, memory).
Software Dependencies | No | The paper mentions PyTorch once, in the context of initialization ('torch.nn.init.uniform_ in pytorch'), but does not specify its version or any other software dependencies with version numbers.
Experiment Setup | Yes | The stepsize for all approaches is fine-tuned with a fixed breakpoint t0: for the t-th outer iteration with t ≤ t0 the stepsize is set as 1/t, while for iterations beyond t0 the stepsize is set as 1/t. ... The hyper-parameter λ is set to 2. ... The hyper-parameter β is set to 5 to balance the trade-off between loss-function approximation and smoothness. ... The hyper-parameter λ for the WDRO-SI or WDRO formulation is fine-tuned via grid search over the set {1, 10, 50, 100, 150}. ... The number of gradient updates at the inner level, m, is taken from {1, 4, 8, 12}. A hyper-parameter configuration sketch follows the table.
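
The Hessian-vector implementation named in Algorithm 4 is not reproduced in the extracted text above; the block below is a minimal sketch of how such a Hessian-vector product can be computed with PyTorch double backward. The function hessian_vector_product and the bilinear test function are illustrative placeholders, not the authors' code.

```python
import torch

def hessian_vector_product(c, x, y, v):
    """Sketch: apply the mixed second derivative of a scalar function c(x, y)
    to a vector v, without ever forming the Hessian explicitly."""
    # First-order gradient w.r.t. y, keeping the graph for a second backward pass.
    grad_y = torch.autograd.grad(c(x, y), y, create_graph=True)[0]
    # Differentiating <grad_y, v> w.r.t. x yields the Hessian-vector product.
    return torch.autograd.grad(torch.dot(grad_y.flatten(), v.flatten()), x)[0]

# Illustrative check with a bilinear coupling c(x, y) = x^T A y, whose mixed
# Hessian is A, so the product should equal A @ v.
A = torch.randn(3, 3)
x = torch.randn(3, requires_grad=True)
y = torch.randn(3, requires_grad=True)
v = torch.randn(3)
print(hessian_vector_product(lambda x, y: x @ A @ y, x, y, v))
print(A @ v)
```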
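The Open Datasets row describes extracting linear features from Tiny ImageNet with a pre-trained ResNet-18. The paper's preprocessing script is not available, so the following is a sketch of one standard way to do this with torchvision; the local path tiny-imagenet-200/train is a hypothetical ImageFolder layout.

```python
import torch
from torchvision import datasets, models, transforms

# Standard ImageNet preprocessing; ResNet-18 expects 224x224 inputs.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Hypothetical local copy of Tiny ImageNet in ImageFolder layout.
dataset = datasets.ImageFolder("tiny-imagenet-200/train", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=256, shuffle=False)

# Pre-trained ResNet-18 with its classification head replaced by the identity,
# so a forward pass returns the 512-dimensional penultimate features.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

features, labels = [], []
with torch.no_grad():
    for images, targets in loader:
        features.append(backbone(images))
        labels.append(targets)
features = torch.cat(features)  # shape: (num_images, 512)
labels = torch.cat(labels)
```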
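The 90%/10% per-class split quoted in the Dataset Splits row can be reproduced with a stratified split; the sketch below uses scikit-learn on randomly generated stand-in features, since the paper's own split code is not available.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative stand-ins: 512-dimensional features for 10 classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 512))
labels = rng.integers(0, 10, size=1000)

# stratify=labels keeps the 90%/10% ratio within every class, matching the
# per-class split quoted above; no separate validation set is carved out.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.10, stratify=labels, random_state=0
)
print(X_train.shape, X_test.shape)
```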
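The Experiment Setup row lists concrete hyper-parameter values and grids; the sketch below collects them in one place. The grids copy the values quoted from the paper, while the stepsize function only illustrates the breakpoint idea: the exact pre- and post-breakpoint rates are not specified above, so the 1/t0-then-1/t decay is an assumption.

```python
from itertools import product

# Values quoted from the paper.
LAMBDA_DEFAULT = 2                    # λ used in the main formulation
BETA = 5                              # β trading off loss approximation and smoothness
LAMBDA_GRID = [1, 10, 50, 100, 150]   # grid-searched λ for WDRO-SI / WDRO
INNER_UPDATES = [1, 4, 8, 12]         # number of inner-level gradient updates m


def stepsize(t, t0=100):
    """Breakpoint-style stepsize schedule (assumed form).

    The report only says the stepsize changes at a fixed breakpoint t0;
    the constant 1/t0 rate before it and the 1/t decay after it are assumptions.
    """
    return 1.0 / t0 if t <= t0 else 1.0 / t


# Enumerate the grid-search combinations swept over in the experiments.
for lam, m in product(LAMBDA_GRID, INNER_UPDATES):
    print(f"lambda={lam}, inner updates m={m}, stepsize(1)={stepsize(1):.4f}")
```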