Adaptive Regret for Bandits Made Possible: Two Queries Suffice

Authors: Zhou Lu, Qiuyi Zhang, Xinyi Chen, Fred Zhang, David Woodruff, Elad Hazan

ICLR 2024

Reproducibility checklist. Each entry lists the variable assessed, the result, and the supporting LLM response (paper excerpts in quotation marks).
Research Type: Experimental. "Finally, we empirically demonstrate the superior performance of our algorithms under volatile environments and for downstream tasks, such as algorithm selection for hyperparameter optimization." (Section 5, Experiments): "In this section, we evaluate the proposed algorithms on synthetic data and the downstream task of hyperparameter optimization."
Researcher Affiliation: Collaboration. Zhou Lu (Princeton University); Qiuyi Zhang (Google DeepMind); Xinyi Chen (Google DeepMind, Princeton University); Fred Zhang (UC Berkeley); David P. Woodruff (Google Research, Carnegie Mellon University); Elad Hazan (Google DeepMind, Princeton University).
Pseudocode: Yes. Algorithm 1: Strongly Adaptive Bandit Learner (StABL); Algorithm 2 (sub-routine): EXP3 with a General Loss Estimator. (An illustrative EXP3-style sketch appears after this checklist.)
Open Source Code: No. The paper does not provide any links to, or statements about releasing, source code for the described methodology.
Open Datasets: Yes. The reward is determined by the underlying task of black-box optimization, which is performed on Black-Box Optimization Benchmark (BBOB) functions (Tušar et al., 2016). (A stand-in BBOB-style reward sketch appears after this checklist.)
Dataset Splits: No. The paper describes generating synthetic data and using BBOB functions, but it does not specify explicit training, validation, or test splits (e.g., percentages, sample counts, or citations to predefined splits).
Hardware Specification: No. The paper does not describe the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies: No. The paper does not list software dependencies with version numbers (e.g., programming languages, libraries, frameworks, or solvers).
Experiment Setup: Yes. "We take N = 30, time horizon T = 4096 ... we divide the time horizon into 4 intervals ... we set the reward r_{t,0} = er_{t,0} + 0.5 ... we randomly generated time steps at which the best arm changes, and they are 1355, 1437, 1798, 3249, for a time horizon of 4096. ... For metrics, we use the log objective curve, as well as the performance profile score ... We run each of our algorithms in dimensions d = 32, 64 and optimize for 1000, 2000 iterations with 5 repeats. ... we also add a regularization term of λf(x_j) with λ = 0.01 ... StABL with history lengths [20, 40, 80, 160, 320, 640, 1280, 2560]." (A sketch of the synthetic environment appears after this checklist.)
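
To make the Pseudocode entry concrete, here is a minimal sketch of an EXP3-style learner with a pluggable loss estimator, in the spirit of Algorithm 2. It is not the paper's algorithm: the estimator shown is the standard inverse-propensity (importance-weighted) estimator, and the function name and the values of eta and gamma are illustrative assumptions.

```python
import numpy as np

def exp3_general_estimator(n_arms, horizon, reward_fn, eta=0.05, gamma=0.05, seed=0):
    """Illustrative EXP3-style learner with a plug-in loss/reward estimator.

    This is a sketch, not the paper's Algorithm 2: the estimator used here is
    the standard inverse-propensity estimator, and eta/gamma are arbitrary
    illustrative values.
    """
    rng = np.random.default_rng(seed)
    weights = np.ones(n_arms)
    observed = []
    for t in range(horizon):
        # Mix the exponential-weights distribution with uniform exploration.
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        r = reward_fn(t, arm)            # bandit feedback: only the pulled arm's reward
        # "General estimator" slot: here, inverse-propensity weighting.
        est_reward = np.zeros(n_arms)
        est_reward[arm] = r / probs[arm]
        weights *= np.exp(eta * est_reward)
        weights /= weights.max()         # rescale for numerical stability
        observed.append(r)
    return np.array(observed)
```

For example, with the synthetic reward table sketched next (under the Experiment Setup entry), one could call exp3_general_estimator(30, 4096, lambda t, a: rewards[t, a]).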
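
The Experiment Setup entry quotes a synthetic volatile environment with N = 30 arms, T = 4096 rounds, and best-arm changes at steps 1355, 1437, 1798, 3249. A minimal sketch of such an environment follows; the base-reward distribution and the exact "+0.5" bonus mechanism are assumptions, since the excerpt does not fully specify them.

```python
import numpy as np

# Quoted setup: N = 30 arms, T = 4096 rounds, and the best arm changing at
# t = 1355, 1437, 1798, 3249.  The base-reward distribution and the exact
# "+0.5" bonus mechanism below are assumptions made only for illustration.
N, T = 30, 4096
CHANGE_POINTS = [1355, 1437, 1798, 3249]

def make_volatile_rewards(seed=0):
    """Return a T x N reward table whose best arm changes at CHANGE_POINTS."""
    rng = np.random.default_rng(seed)
    boundaries = [0] + CHANGE_POINTS + [T]
    # One distinct best arm per segment (5 segments for 4 change points).
    best_arms = rng.choice(N, size=len(boundaries) - 1, replace=False)
    base = rng.uniform(0.0, 0.5, size=(T, N))   # assumed base rewards in [0, 0.5]
    for seg, best in enumerate(best_arms):
        lo, hi = boundaries[seg], boundaries[seg + 1]
        base[lo:hi, best] += 0.5                # best arm gets a +0.5 bonus (per the excerpt)
    return base

rewards = make_volatile_rewards()
print(rewards.shape)                            # (4096, 30)
```

The resulting T x N reward table can be fed to any bandit learner, e.g. as reward_fn=lambda t, a: rewards[t, a] in the EXP3-style sketch given for the Pseudocode entry.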
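
The Open Datasets entry notes that rewards come from black-box optimization on BBOB functions (Tušar et al., 2016). The sketch below uses a hand-written sphere function as a stand-in for a BBOB problem and a hypothetical mapping from optimization progress to a bandit reward; both the stand-in function and the reward mapping are assumptions, not the paper's construction.

```python
import numpy as np

def sphere(x):
    """Stand-in for a BBOB test function; the actual experiments use the BBOB
    suite of Tušar et al. (2016), not this hand-written function."""
    return float(np.sum(x ** 2))

def step_and_reward(step_size, x_best, f_best, rng):
    """Hypothetical reward mapping: an 'arm' is a search step size, and the
    reward is the clipped, normalized improvement of the best objective value.
    This mapping is an assumption, not the paper's construction."""
    candidate = x_best + step_size * rng.standard_normal(x_best.shape)
    f_cand = sphere(candidate)
    improvement = max(f_best - f_cand, 0.0)
    reward = min(improvement / (1.0 + abs(f_best)), 1.0)
    if f_cand < f_best:
        x_best, f_best = candidate, f_cand
    return reward, x_best, f_best

rng = np.random.default_rng(0)
d = 32                                   # one of the dimensions quoted in the setup
x_best = rng.uniform(-5.0, 5.0, size=d)
f_best = sphere(x_best)
reward, x_best, f_best = step_and_reward(0.1, x_best, f_best, rng)
print("reward:", reward)
```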