Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization

Authors: Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present simulations in Section 5 that demonstrate the regret bounds on simple synthetic problems.
Researcher Affiliation | Industry | 1. Microsoft Research, New York City; 2. Microsoft Research, India; 3. The authors are currently at Google Research, India.
Pseudocode | Yes | Algorithm 1: OPTPBCO
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | No | We present synthetic experiments that showcase the regret bounds established in Section 4. We work with a linear $g_t$ for all the experiments. We fix $\mathcal{W} = B_d(1)$, context vectors from $\{x_t : \|x_t\|_2 \le 1\}$, and the two loss functions (a) $f_t(w) = (\langle w, x_t\rangle - y^*_t)^2$ where $y^*_t = \langle w^*, x_t\rangle$, for a fixed $w^* \in B_d(1)$, and (b) $f_t(w) = |\langle w, x_t\rangle - y^*_t|$. (A hedged data-generation sketch appears after this table.)
Dataset Splits | No | The paper describes generating synthetic data and averaging over '50 problem instances' but does not specify training, validation, or test dataset splits.
Hardware Specification | No | The paper mentions running simulations but does not provide specific hardware details such as GPU/CPU models or memory.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers used for the experiments.
Experiment Setup | Yes | We work with a linear $g_t$ for all the experiments. We fix $\mathcal{W} = B_d(1)$, context vectors from $\{x_t : \|x_t\|_2 \le 1\}$, and the two loss functions (a) $f_t(w) = (\langle w, x_t\rangle - y^*_t)^2$ where $y^*_t = \langle w^*, x_t\rangle$, for a fixed $w^* \in B_d(1)$, and (b) $f_t(w) = |\langle w, x_t\rangle - y^*_t|$. Additionally, Algorithm 1 specifies parameters: 'Run Kernelized Exponential Weights for PBCO (Algorithm 2) with $\eta = \sqrt{d \log(LT)}/(B\sqrt{T})$' and 'Run Online Gradient Descent for PBCO (Algorithm 3) with $\eta = W\delta/(DC\sqrt{T})$, $\delta = (WDC/(3L))^{1/2}\, T^{-1/4}$, $\alpha = \delta$, and $T$'. (A hedged parameter-setting sketch appears after this table.)
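
To make the quoted experiment setup concrete, here is a minimal data-generation sketch, assuming only what the table above states (decision set $B_d(1)$, unit-norm contexts, linear $g_t$, squared or absolute loss around $y^*_t$). The function name make_instance, the sampling scheme, and the default arguments are our own illustrative choices, not taken from the paper.

```python
import numpy as np

def make_instance(d=10, T=1000, loss="squared", seed=None):
    """Generate one synthetic pseudo-1d instance: f_t(w) = g_t(<w, x_t>).

    Assumed setup, following the quoted experiment description:
      - a fixed w* drawn from the unit ball B_d(1),
      - context vectors x_t with ||x_t||_2 <= 1,
      - targets y*_t = <w*, x_t>,
      - g_t the squared loss (a) or absolute loss (b) around y*_t.
    """
    rng = np.random.default_rng(seed)

    def unit_ball(n):
        # Uniform sampling from B_d(1): uniform direction, radius ~ U^(1/d).
        v = rng.normal(size=(n, d))
        v /= np.linalg.norm(v, axis=1, keepdims=True)
        return v * rng.uniform(size=(n, 1)) ** (1.0 / d)

    w_star = unit_ball(1)[0]
    X = unit_ball(T)              # rows are x_1, ..., x_T
    y_star = X @ w_star           # y*_t = <w*, x_t>

    if loss == "squared":         # (a) f_t(w) = (<w, x_t> - y*_t)^2
        f = lambda t, w: (X[t] @ w - y_star[t]) ** 2
    else:                         # (b) f_t(w) = |<w, x_t> - y*_t|
        f = lambda t, w: abs(X[t] @ w - y_star[t])
    return X, y_star, f
```

On such an instance, regret is measured against the best fixed $w \in B_d(1)$ (here $w^*$ itself, at which every $f_t$ is zero), and averaging over independently drawn instances matches the '50 problem instances' protocol quoted in the Dataset Splits row.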
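
The parameter settings quoted in the last row can likewise be turned into a small sketch. The function optpbco_params below is hypothetical, and the formulas implement our reconstruction of the garbled excerpt (the radicals and the $T^{-1/4}$ factor are inferred from the standard analysis, not verbatim); the excerpt does not say how Algorithm 1 chooses between its two subroutines, so only the parameter computations are shown.

```python
import numpy as np

def optpbco_params(T, d, L, B, W, D, C):
    """Step sizes in the spirit of Algorithm 1 (OPTPBCO).

    Symbols as in the quoted excerpt: T = horizon, d = dimension,
    L = Lipschitz constant of the losses, B = loss bound, and W, D, C =
    problem constants appearing in the OGD analysis. The formulas follow
    our reading of the garbled quote and should be verified against the
    paper before use.
    """
    # Kernelized Exponential Weights for PBCO (Algorithm 2):
    eta_kew = np.sqrt(d * np.log(L * T)) / (B * np.sqrt(T))

    # Online Gradient Descent for PBCO (Algorithm 3):
    delta = (W * D * C / (3.0 * L)) ** 0.5 * T ** -0.25  # smoothing radius
    eta_ogd = W * delta / (D * C * np.sqrt(T))           # OGD step size
    alpha = delta                                        # alpha = delta per the quote
    return {"kew": {"eta": eta_kew},
            "ogd": {"eta": eta_ogd, "delta": delta, "alpha": alpha}}
```

For example, optpbco_params(T=10_000, d=10, L=1.0, B=1.0, W=1.0, D=1.0, C=1.0) returns the step sizes one would pass to the two subroutines on a unit-scale instance.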