Dynamic Balancing for Model Selection in Bandits and RL

Authors: Ashok Cutkosky, Christoph Dann, Abhimanyu Das, Claudio Gentile, Aldo Pacchiano, Manish Purohit

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 7 (Experiments): "To investigate the practical usefulness of our Dynamic Balancing approach and compare it against existing methods, we conducted experiments on synthetic linear bandit instances with 100 actions of dimension 10 each. [...] Figure 1 shows our experimental results for the three bandit instances."
Researcher Affiliation | Collaboration | 1 Boston University, Boston, Massachusetts, USA; 2 Google Research, New York, NY, USA; 3 Google Research, Mountain View, California, USA; 4 University of California, Berkeley, California, USA.
Pseudocode | Yes | Algorithm 1: The Dynamic Balancing Algorithm.
Open Source Code | No | The paper does not contain an explicit statement or link indicating the availability of open-source code for the described methodology.
Open Datasets | No | "We conducted experiments on synthetic linear bandit instances with 100 actions of dimension 10 each."
Dataset Splits | No | The paper does not provide specific dataset split information (e.g., percentages, sample counts, or methodology for splitting).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | Yes | "Specifically, we use 10 instances of OFUL as base learners with confidence scaling parameters on a geometric grid in [1/100, 1] [...] We evaluate it on three bandit instances with reward noise of standard deviation σ = 1, σ = 0.3 and σ = 0.05 each. [...] Both Corral and Stochastic Corral require a learning rate which we set to ξ/T where ξ was picked as the value from {1, 10, 100, 1000} that was most competitive for each algorithm across the 3 problem instances."
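The table states only that the experiments use synthetic linear bandit instances with 100 actions of dimension 10 and Gaussian reward noise with σ = 1, 0.3 and 0.05. The sketch below shows one way such an instance could be generated. It is an assumption throughout: the class name SyntheticLinearBandit, the choice to draw both the actions and the hidden parameter theta uniformly from the unit sphere, and the seeds are all hypothetical, since the excerpt does not describe how the paper built its instances.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly on the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class SyntheticLinearBandit:
    """Hypothetical linear bandit with K fixed actions in R^d and Gaussian noise.

    Assumption: actions and the hidden parameter theta are drawn uniformly
    from the unit sphere; the excerpt does not state the generation scheme.
    """

    def __init__(self, n_actions=100, dim=10, sigma=1.0, seed=0):
        self.rng = random.Random(seed)
        self.actions = [random_unit_vector(dim, self.rng) for _ in range(n_actions)]
        self.theta = random_unit_vector(dim, self.rng)
        self.sigma = sigma
        # Expected reward of action k is the inner product <a_k, theta>.
        self.means = [sum(x * t for x, t in zip(a, self.theta)) for a in self.actions]
        self.best_mean = max(self.means)

    def pull(self, k):
        """Noisy reward <a_k, theta> + N(0, sigma^2)."""
        return self.means[k] + self.rng.gauss(0.0, self.sigma)

    def regret(self, k):
        """Per-round pseudo-regret of playing action k."""
        return self.best_mean - self.means[k]


# One hypothetical instance per noise level quoted in the Experiment Setup row.
instances = [SyntheticLinearBandit(sigma=s, seed=i) for i, s in enumerate((1.0, 0.3, 0.05))]
```

Because actions and theta are unit vectors, every expected reward lies in [-1, 1], so σ = 0.05 gives a nearly noiseless problem while σ = 1 makes the noise comparable to the full reward range.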
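The Pseudocode row refers to Algorithm 1 (the Dynamic Balancing Algorithm), which is not reproduced on this page. As a rough illustration of the general idea of balancing base learners by their candidate regret bounds, here is a generic regret-balancing selection step. It is not a transcription of Algorithm 1: the function name balancing_step, the Hoeffding-style confidence width, the constant c, the consistency test, and the rule of doubling a learner's putative coefficient when its claim looks violated are all assumptions made for this sketch.

```python
import math


def confidence_width(n, delta, c=1.0):
    """Hoeffding-style width c * sqrt(log(1/delta) / n) (assumed form)."""
    return c * math.sqrt(math.log(1.0 / delta) / n)


def balancing_step(counts, reward_sums, coeffs, delta=0.05, c=1.0):
    """One selection step of a simplified (hypothetical) regret-balancing rule.

    counts[i]:      rounds in which base learner i was played so far
    reward_sums[i]: total reward collected while playing learner i
    coeffs[i]:      putative coefficient; learner i claims regret(n) <= coeffs[i] * sqrt(n)
    Returns the learner to play next and may double entries of coeffs in place.
    """
    k = len(counts)
    # Try every learner once before comparing statistics.
    for i in range(k):
        if counts[i] == 0:
            return i
    # Best pessimistic (lower-confidence) average reward over all learners.
    best_lower = max(
        reward_sums[j] / counts[j] - confidence_width(counts[j], delta, c) for j in range(k)
    )
    for i in range(k):
        n = counts[i]
        # Optimistic average: observed mean + claimed per-round regret + width.
        upper = reward_sums[i] / n + coeffs[i] / math.sqrt(n) + confidence_width(n, delta, c)
        if upper < best_lower:
            # Claim is inconsistent with the data; doubling is an assumed response.
            coeffs[i] *= 2.0
    # Balancing: play the learner whose putative regret so far is smallest.
    return min(range(k), key=lambda i: coeffs[i] * math.sqrt(counts[i]))
```

For example, with counts=[10, 10], reward_sums=[9.0, 1.0], coeffs=[1.0, 1.0] and c=0.1, the second learner's optimistic average falls below the first learner's pessimistic average, so its coefficient doubles to 2.0 and the first learner is selected.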
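The Experiment Setup row uses OFUL base learners that differ only in a confidence scaling parameter. A natural reading is that the parameter multiplies OFUL's exploration bonus, so small values make a learner nearly greedy and large values make it more optimistic. The sketch below assumes that reading. The class ScaledOFUL, the fixed radius beta (in place of the time-varying radius from the OFUL analysis), and the ridge parameter reg are hypothetical simplifications, not the paper's implementation.

```python
import math


class ScaledOFUL:
    """Hypothetical OFUL-style linear bandit learner with a scaled confidence width.

    Simplifying assumption: the exploration bonus is scale * beta * ||a||_{V^-1}
    with a fixed beta, instead of the time-varying radius from the OFUL analysis.
    """

    def __init__(self, dim, scale, beta=1.0, reg=1.0):
        self.scale = scale
        self.beta = beta
        # V = reg * I initially, so V^{-1} = I / reg; b accumulates reward * action.
        self.v_inv = [[1.0 / reg if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _mat_vec(self, v):
        """Multiply the current V^{-1} by vector v."""
        return [sum(m * x for m, x in zip(row, v)) for row in self.v_inv]

    def select(self, actions):
        """Index of the action maximizing <a, theta_hat> + scale * beta * ||a||_{V^-1}."""
        theta_hat = self._mat_vec(self.b)  # ridge-regression estimate V^{-1} b
        best, best_val = 0, -math.inf
        for k, a in enumerate(actions):
            va = self._mat_vec(a)
            width = math.sqrt(max(0.0, sum(x * y for x, y in zip(a, va))))
            val = sum(x * t for x, t in zip(a, theta_hat)) + self.scale * self.beta * width
            if val > best_val:
                best, best_val = k, val
        return best

    def update(self, a, reward):
        """Sherman-Morrison rank-one update of V^{-1} after adding a a^T to V."""
        va = self._mat_vec(a)
        denom = 1.0 + sum(x * y for x, y in zip(a, va))
        for i in range(len(a)):
            for j in range(len(a)):
                self.v_inv[i][j] -= va[i] * va[j] / denom
        for i in range(len(a)):
            self.b[i] += reward * a[i]
```

With the same data, a learner with scale 0 exploits the best-looking action, while a learner with a large scale tries an under-sampled action because its confidence width is still wide. A pool of ten such learners, one per grid value, would be the set of base learners the model-selection method chooses among.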
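The hyperparameters quoted in the Experiment Setup row can be enumerated directly: 10 confidence scaling values on a geometric grid in [1/100, 1], and Corral learning rates ξ/T for ξ in {1, 10, 100, 1000}. The sketch assumes the grid includes both endpoints; the helper names geometric_grid and corral_learning_rates are hypothetical, and the horizon T is not given in the excerpt, so any value passed in is only for illustration.

```python
def geometric_grid(low, high, n):
    """n values from low to high (inclusive) with a constant ratio between neighbors."""
    if n < 2:
        raise ValueError("need at least two grid points")
    ratio = (high / low) ** (1.0 / (n - 1))
    return [low * ratio ** i for i in range(n)]


def corral_learning_rates(horizon, xis=(1, 10, 100, 1000)):
    """Candidate learning rates xi / T for each xi in the tuning set."""
    return {xi: xi / horizon for xi in xis}


# Confidence-scaling multipliers for the 10 OFUL base learners.
oful_scales = geometric_grid(1 / 100, 1.0, 10)
```

With ten points the ratio between neighboring values is 100^(1/9), about 1.668, so the grid runs 0.01, about 0.0167, about 0.0278, and so on up to 1.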