reproducibilityindex.ai

Dynamic Balancing for Model Selection in Bandits and RL

Authors: Ashok Cutkosky, Christoph Dann, Abhimanyu Das, Claudio Gentile, Aldo Pacchiano, Manish Purohit

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	7. Experiments. To investigate the practical usefulness of our Dynamic Balancing approach and compare it against existing methods, we conducted experiments on synthetic linear bandit instances with 100 actions of dimension 10 each. [...] Figure 1 shows our experimental results for the three bandit instances.
Researcher Affiliation	Collaboration	1Boston University, Boston, Massachusetts, USA 2Google Research, New York, NY, USA 3Google Research, Mountain View, California, USA 4University of California, Berkeley, California, USA.
Pseudocode	Yes	Algorithm 1: The Dynamic Balancing Algorithm
Open Source Code	No	The paper does not contain an explicit statement or link indicating the availability of open-source code for the described methodology.
Open Datasets	No	We conducted experiments on synthetic linear bandit instances with 100 actions of dimension 10 each.
Dataset Splits	No	The paper does not provide specific dataset split information (e.g., percentages, sample counts, or methodology for splitting).
Hardware Specification	No	The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup	Yes	Specifically, we use 10 instances of OFUL as base learners with confidence scaling parameters on a geometric grid in [1/100, 1] [...] We evaluate it on three bandit instances with reward noise of standard deviation σ = 1, σ = 0.3 and σ = 0.05 each. [...] Both Corral and Stochastic Corral require a learning rate which we set to ξ/T where ξ was picked as the value from {1, 10, 100, 1000} that was most competitive for each algorithm across the 3 problem instances.
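
Because no code is released, the quoted setup can only be approximated. The Python sketch below builds the pieces the Experiment Setup row names: a geometric grid of 10 confidence scaling parameters in [1/100, 1], a synthetic linear bandit with 100 actions of dimension 10, the three noise levels, and the candidate learning rates ξ/T. Everything else is an assumption made for illustration and is not taken from the paper: the function names, the unit-norm Gaussian sampling of actions and of the parameter vector, the seed, and the horizon passed to `corral_learning_rates`. The OFUL base learners and the Dynamic Balancing meta-algorithm themselves are not implemented here.

```python
import numpy as np


def confidence_grid(low=1 / 100, high=1.0, num=10):
    """Geometric grid of confidence scaling parameters, endpoints included."""
    return np.geomspace(low, high, num)


def make_linear_bandit(num_actions=100, dim=10, seed=0):
    """Draw a synthetic linear bandit instance.

    ASSUMPTION: actions and the parameter vector are unit-norm Gaussian
    directions. The quoted excerpt does not state how they were sampled.
    """
    rng = np.random.default_rng(seed)
    actions = rng.standard_normal((num_actions, dim))
    actions /= np.linalg.norm(actions, axis=1, keepdims=True)
    theta = rng.standard_normal(dim)
    theta /= np.linalg.norm(theta)
    return actions, theta


def pull(actions, theta, arm, sigma, rng):
    """Noisy linear reward: <action, theta> plus Gaussian noise of std sigma."""
    return float(actions[arm] @ theta + sigma * rng.standard_normal())


def corral_learning_rates(horizon, xis=(1, 10, 100, 1000)):
    """Candidate learning rates xi / T for each xi in the quoted set."""
    return {xi: xi / horizon for xi in xis}


# Quantities stated in the table row above.
grid = confidence_grid()            # one scaling parameter per OFUL base learner
noise_levels = (1.0, 0.3, 0.05)     # reward noise std for the three instances

# Illustrative instance under the sampling assumption stated above.
actions, theta = make_linear_bandit()
```

A call such as `corral_learning_rates(horizon=T)` then yields one candidate rate per ξ; the horizon T is not given in the quoted text, so any value supplied is hypothetical.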