Leveraging Demonstrations to Improve Online Learning: Quality Matters

Authors: Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We empirically investigate the role of offline demonstration data in terms of regret reduction. We compare the (approximate) informed TS algorithm with two baseline algorithms..." |
| Researcher Affiliation | Collaboration | 1. DeepMind; 2. University of Southern California |
| Pseudocode | Yes | Algorithm 1: Approximate iTS |
| Open Source Code | No | The paper contains no explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper generates synthetic data for its experiments ("Gaussian bandit", "linear Gaussian bandit") but provides no links, citations, or access information for a publicly available dataset. |
| Dataset Splits | No | The paper provides no details on training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined splits). |
| Hardware Specification | No | The paper reports no hardware details, such as GPU/CPU models, processor types, or memory, for running the experiments. |
| Software Dependencies | No | The paper mentions CVXPY (Diamond & Boyd, 2016) but specifies neither its version number nor any other software dependencies with versions (see the pinned-environment note after the table). |
| Experiment Setup | Yes | "The offline demonstration data size is fixed at N = 10... Each algorithm is run for a horizon T = 1000 and we compute the average cumulative regret over 100 independent runs for each algorithm." (A minimal sketch of this setup follows the table.) |
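For context, here is a minimal sketch of the quoted experiment setup: Thompson Sampling on a Gaussian bandit warm-started from N = 10 offline demonstrations, run for horizon T = 1000 with cumulative regret averaged over 100 independent runs. Everything not in the quote is an assumption for illustration: 5 arms, a N(0, 1) prior per arm, unit reward noise, a demonstrator who plays the optimal arm with probability 0.8, and a plain conjugate warm start rather than the competence-aware update of the paper's Algorithm 1 (Approximate iTS).

```python
import numpy as np

def run_informed_ts(true_means, demos, T=1000, seed=0):
    """Gaussian-bandit Thompson Sampling with a N(0, 1) prior per arm,
    warm-started from offline (arm, reward) demonstrations via standard
    conjugate updates. This ignores demonstrator competence, so it is a
    simplified stand-in for the paper's Algorithm 1 (Approximate iTS)."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    prec = np.ones(K)    # posterior precision per arm (prior N(0, 1), unit noise)
    psum = np.zeros(K)   # precision-weighted sum of observed rewards
    for arm, reward in demos:          # offline warm start
        prec[arm] += 1.0
        psum[arm] += reward
    best, regret = true_means.max(), 0.0
    for _ in range(T):
        theta = rng.normal(psum / prec, 1.0 / np.sqrt(prec))  # posterior sample
        arm = int(np.argmax(theta))
        psum[arm] += rng.normal(true_means[arm], 1.0)         # observe reward
        prec[arm] += 1.0
        regret += best - true_means[arm]
    return regret

rng = np.random.default_rng(42)
true_means = rng.normal(0.0, 1.0, size=5)
best_arm = int(np.argmax(true_means))
# N = 10 demonstrations from an imperfect expert who plays the optimal
# arm with probability 0.8 (this competence level is an assumption).
demos = []
for _ in range(10):
    a = best_arm if rng.random() < 0.8 else int(rng.integers(5))
    demos.append((a, rng.normal(true_means[a], 1.0)))
avg = np.mean([run_informed_ts(true_means, demos, T=1000, seed=s)
               for s in range(100)])
print(f"average cumulative regret over 100 runs: {avg:.1f}")
```

Varying the assumed demonstrator competence (here 0.8) is what makes "quality matter": warm-starting from a poor demonstrator can leave regret no better, or worse, than an uninformed prior.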
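On the software-dependency gap flagged above, a pinned environment file is the standard remedy. The version numbers below are illustrative assumptions, since the paper cites CVXPY (Diamond & Boyd, 2016) without specifying a version:

```
# requirements.txt -- versions are assumed for illustration;
# the paper does not state which releases were used.
cvxpy==1.3.2
numpy==1.24.4
```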