Leveraging Demonstrations to Improve Online Learning: Quality Matters

Authors: Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen

ICML 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We empirically investigate the role of offline demonstration data in terms of regret reduction. We compare the (approximate) informed TS algorithm with two baseline algorithms..." |
| Researcher Affiliation | Collaboration | 1. DeepMind; 2. University of Southern California |
| Pseudocode | Yes | Algorithm 1: Approximate iTS |
| Open Source Code | No | The paper contains no explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper generates synthetic data for its experiments ("Gaussian bandit", "linear Gaussian bandit") but provides no links, citations, or access information for a publicly available dataset. |
| Dataset Splits | No | The paper provides no details on training, validation, or test dataset splits (e.g., percentages, sample counts, or predefined splits). |
| Hardware Specification | No | The paper reports no hardware details, such as GPU/CPU models, processor types, or memory, for running the experiments. |
| Software Dependencies | No | The paper mentions CVXPY (Diamond & Boyd, 2016) but specifies neither its version number nor any other software dependencies with versions (see the pinned-environment note after the table). |
| Experiment Setup | Yes | "The offline demonstration data size is fixed at N = 10... Each algorithm is run for a horizon T = 1000 and we compute the average cumulative regret over 100 independent runs for each algorithm." (A minimal sketch of this setup follows the table.) |
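For context, here is a minimal sketch of the quoted experiment setup: Thompson Sampling on a Gaussian bandit warm-started from N = 10 offline demonstrations, run for horizon T = 1000 with cumulative regret averaged over 100 independent runs. Everything not in the quote is an assumption for illustration: 5 arms, a N(0, 1) prior per arm, unit reward noise, a demonstrator who plays the optimal arm with probability 0.8, and a plain conjugate warm start rather than the competence-aware update of the paper's Algorithm 1 (Approximate iTS).

```python
import numpy as np

def run_informed_ts(true_means, demos, T=1000, seed=0):
    """Gaussian-bandit Thompson Sampling with a N(0, 1) prior per arm,
    warm-started from offline (arm, reward) demonstrations via standard
    conjugate updates. This ignores demonstrator competence, so it is a
    simplified stand-in for the paper's Algorithm 1 (Approximate iTS)."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    prec = np.ones(K)    # posterior precision per arm (prior N(0, 1), unit noise)
    psum = np.zeros(K)   # precision-weighted sum of observed rewards
    for arm, reward in demos:          # offline warm start
        prec[arm] += 1.0
        psum[arm] += reward
    best, regret = true_means.max(), 0.0
    for _ in range(T):
        theta = rng.normal(psum / prec, 1.0 / np.sqrt(prec))  # posterior sample
        arm = int(np.argmax(theta))
        psum[arm] += rng.normal(true_means[arm], 1.0)         # observe reward
        prec[arm] += 1.0
        regret += best - true_means[arm]
    return regret

rng = np.random.default_rng(42)
true_means = rng.normal(0.0, 1.0, size=5)
best_arm = int(np.argmax(true_means))
# N = 10 demonstrations from an imperfect expert who plays the optimal
# arm with probability 0.8 (this competence level is an assumption).
demos = []
for _ in range(10):
    a = best_arm if rng.random() < 0.8 else int(rng.integers(5))
    demos.append((a, rng.normal(true_means[a], 1.0)))
avg = np.mean([run_informed_ts(true_means, demos, T=1000, seed=s)
               for s in range(100)])
print(f"average cumulative regret over 100 runs: {avg:.1f}")
```

Varying the assumed demonstrator competence (here 0.8) is what makes "quality matter": warm-starting from a poor demonstrator can leave regret no better, or worse, than an uninformed prior.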
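On the software-dependency gap flagged above, a pinned environment file is the standard remedy. The version numbers below are illustrative assumptions, since the paper cites CVXPY (Diamond & Boyd, 2016) without specifying a version:

```
# requirements.txt -- versions are assumed for illustration;
# the paper does not state which releases were used.
cvxpy==1.3.2
numpy==1.24.4
```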