Leveraging Demonstrations to Improve Online Learning: Quality Matters
Authors: Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically investigate the role of offline demonstration data in terms of regret reduction. We compare the (approximate) informed TS algorithm with two baseline algorithms. |
| Researcher Affiliation | Collaboration | ¹DeepMind, ²University of Southern California |
| Pseudocode | Yes | Algorithm 1: Approximate iTS |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | No | The paper describes generating data for its experiments ("Gaussian bandit", "linear Gaussian bandit") but does not provide any specific links, citations, or access information for a publicly available dataset. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or specific predefined splits). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of "CVXPY (Diamond & Boyd, 2016)" but does not specify its version number or any other software dependencies with their respective versions. |
| Experiment Setup | Yes | The offline demonstration datasize is fixed at N = 10... Each algorithm is run for a horizon T = 1000 and we compute the average cumulative regret over 100 independent runs for each algorithm. |
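The experiment-setup row describes a standard bandit protocol: warm-start from N = 10 offline demonstrations, run each algorithm for T = 1000 steps, and average cumulative regret over 100 independent runs. Since the paper's code is not released, the sketch below is only a simplified illustration of that protocol on a Gaussian bandit, using plain Thompson sampling whose posterior is warm-started with demonstration pulls of the optimal arm; the function name, arm count, and expert model are assumptions, not the paper's informed-TS algorithm.

```python
import numpy as np

def avg_cumulative_regret(n_arms=5, n_demos=10, T=1000, n_runs=100, seed=0):
    """Thompson sampling on a Gaussian bandit with unit-variance rewards,
    warm-started from n_demos offline expert demonstrations (a simplified
    stand-in for the paper's approximate informed TS). Returns the
    cumulative regret averaged over n_runs independent runs."""
    rng = np.random.default_rng(seed)
    regrets = np.zeros(n_runs)
    for run in range(n_runs):
        mu = rng.normal(0.0, 1.0, size=n_arms)  # true arm means
        best_mean = mu.max()
        # Per-arm Gaussian posterior over the mean, prior N(0, 1):
        # precision counts prior + observations, mean is the posterior mean.
        prec = np.ones(n_arms)
        mean = np.zeros(n_arms)
        # Offline demonstrations: assume an expert pulls the optimal arm.
        best_arm = int(np.argmax(mu))
        for _ in range(n_demos):
            r = rng.normal(mu[best_arm], 1.0)
            mean[best_arm] = (prec[best_arm] * mean[best_arm] + r) / (prec[best_arm] + 1.0)
            prec[best_arm] += 1.0
        # Online phase: sample from each posterior, pull the argmax arm.
        cum_regret = 0.0
        for _ in range(T):
            sample = rng.normal(mean, 1.0 / np.sqrt(prec))
            a = int(np.argmax(sample))
            r = rng.normal(mu[a], 1.0)
            cum_regret += best_mean - mu[a]
            mean[a] = (prec[a] * mean[a] + r) / (prec[a] + 1.0)
            prec[a] += 1.0
        regrets[run] = cum_regret
    return float(regrets.mean())
```

With the paper's settings (`n_demos=10, T=1000, n_runs=100`) this warm start concentrates the optimal arm's posterior before learning begins, which is the mechanism by which high-quality demonstrations reduce regret; low-quality demonstrations (e.g. an expert pulling a suboptimal arm) would instead bias the posterior and can slow learning.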