Experiment Planning with Function Approximation

Authors: Aldo Pacchiano, Jonathan Lee, Emma Brunskill

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension [42] of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small. We finalize our results introducing a statistical gap fleshing out the fundamental differences between planning and adaptive learning and provide results for planning with model selection.
Researcher Affiliation | Academia | Aldo Pacchiano (Broad Institute & Boston University, apacchia@broadinstitute.org); Jonathan N. Lee (Stanford University, jnl@stanford.edu); Emma Brunskill (Stanford University, ebrun@cs.stanford.edu)
Pseudocode | Yes | Algorithm 1 (Eluder Planner) and Algorithm 2 (Sampler)
Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper is theoretical and does not conduct empirical experiments using specific datasets. While it refers to "m T i.i.d. offline context samples" as part of its theoretical problem definition, it does not provide concrete access information (link, DOI, citation) for any publicly available or open dataset used for empirical training.
Dataset Splits | No | The paper is theoretical and does not describe any empirical experiments. Therefore, it does not provide specific dataset split information (e.g., percentages or sample counts for training, validation, and testing) needed to reproduce data partitioning for empirical evaluation.
Hardware Specification | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not provide specific hardware details (like GPU/CPU models or memory amounts) used for running experiments.
Software Dependencies | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate experiments.
Experiment Setup | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not contain specific experimental setup details such as hyperparameter values, training configurations, or system-level settings.