Experiment Planning with Function Approximation
Authors: Aldo Pacchiano, Jonathan Lee, Emma Brunskill
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work we propose two experiment planning strategies compatible with function approximation. The first is an eluder planning and sampling procedure that can recover optimality guarantees depending on the eluder dimension [42] of the reward function class. For the second, we show that a uniform sampler achieves competitive optimality rates in the setting where the number of actions is small. We finalize our results introducing a statistical gap fleshing out the fundamental differences between planning and adaptive learning and provide results for planning with model selection. |
| Researcher Affiliation | Academia | Aldo Pacchiano (Broad Institute & Boston University, apacchia@broadinstitute.org); Jonathan N. Lee (Stanford University, jnl@stanford.edu); Emma Brunskill (Stanford University, ebrun@cs.stanford.edu) |
| Pseudocode | Yes | Algorithm 1 (Eluder Planner) and Algorithm 2 (Sampler) |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available. |
| Open Datasets | No | The paper is theoretical and does not conduct empirical experiments using specific datasets. While it refers to "m T i.i.d. offline context samples" as part of its theoretical problem definition, it does not provide concrete access information (link, DOI, citation) for any publicly available or open dataset used for empirical training. |
| Dataset Splits | No | The paper is theoretical and does not describe any empirical experiments. Therefore, it does not provide specific dataset split information (e.g., percentages or sample counts for training, validation, and testing) needed to reproduce data partitioning for empirical evaluation. |
| Hardware Specification | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not provide specific hardware details (like GPU/CPU models or memory amounts) used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate experiments. |
| Experiment Setup | No | The paper is theoretical and does not conduct empirical experiments. Therefore, it does not contain specific experimental setup details such as hyperparameter values, training configurations, or system-level settings. |