When Is Generalizable Reinforcement Learning Tractable?
Authors: Dhruv Malik, Yuanzhi Li, Pradeep Ravikumar
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | From the paper: 'Our Contributions. We introduce Weak Proximity, a natural structural condition that is motivated by classical RL results, and requires the environments to have highly similar transition and reward functions and share optimal trajectories. We prove a statistical lower bound demonstrating that tractable generalization is impossible, despite this shared structure. This lower bound holds even when each individual environment can be efficiently solved to obtain an optimal linear policy, and when the agent possesses a generative model. Consequentially, we show that a classical metric for measuring the relative closeness of MDPs is not the right metric for modern RL generalization settings. Our lower bound implies that learning a state representation for the purpose of efficiently generalizing to multiple environments is worst case sample inefficient, even when such a representation exists, the environments are ostensibly similar, and any single environment can be efficiently solved. To provide a sufficient condition for efficient generalization, we introduce Strong Proximity. This structural condition strengthens Weak Proximity by additionally constraining the environments to share an optimal policy. We provide an algorithm which exploits Strong Proximity to provably and efficiently generalize, when the environments share deterministic transitions.' From the checklist: '3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A] No experiments were run.' |
| Researcher Affiliation | Academia | Dhruv Malik, Yuanzhi Li, and Pradeep Ravikumar: Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213 |
| Pseudocode | Yes | Algorithm 1 Inputs: horizon length H, distribution D, sample size n, oracle V̂ as defined in WIO |
| Open Source Code | No | The paper states under '3. If you ran experiments...' and '4. If you are using existing assets...' that 'No experiments were run' and 'No such assets were used or created', which implies no custom code for the methodology was released. |
| Open Datasets | No | The paper states 'No experiments were run', indicating no dataset was used for training. |
| Dataset Splits | No | The paper states 'No experiments were run', indicating no dataset splits were defined. |
| Hardware Specification | No | The paper explicitly states 'No experiments were run', meaning no hardware was used for experiments and thus no specifications are provided. |
| Software Dependencies | No | The paper explicitly states 'No experiments were run', meaning no software dependencies for experiments are relevant or provided. |
| Experiment Setup | No | The paper explicitly states 'No experiments were run', so no experimental setup details like hyperparameters are provided. |
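For context on the Pseudocode row above, the quoted Algorithm 1 inputs (horizon H, distribution D, sample size n, and a value oracle V̂) suggest a generic sampled-rollout interface. The sketch below is purely illustrative: the environment class, function name, and greedy-rollout body are assumptions, not the paper's actual procedure, which the paper specifies only in pseudocode.

```python
class ChainEnv:
    """Toy deterministic chain environment: states 0..H, actions stay (0) or advance (1)."""

    def reset(self):
        return 0

    def actions(self, state):
        return [0, 1]

    def step(self, state, action):
        return state + action


def greedy_rollouts(H, sample_env, n, v_oracle):
    """Draw n environments from the distribution D (via sample_env) and,
    in each, roll out H steps acting greedily w.r.t. the value oracle V-hat."""
    trajectories = []
    for _ in range(n):
        env = sample_env()          # one draw from the distribution D
        state = env.reset()
        traj = [state]
        for _ in range(H):
            # pick the action whose successor state the oracle values most highly
            action = max(env.actions(state),
                         key=lambda a: v_oracle(env.step(state, a)))
            state = env.step(state, action)
            traj.append(state)
        trajectories.append(traj)
    return trajectories
```

With an identity oracle on the toy chain, `greedy_rollouts(3, ChainEnv, 2, lambda s: s)` always advances, producing two trajectories `[0, 1, 2, 3]`. This only demonstrates how the four quoted inputs could compose, not the paper's guarantee under Strong Proximity.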