reproducibilityindex.ai

Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies

Authors: Ron Dorfman, Idan Shenfeld, Aviv Tamar

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we evaluate our framework on a diverse set of domains, including difficult sparse reward tasks, and demonstrate learning of effective exploration behavior that is qualitatively different from the exploration used by any RL agent in the data.
Researcher Affiliation	Academia	Ron Dorfman Technion rdorfman@campus.technion.ac.il Idan Shenfeld Technion idanshen@campus.technion.ac.il Aviv Tamar Technion avivt@technion.ac.il
Pseudocode	Yes	In Appendix B we provide pseudo-code, and detail how to apply the insights of Proposition 3 to a practical episodic RL setting.
Open Source Code	Yes	Our code is available online at https://github.com/Rondorf/BOReL.
Open Datasets	No	The paper mentions environments like Gridworld and Half-Cheetah-Vel, and states: 'For data collection, we used off-the-shelf DQN (Gridworld) and SAC (continuous domains) implementations.' and 'we diversified the offline dataset by modifying the initial state distribution P_init'. This indicates that the data were generated by the authors rather than taken from a pre-existing, publicly available dataset with concrete access information.
Dataset Splits	No	The paper does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts). It refers to training on a 'set of environments' and evaluating on 'unseen tasks' but lacks the explicit partitioning details of the data itself.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies	No	The paper mentions using 'off-the-shelf DQN (Gridworld) and SAC (continuous domains) implementations' but does not specify version numbers for these or any other software libraries or dependencies, which are crucial for reproducibility.
Experiment Setup	No	The paper states: 'Technically, network architectures and hyperparameters were chosen similarly to [36], as detailed in the supplementary.' While hyperparameters are mentioned, their specific values are deferred to the supplementary material and are not given in the main text.