Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies

Authors: Ron Dorfman, Idan Shenfeld, Aviv Tamar

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we evaluate our framework on a diverse set of domains, including difficult sparse reward tasks, and demonstrate learning of effective exploration behavior that is qualitatively different from the exploration used by any RL agent in the data.
Researcher Affiliation | Academia | Ron Dorfman (Technion, rdorfman@campus.technion.ac.il); Idan Shenfeld (Technion, idanshen@campus.technion.ac.il); Aviv Tamar (Technion, avivt@technion.ac.il)
Pseudocode | Yes | In Appendix B we provide pseudo-code, and detail how to apply the insights of Proposition 3 to a practical episodic RL setting.
Open Source Code | Yes | Our code is available online at https://github.com/Rondorf/BOReL.
Open Datasets | No | The paper mentions environments such as Gridworld and Half-Cheetah-Vel and states: 'For data collection, we used off-the-shelf DQN (Gridworld) and SAC (continuous domains) implementations.' and 'we diversified the offline dataset by modifying the initial state distribution P_init'. This indicates that the authors generated the data themselves rather than using a pre-existing, publicly available dataset with concrete access information.
Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts). It refers to training on a 'set of environments' and evaluating on 'unseen tasks' but does not say how the data itself was partitioned.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU models, CPU types, or memory specifications, for the machines used to run the experiments.
Software Dependencies | No | The paper mentions using 'off-the-shelf DQN (Gridworld) and SAC (continuous domains) implementations' but does not specify version numbers for these or for any other software libraries or dependencies, which are crucial for reproducibility.
Experiment Setup | No | The paper states: 'Technically, network architectures and hyperparameters were chosen similarly to [36], as detailed in the supplementary.' Hyperparameters are mentioned, but their specific values are deferred to the supplementary material and are not given in the main text.