Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies
Authors: Ron Dorfman, Idan Shenfeld, Aviv Tamar
NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we evaluate our framework on a diverse set of domains, including difficult sparse reward tasks, and demonstrate learning of effective exploration behavior that is qualitatively different from the exploration used by any RL agent in the data. |
| Researcher Affiliation | Academia | Ron Dorfman Technion rdorfman@campus.technion.ac.il Idan Shenfeld Technion idanshen@campus.technion.ac.il Aviv Tamar Technion avivt@technion.ac.il |
| Pseudocode | Yes | In Appendix B we provide pseudo-code, and detail how to apply the insights of Proposition 3 to a practical episodic RL setting. |
| Open Source Code | Yes | Our code is available online at https://github.com/Rondorf/BOReL. |
| Open Datasets | No | The paper mentions environments like Gridworld and Half-Cheetah-Vel, and states: 'For data collection, we used off-the-shelf DQN (Gridworld) and SAC (continuous domains) implementations' and 'we diversified the offline dataset by modifying the initial state distribution P_init'. This indicates that the data was generated by the authors rather than taken from a pre-existing, publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts). It refers to training on a 'set of environments' and evaluating on 'unseen tasks' but lacks the explicit partitioning details of the data itself. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'off-the-shelf DQN (Gridworld) and SAC (continuous domains) implementations' but does not specify version numbers for these or any other software libraries or dependencies, which are crucial for reproducibility. |
| Experiment Setup | No | The paper states: 'Technically, network architectures and hyperparameters were chosen similarly to [36], as detailed in the supplementary.' While hyperparameters are mentioned, their specific values are deferred to supplementary material and are not detailed in the main text. |