Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Offline Meta Reinforcement Learning -- Identifiability Challenges and Effective Data Collection Strategies
Authors: Ron Dorfman, Idan Shenfeld, Aviv Tamar
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we evaluate our framework on a diverse set of domains, including difficult sparse reward tasks, and demonstrate learning of effective exploration behavior that is qualitatively different from the exploration used by any RL agent in the data. |
| Researcher Affiliation | Academia | Ron Dorfman (Technion), Idan Shenfeld (Technion), Aviv Tamar (Technion) |
| Pseudocode | Yes | In Appendix B we provide pseudo-code, and detail how to apply the insights of Proposition 3 to a practical episodic RL setting. |
| Open Source Code | Yes | Our code is available online at https://github.com/Rondorf/BOReL. |
| Open Datasets | No | The paper mentions environments such as Gridworld and Half-Cheetah-Vel, and states: 'For data collection, we used off-the-shelf DQN (Gridworld) and SAC (continuous domains) implementations.' and 'we diversified the offline dataset by modifying the initial state distribution P_init'. This indicates that the authors generated the data themselves rather than using a pre-existing, publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits (e.g., percentages or sample counts). It refers to training on a 'set of environments' and evaluating on 'unseen tasks' but lacks the explicit partitioning details of the data itself. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'off-the-shelf DQN (Gridworld) and SAC (continuous domains) implementations' but does not specify version numbers for these or any other software libraries or dependencies, which are crucial for reproducibility. |
| Experiment Setup | No | The paper states: 'Technically, network architectures and hyperparameters were chosen similarly to [36], as detailed in the supplementary.' While hyperparameters are mentioned, their specific values are deferred to supplementary material and are not detailed in the main text. |