Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Robust State Abstractions for Hidden-Parameter Block MDPs
Authors: Amy Zhang, Shagun Sodhani, Khimya Khetarpal, Joelle Pineau
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To further demonstrate the efficacy of the proposed method, we empirically compare and show improvement over multi-task and meta-reinforcement learning baselines. (...) We use environments from Deepmind Control Suite (DMC) (Tassa et al., 2018) to evaluate our method for learning HiP-BMDPs for both multi-task RL and meta-reinforcement learning settings. |
| Researcher Affiliation | Collaboration | Amy Zhang (1, 2, 3), Shagun Sodhani (2), Khimya Khetarpal (1, 3), Joelle Pineau (1, 2, 3). 1: McGill University; 2: Facebook AI Research; 3: Mila |
| Pseudocode | Yes | Algorithm 1 HiP-BMDP training for the Multi-task RL setting. (...) Algorithm 2 Update Model Using Hip-BMDPLoss (...) Algorithm 3 HiP-MDP training for the meta-RL setting. |
| Open Source Code | No | The paper provides a link to sample videos of policies (https://sites.google.com/view/hip-bmdp), but not to the source code for the described methodology. |
| Open Datasets | Yes | We use environments from Deepmind Control Suite (DMC) (Tassa et al., 2018) to evaluate our method for learning HiP-BMDPs for both multi-task RL and meta-reinforcement learning settings. |
| Dataset Splits | Yes | We denote the ordered MDPs as A-H. MDPs {B, C, F, G} are training environments and {D, E} are used for evaluating the model in the interpolation setup (i.e. the value of the perturbation-parameter can be obtained by interpolation). MDPs {A, H} are for evaluating the model in the extrapolation setup (i.e. the value of the perturbation-parameter can be obtained by extrapolation). |
| Hardware Specification | No | The paper states that experiments are time-intensive or discusses aspects like 'using a GPU' but does not provide specific hardware details such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software like Soft Actor Critic (SAC), PEARL, DeepMDP, and SAC-AE, but does not provide specific version numbers for any of these or for programming languages or libraries used. |
| Experiment Setup | Yes | Implementation details can be found in Appendix D. (...) All the hyperparameters (for MTRL algorithm) are listed in Table 1. |
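
The split quoted under Dataset Splits can be checked with a short sketch. It is an illustration only, not the authors' code: it assumes the letters A to H are ordered by the value of the perturbation parameter, and the names `ORDERED_MDPS`, `TRAIN_MDPS`, and `classify_eval_mdps` are hypothetical. A held-out MDP counts as interpolation when it lies strictly between the lowest and highest training MDPs in that ordering, and as extrapolation otherwise.

```python
# Hypothetical sketch of the train / interpolation / extrapolation split.
# Assumption: the labels A..H are ordered by perturbation-parameter value;
# the actual parameter values are not given in this summary.
ORDERED_MDPS = ["A", "B", "C", "D", "E", "F", "G", "H"]
TRAIN_MDPS = {"B", "C", "F", "G"}


def classify_eval_mdps(ordered, train):
    """Label every non-training MDP as 'interpolation' or 'extrapolation'."""
    train_positions = [i for i, name in enumerate(ordered) if name in train]
    lowest, highest = min(train_positions), max(train_positions)
    labels = {}
    for i, name in enumerate(ordered):
        if name in train:
            continue  # training MDPs are not evaluation targets
        inside = lowest < i < highest
        labels[name] = "interpolation" if inside else "extrapolation"
    return labels


if __name__ == "__main__":
    for name, label in sorted(classify_eval_mdps(ORDERED_MDPS, TRAIN_MDPS).items()):
        print(name, label)
```

Under this positional reading, D and E fall inside the training range and A and H fall outside it, which agrees with the split stated in the table.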