reproducibilityindex.ai

Learning Robust State Abstractions for Hidden-Parameter Block MDPs

Authors: Amy Zhang, Shagun Sodhani, Khimya Khetarpal, Joelle Pineau

ICLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To further demonstrate the efficacy of the proposed method, we empirically compare and show improvement over multi-task and meta-reinforcement learning baselines. (...) We use environments from Deepmind Control Suite (DMC) (Tassa et al., 2018) to evaluate our method for learning HiP-BMDPs for both multi-task RL and meta-reinforcement learning settings.
Researcher Affiliation	Collaboration	Amy Zhang (1, 2, 3), Shagun Sodhani (2), Khimya Khetarpal (1, 3), Joelle Pineau (1, 2, 3). (1) McGill University, (2) Facebook AI Research, (3) Mila
Pseudocode	Yes	Algorithm 1 HiP-BMDP training for the Multi-task RL setting. (...) Algorithm 2 Update Model Using Hip-BMDPLoss (...) Algorithm 3 HiP-MDP training for the meta-RL setting.
Open Source Code	No	The paper provides a link to sample videos of policies (https://sites.google.com/view/hip-bmdp), but not to the source code for the described methodology.
Open Datasets	Yes	We use environments from Deepmind Control Suite (DMC) (Tassa et al., 2018) to evaluate our method for learning HiP-BMDPs for both multi-task RL and meta-reinforcement learning settings.
Dataset Splits	Yes	We denote the ordered MDPs as A-H. MDPs {B, C, F, G} are training environments and {D, E} are used for evaluating the model in the interpolation setup (i.e. the value of the perturbation-parameter can be obtained by interpolation). MDPs {A, H} are for evaluating the model in the extrapolation setup (i.e. the value of the perturbation-parameter can be obtained by extrapolation).
Hardware Specification	No	The paper states that experiments are time-intensive or discusses aspects like 'using a GPU' but does not provide specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies	No	The paper mentions software like Soft Actor Critic (SAC), PEARL, DeepMDP, and SAC-AE, but does not provide specific version numbers for any of these or for programming languages or libraries used.
Experiment Setup	Yes	Implementation details can be found in Appendix D. (...) All the hyperparameters (for MTRL algorithm) are listed in Table 1.
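
The split quoted under Dataset Splits can be written down as a small configuration with a sanity check. This is only an illustrative sketch, not code from the paper. The names ORDERED_MDPS, SPLITS and check_splits are hypothetical, and it assumes that "ordered" means the letters A-H are sorted by the value of the perturbation parameter.

```python
# Illustrative sketch only: the letters and the three groups come from the
# quoted text; every identifier below is a hypothetical name.

# Assumption: A..H are ordered by perturbation-parameter value.
ORDERED_MDPS = list("ABCDEFGH")

SPLITS = {
    "train": ["B", "C", "F", "G"],
    "eval_interpolation": ["D", "E"],
    "eval_extrapolation": ["A", "H"],
}


def check_splits(ordered, splits):
    """Return True if the splits partition `ordered`, the interpolation MDPs
    lie strictly inside the training range, and the extrapolation MDPs lie
    strictly outside it."""
    train = [ordered.index(m) for m in splits["train"]]
    interp = [ordered.index(m) for m in splits["eval_interpolation"]]
    extrap = [ordered.index(m) for m in splits["eval_extrapolation"]]

    # Every MDP appears in exactly one group.
    is_partition = sorted(train + interp + extrap) == list(range(len(ordered)))

    lo, hi = min(train), max(train)
    inside = all(lo < i < hi for i in interp)
    outside = all(i < lo or i > hi for i in extrap)
    return is_partition and inside and outside


if __name__ == "__main__":
    print(check_splits(ORDERED_MDPS, SPLITS))
```

Under the stated ordering assumption the check passes for the quoted split; swapping the interpolation and extrapolation groups makes it fail, which is the property the two evaluation setups rely on.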