Learning Robust State Abstractions for Hidden-Parameter Block MDPs

Authors: Amy Zhang, Shagun Sodhani, Khimya Khetarpal, Joelle Pineau

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To further demonstrate the efficacy of the proposed method, we empirically compare and show improvement over multi-task and meta-reinforcement learning baselines. (...) We use environments from DeepMind Control Suite (DMC) (Tassa et al., 2018) to evaluate our method for learning HiP-BMDPs for both multi-task RL and meta-reinforcement learning settings."
Researcher Affiliation | Collaboration | Amy Zhang (1,2,3), Shagun Sodhani (2), Khimya Khetarpal (1,3), Joelle Pineau (1,2,3); 1 McGill University, 2 Facebook AI Research, 3 Mila
Pseudocode | Yes | "Algorithm 1 HiP-BMDP training for the Multi-task RL setting. (...) Algorithm 2 Update Model Using HiP-BMDP Loss (...) Algorithm 3 HiP-MDP training for the meta-RL setting." (An illustrative sketch of such a training update appears after the table.)
Open Source Code | No | The paper provides a link to sample videos of policies (https://sites.google.com/view/hip-bmdp), but not to the source code for the described methodology.
Open Datasets | Yes | "We use environments from DeepMind Control Suite (DMC) (Tassa et al., 2018) to evaluate our method for learning HiP-BMDPs for both multi-task RL and meta-reinforcement learning settings."
Dataset Splits | Yes | "We denote the ordered MDPs as A–H. MDPs {B, C, F, G} are training environments and {D, E} are used for evaluating the model in the interpolation setup (i.e. the value of the perturbation-parameter can be obtained by interpolation). MDPs {A, H} are for evaluating the model in the extrapolation setup (i.e. the value of the perturbation-parameter can be obtained by extrapolation)." (A sketch of this split appears after the table.)
Hardware Specification | No | The paper notes that experiments are time-intensive and mentions GPU use, but does not provide specific hardware details such as GPU model, CPU type, or memory.
Software Dependencies | No | The paper mentions software such as Soft Actor-Critic (SAC), PEARL, DeepMDP, and SAC-AE, but does not provide version numbers for these or for the programming languages and libraries used.
Experiment Setup | Yes | "Implementation details can be found in Appendix D. (...) All the hyperparameters (for the MTRL algorithm) are listed in Table 1."
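As a companion to the Pseudocode row, here is a minimal sketch of what a HiP-BMDP-style multi-task update could look like: a shared encoder, a latent dynamics model conditioned on a learned per-task embedding, and a pairwise penalty that keeps differences in predicted dynamics within the distance between task embeddings. All names (`encoder`, `dynamics`, `task_embeddings`, `hip_bmdp_style_update`), dimensions, and loss terms are illustrative assumptions; the authors' actual procedure is the one given in Algorithms 1-3 and Appendix D of the paper and is not reproduced here.

```python
# Hypothetical HiP-BMDP-style multi-task update (illustrative only; not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, action_dim, latent_dim, theta_dim, n_tasks = 12, 4, 32, 8, 4

encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
dynamics = nn.Sequential(
    nn.Linear(latent_dim + action_dim + theta_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim)
)
task_embeddings = nn.Embedding(n_tasks, theta_dim)  # one learned hidden parameter per training task
optim = torch.optim.Adam(
    list(encoder.parameters()) + list(dynamics.parameters()) + list(task_embeddings.parameters()),
    lr=1e-3,
)

def hip_bmdp_style_update(batches):
    """One gradient step over a dict {task_id: (obs, action, next_obs)} of sampled transitions."""
    # Per-task latent transition-model loss, conditioned on that task's embedding.
    model_loss = 0.0
    for task_id, (obs, act, next_obs) in batches.items():
        z, z_next = encoder(obs), encoder(next_obs)
        theta = task_embeddings(torch.full((obs.shape[0],), task_id, dtype=torch.long))
        pred_next = dynamics(torch.cat([z, act, theta], dim=-1))
        model_loss = model_loss + F.mse_loss(pred_next, z_next.detach())

    # Pairwise task-similarity penalty: evaluate the dynamics for two tasks on the SAME
    # (state, action) batch and ask that the change in predictions stays within the
    # distance between their hidden-parameter embeddings.
    ids = list(batches.keys())
    obs, act, _ = batches[ids[0]]
    z = encoder(obs)
    theta_loss = 0.0
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            th_i = task_embeddings.weight[ids[i]].expand(z.shape[0], -1)
            th_j = task_embeddings.weight[ids[j]].expand(z.shape[0], -1)
            d_model = (dynamics(torch.cat([z, act, th_i], dim=-1))
                       - dynamics(torch.cat([z, act, th_j], dim=-1))).norm(dim=-1).mean()
            d_theta = (task_embeddings.weight[ids[i]] - task_embeddings.weight[ids[j]]).norm()
            theta_loss = theta_loss + F.relu(d_model - d_theta)

    loss = model_loss + theta_loss
    optim.zero_grad()
    loss.backward()
    optim.step()
    return float(loss)

# Example call with random stand-in data for four training tasks.
batches = {t: (torch.randn(16, obs_dim), torch.randn(16, action_dim), torch.randn(16, obs_dim))
           for t in range(n_tasks)}
print(hip_bmdp_style_update(batches))
```

The hinge term here is only a stand-in for the idea of tying the hidden-parameter metric to changes in task dynamics; the paper treats this formally rather than with this ad hoc penalty.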
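The Dataset Splits row lists eight variants of a task, ordered A–H by a single perturbation parameter, with {B, C, F, G} used for training, {D, E} for interpolation evaluation, and {A, H} for extrapolation evaluation. Below is a small sketch of that split; the perturbation values are placeholders, not the parameter ranges used in the paper.

```python
# Illustrative reconstruction of the train / interpolation / extrapolation split.
import numpy as np

# Eight variants of one DMC task, ordered A..H by a single perturbation parameter.
labels = list("ABCDEFGH")
perturbations = np.linspace(0.5, 1.5, num=len(labels))  # assumed range, for illustration only
envs = dict(zip(labels, perturbations))

train_envs       = {k: envs[k] for k in "BCFG"}  # seen during training
interp_eval_envs = {k: envs[k] for k in "DE"}    # parameters lie inside the training range
extrap_eval_envs = {k: envs[k] for k in "AH"}    # parameters lie outside the training range

# Defining property of the split: interpolation parameters fall inside the range spanned
# by the training tasks, extrapolation parameters fall outside it.
assert min(train_envs.values()) < min(interp_eval_envs.values())
assert max(interp_eval_envs.values()) < max(train_envs.values())
assert min(extrap_eval_envs.values()) < min(train_envs.values())
assert max(extrap_eval_envs.values()) > max(train_envs.values())
```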