Learning Robust State Abstractions for Hidden-Parameter Block MDPs
Authors: Amy Zhang, Shagun Sodhani, Khimya Khetarpal, Joelle Pineau
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To further demonstrate the efficacy of the proposed method, we empirically compare and show improvement over multi-task and meta-reinforcement learning baselines. (...) We use environments from Deepmind Control Suite (DMC) (Tassa et al., 2018) to evaluate our method for learning HiP-BMDPs for both multi-task RL and meta-reinforcement learning settings. |
| Researcher Affiliation | Collaboration | Amy Zhang¹²³, Shagun Sodhani², Khimya Khetarpal¹³, Joelle Pineau¹²³; ¹McGill University, ²Facebook AI Research, ³Mila |
| Pseudocode | Yes | Algorithm 1 HiP-BMDP training for the Multi-task RL setting. (...) Algorithm 2 Update Model Using Hip-BMDPLoss (...) Algorithm 3 HiP-MDP training for the meta-RL setting. |
| Open Source Code | No | The paper provides a link to sample videos of policies (https://sites.google.com/view/hip-bmdp), but not to the source code for the described methodology. |
| Open Datasets | Yes | We use environments from Deepmind Control Suite (DMC) (Tassa et al., 2018) to evaluate our method for learning HiP-BMDPs for both multi-task RL and meta-reinforcement learning settings. |
| Dataset Splits | Yes | We denote the ordered MDPs as A-H. MDPs {B, C, F, G} are training environments and {D, E} are used for evaluating the model in the interpolation setup (i.e. the value of the perturbation-parameter can be obtained by interpolation). MDPs {A, H} are for evaluating the model in the extrapolation setup (i.e. the value of the perturbation-parameter can be obtained by extrapolation). |
| Hardware Specification | No | The paper states that experiments are time-intensive or discusses aspects like 'using a GPU' but does not provide specific hardware details such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions software like Soft Actor Critic (SAC), PEARL, DeepMDP, and SAC-AE, but does not provide specific version numbers for any of these or for programming languages or libraries used. |
| Experiment Setup | Yes | Implementation details can be found in Appendix D. (...) All the hyper parameters (for MTRL algorithm) are listed in Table 1. |
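
The split quoted in the Dataset Splits row can be made concrete with a minimal sketch. It assumes only what the row states: eight MDPs labelled A to H, ordered by the value of the perturbation parameter, with {B, C, F, G} used for training. The names `ORDERED_MDPS`, `TRAIN_MDPS`, and `classify_mdp` are hypothetical and do not come from the paper or its code. An evaluation MDP counts as interpolation if it lies strictly between the smallest and largest training MDP in that ordering, and as extrapolation otherwise.

```python
# Illustrative sketch only: the identifiers below are hypothetical,
# not taken from the paper's implementation.

# MDPs ordered by the value of the perturbation parameter (per the table row).
ORDERED_MDPS = ["A", "B", "C", "D", "E", "F", "G", "H"]

# Training environments as stated in the Dataset Splits row.
TRAIN_MDPS = ["B", "C", "F", "G"]


def classify_mdp(name, train=TRAIN_MDPS, order=ORDERED_MDPS):
    """Label an MDP as 'train', 'interpolation', or 'extrapolation'.

    An evaluation MDP is 'interpolation' when its position in the ordering
    falls strictly inside the range spanned by the training MDPs, and
    'extrapolation' when it falls outside that range.
    """
    if name in train:
        return "train"
    position = order.index(name)
    train_positions = [order.index(t) for t in train]
    if min(train_positions) < position < max(train_positions):
        return "interpolation"
    return "extrapolation"


# Group every MDP by its role in the evaluation protocol.
splits = {"train": [], "interpolation": [], "extrapolation": []}
for mdp in ORDERED_MDPS:
    splits[classify_mdp(mdp)].append(mdp)
```

Under these assumptions, `splits` reproduces the grouping reported in the table: D and E fall inside the training range, while A and H fall outside it.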