Predictable MDP Abstraction for Unsupervised Model-Based RL
Authors: Seohong Park, Sergey Levine
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches in a range of benchmark environments. |
| Researcher Affiliation | Academia | University of California, Berkeley. Correspondence to: Seohong Park <seohong@berkeley.edu>. |
| Pseudocode | Yes | We describe the full training procedure of PMA in Appendix F and Algorithm 1. |
| Open Source Code | Yes | Our code and videos are available at https://seohong.me/projects/pma/ |
| Open Datasets | Yes | We test PMA and the four previous methods on seven MuJoCo robotics environments (Todorov et al., 2012; Brockman et al., 2016) with 13 diverse tasks. |
| Dataset Splits | No | The paper specifies environment configurations and episode lengths but does not provide explicit dataset split percentages, sample counts, or methods for splitting data into training, validation, and test sets. |
| Hardware Specification | Yes | We run our experiments on an internal cluster consisting of A5000 or similar GPUs. |
| Software Dependencies | No | The paper mentions implementation on top of the 'LiSP (Lu et al., 2021) codebase' and uses 'Adam (Kingma & Ba, 2015)' and 'SAC (Haarnoja et al., 2018b)', but it does not provide specific version numbers for general software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We present the hyperparameters used in our experiments in Tables 1 to 3. For example, Table 1 lists: # epochs 10000, # environment steps per epoch 4000, Minibatch size 256, Discount factor γ 0.995, Learning rate 3e-4, etc. (See the config sketch below this table.) |
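For quick reference, the Table 1 values quoted in the Experiment Setup row can be collected into a single configuration object. This is a minimal sketch: the class and field names (`PMAConfig`, `num_epochs`, `env_steps_per_epoch`, and so on) are illustrative assumptions, not identifiers from the authors' codebase; only the numeric values come from the paper.

```python
# Minimal sketch of the Table 1 hyperparameters quoted above.
# Names are illustrative assumptions; only the values come from the paper.
from dataclasses import dataclass


@dataclass(frozen=True)
class PMAConfig:
    num_epochs: int = 10_000          # "# epochs 10000"
    env_steps_per_epoch: int = 4_000  # "# environment steps per epoch 4000"
    minibatch_size: int = 256         # "Minibatch size 256"
    discount: float = 0.995           # "Discount factor γ 0.995"
    learning_rate: float = 3e-4       # "Learning rate 3e-4" (Adam)


config = PMAConfig()
# Implied total environment interaction: 10000 epochs x 4000 steps = 40M steps.
total_env_steps = config.num_epochs * config.env_steps_per_epoch
```

Pinning these values in a frozen dataclass keeps the run configuration immutable and easy to log alongside results, which is useful when verifying reproducibility claims like the ones in this table.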