Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Predictable MDP Abstraction for Unsupervised Model-Based RL
Authors: Seohong Park, Sergey Levine
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches in a range of benchmark environments. |
| Researcher Affiliation | Academia | University of California, Berkeley. Correspondence to: Seohong Park <EMAIL>. |
| Pseudocode | Yes | We describe the full training procedure of PMA in Appendix F and Algorithm 1. |
| Open Source Code | Yes | Our code and videos are available at https://seohong.me/projects/pma/ |
| Open Datasets | Yes | We test PMA and the four previous methods on seven MuJoCo robotics environments (Todorov et al., 2012; Brockman et al., 2016) with 13 diverse tasks. |
| Dataset Splits | No | The paper specifies environment configurations and episode lengths but does not provide explicit dataset split percentages, sample counts, or methods for splitting data into training, validation, and test sets. |
| Hardware Specification | Yes | We run our experiments on an internal cluster consisting of A5000 or similar GPUs. |
| Software Dependencies | No | The paper mentions implementation on top of the 'LiSP (Lu et al., 2021) codebase' and uses 'Adam (Kingma & Ba, 2015)' and 'SAC (Haarnoja et al., 2018b)', but it does not provide specific version numbers for general software dependencies like Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We present the hyperparameters used in our experiments in Tables 1 to 3. For example, Table 1 lists: # epochs 10000, # environment steps per epoch 4000, Minibatch size 256, Discount factor γ 0.995, Learning rate 3e-4, etc. |
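
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the class name `PMAHyperparameters` and all field names are hypothetical, and only the five values quoted above from Table 1 are included. The remaining entries of Tables 1 to 3 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PMAHyperparameters:
    """Hypothetical container; field names are illustrative.

    The default values are the ones quoted from Table 1 in the row above.
    """

    num_epochs: int = 10_000           # "# epochs"
    env_steps_per_epoch: int = 4_000   # "# environment steps per epoch"
    minibatch_size: int = 256          # "Minibatch size"
    discount: float = 0.995            # "Discount factor γ"
    learning_rate: float = 3e-4        # "Learning rate"

    @property
    def total_env_steps(self) -> int:
        # Total environment interaction implied by the two step counts.
        return self.num_epochs * self.env_steps_per_epoch


config = PMAHyperparameters()
print(config.total_env_steps)
```

Taken together, the two step counts imply 10000 × 4000 = 40 million environment steps per run, assuming every epoch collects the full number of steps.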