Predictable MDP Abstraction for Unsupervised Model-Based RL

Authors: Seohong Park, Sergey Levine

ICML 2023

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches in a range of benchmark environments."
Researcher Affiliation | Academia | "University of California, Berkeley. Correspondence to: Seohong Park <seohong@berkeley.edu>."
Pseudocode | Yes | "We describe the full training procedure of PMA in Appendix F and Algorithm 1."
Open Source Code | Yes | "Our code and videos are available at https://seohong.me/projects/pma/"
Open Datasets | Yes | "We test PMA and the four previous methods on seven MuJoCo robotics environments (Todorov et al., 2012; Brockman et al., 2016) with 13 diverse tasks."
Dataset Splits | No | The paper specifies environment configurations and episode lengths but does not provide explicit dataset split percentages, sample counts, or methods for splitting data into training, validation, and test sets.
Hardware Specification | Yes | "We run our experiments on an internal cluster consisting of A5000 or similar GPUs."
Software Dependencies | No | The paper mentions implementation on top of the 'LiSP (Lu et al., 2021) codebase' and uses 'Adam (Kingma & Ba, 2015)' and 'SAC (Haarnoja et al., 2018b)', but it does not provide specific version numbers for general software dependencies such as Python, PyTorch, or other libraries.
Experiment Setup | Yes | "We present the hyperparameters used in our experiments in Tables 1 to 3." For example, Table 1 lists: # epochs 10000, # environment steps per epoch 4000, minibatch size 256, discount factor γ 0.995, learning rate 3e-4, etc. A hedged configuration sketch based on these values follows the table.
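To make the reported setup concrete, below is a minimal sketch of how the hyperparameters quoted from the paper's Table 1 might be collected into a training configuration. This is an illustration only: the `PMATrainingConfig` class and its field names are hypothetical, not PMA's actual code; only the numeric values (10000 epochs, 4000 environment steps per epoch, minibatch size 256, discount 0.995, learning rate 3e-4) come from the paper.

```python
# Hedged sketch: a config object mirroring the hyperparameters the paper
# reports in its Table 1. All names here are hypothetical; only the
# numeric values are taken from the paper.
from dataclasses import dataclass

@dataclass
class PMATrainingConfig:
    n_epochs: int = 10_000            # "# epochs 10000"
    env_steps_per_epoch: int = 4_000  # "# environment steps per epoch 4000"
    minibatch_size: int = 256         # "Minibatch size 256"
    discount_gamma: float = 0.995     # "Discount factor γ 0.995"
    learning_rate: float = 3e-4       # "Learning rate 3e-4" (Adam, per the paper)

config = PMATrainingConfig()

# Total environment interaction implied by these values: 40,000,000 steps.
total_env_steps = config.n_epochs * config.env_steps_per_epoch
print(f"Total environment steps: {total_env_steps:,}")
```

A consolidated config like this also makes it easy to check that a reproduction run matches the paper's reported interaction budget before launching long experiments.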