Denoised MDPs: Learning World Models Better Than the World Itself
Authors: Tongzhou Wang, Simon Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone, and over prior works, across policy optimization control tasks as well as the non-control task of joint position regression. |
| Researcher Affiliation | Collaboration | ¹MIT CSAIL, ²University of Washington, ³UC Berkeley, ⁴Meta AI. Correspondence to: Tongzhou Wang <tongzhou@mit.edu>. Work done while Tongzhou Wang was an intern at Meta AI. |
| Pseudocode | Yes | Algorithm 1 Denoised MDP |
| Open Source Code | Yes | Code: github.com/facebookresearch/denoised_mdp |
| Open Datasets | Yes | We choose DeepMind Control (DMC) Suite (Tunyasuvunakool et al., 2020) (Section 5.2) and RoboDesk (Kannan et al., 2021) (Section 5.1) with image observations. |
| Dataset Splits | Yes | To generate the training and test sets, we use four policies trained by state-space SAC at different stages of training... From these, 100 × 10³ samples are randomly selected as the test set. Training sets of sizes 5 × 10³, 10 × 10³, 25 × 10³, 50 × 10³, 100 × 10³, and 150 × 10³ are sampled from the rest. (A hypothetical sketch of this split procedure appears after the table.) |
| Hardware Specification | Yes | All our experiments are run on a single GPU, requiring 8GB memory for DMC tasks, and 16GB memory for RoboDesk tasks. We use NVIDIA GPUs of the following types: 1080 Ti, 2080 Ti, 3080 Ti, P100, V100, Titan XP, Titan RTX. |
| Software Dependencies | No | The paper mentions software like 'MuJoCo' and the 'EGL rendering engine' but does not specify their version numbers or the versions of any programming languages or libraries used. |
| Experiment Setup | Yes | For all experiments, we let the algorithms use 10⁶ environment steps. For PI-SAC and CURL, we follow the original implementations (Laskin et al., 2020b; Lee et al., 2020) and use an action repeat of 4 for Cheetah Run and Reacher Easy, and an action repeat of 2 for Walker Walk. For Denoised MDP, Dreamer, TIA and DBC, we always use an action repeat of 2, following prior works (Hafner et al., 2019a; Fu et al., 2021; Zhang et al., 2020). (A minimal action-repeat sketch appears after the table.) |
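
The dataset-splits row quotes a procedure that holds out a fixed test set and samples training sets of several sizes from the remainder. A minimal sketch of such a split, assuming a hypothetical pool size and random seed (neither is specified in the excerpt), might look like:

```python
import numpy as np

# Hypothetical sketch: hold out a fixed test set, then sample training sets of
# several sizes from the remaining samples. Pool size and seed are assumptions.
rng = np.random.default_rng(seed=0)

pool_size = 400_000                        # assumed total number of collected samples
indices = rng.permutation(pool_size)

test_idx = indices[:100_000]               # 100 x 10^3 samples held out as the test set
remaining = indices[100_000:]

train_sizes = [5_000, 10_000, 25_000, 50_000, 100_000, 150_000]
train_splits = {n: rng.choice(remaining, size=n, replace=False) for n in train_sizes}
```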
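
The experiment-setup row mentions action repeats of 2 or 4 depending on the method and task. A minimal sketch of a Gym-style action-repeat wrapper (the interface here is an assumption for illustration, not the paper's released code) is:

```python
class ActionRepeat:
    """Apply the same action `repeat` times per agent step, summing rewards.

    Illustrative sketch of a Gym-style wrapper; not the paper's implementation.
    """

    def __init__(self, env, repeat=2):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:           # stop repeating if the episode ends early
                break
        return obs, total_reward, done, info
```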