Denoised MDPs: Learning World Models Better Than the World Itself
Authors: Tongzhou Wang, Simon Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on variants of DeepMind Control Suite and RoboDesk demonstrate superior performance of our denoised world model over using raw observations alone, and over prior works, across policy optimization control tasks as well as the non-control task of joint position regression. |
| Researcher Affiliation | Collaboration | ¹MIT CSAIL, ²University of Washington, ³UC Berkeley, ⁴Meta AI. Correspondence to: Tongzhou Wang <tongzhou@mit.edu>. Work done while Tongzhou Wang was an intern at Meta AI. |
| Pseudocode | Yes | Algorithm 1 Denoised MDP |
| Open Source Code | Yes | Code: github.com/facebookresearch/denoised_mdp |
| Open Datasets | Yes | We choose DeepMind Control (DMC) Suite (Tunyasuvunakool et al., 2020) (Section 5.2) and RoboDesk (Kannan et al., 2021) (Section 5.1) with image observations. |
| Dataset Splits | Yes | To generate the training and test sets, we use four policies trained by state-space SAC at different stages of training... From these, 100 × 10³ samples are randomly selected as the test set. Training sets of sizes 5 × 10³, 10 × 10³, 25 × 10³, 50 × 10³, 100 × 10³, and 150 × 10³ are sampled from the rest. (A hypothetical sketch of this split procedure appears after the table.) |
| Hardware Specification | Yes | All our experiments are run on a single GPU, requiring 8GB memory for DMC tasks, and 16GB memory for RoboDesk tasks. We use NVIDIA GPUs of the following types: 1080 Ti, 2080 Ti, 3080 Ti, P100, V100, Titan XP, Titan RTX. |
| Software Dependencies | No | The paper mentions software like 'MuJoCo' and the 'EGL rendering engine' but does not specify their version numbers or the versions of any programming languages or libraries used. |
| Experiment Setup | Yes | For all experiments, we let the algorithms use 10⁶ environment steps. For PI-SAC and CURL, we follow the original implementations (Laskin et al., 2020b; Lee et al., 2020) and use an action repeat of 4 for Cheetah Run and Reacher Easy, and an action repeat of 2 for Walker Walk. For Denoised MDP, Dreamer, TIA and DBC, we always use an action repeat of 2, following prior works (Hafner et al., 2019a; Fu et al., 2021; Zhang et al., 2020). (A minimal action-repeat sketch appears after the table.) |
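
The dataset-splits row quotes a procedure that holds out a fixed test set and samples training sets of several sizes from the remainder. A minimal sketch of such a split, assuming a hypothetical pool size and random seed (neither is specified in the excerpt), might look like:

```python
import numpy as np

# Hypothetical sketch: hold out a fixed test set, then sample training sets of
# several sizes from the remaining samples. Pool size and seed are assumptions.
rng = np.random.default_rng(seed=0)

pool_size = 400_000                        # assumed total number of collected samples
indices = rng.permutation(pool_size)

test_idx = indices[:100_000]               # 100 x 10^3 samples held out as the test set
remaining = indices[100_000:]

train_sizes = [5_000, 10_000, 25_000, 50_000, 100_000, 150_000]
train_splits = {n: rng.choice(remaining, size=n, replace=False) for n in train_sizes}
```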
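
The experiment-setup row mentions action repeats of 2 or 4 depending on the method and task. A minimal sketch of a Gym-style action-repeat wrapper (the interface here is an assumption for illustration, not the paper's released code) is:

```python
class ActionRepeat:
    """Apply the same action `repeat` times per agent step, summing rewards.

    Illustrative sketch of a Gym-style wrapper; not the paper's implementation.
    """

    def __init__(self, env, repeat=2):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:           # stop repeating if the episode ends early
                break
        return obs, total_reward, done, info
```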