DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing

Authors: Vint Lee, Pieter Abbeel, Youngwoon Lee

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 EXPERIMENTS" |
| Researcher Affiliation | Academia | "1 University of California, Berkeley; 2 Yonsei University" |
| Pseudocode | Yes | "Algorithm 1 COLLECT ROLLOUT (π: policy, D: replay buffer) in DREAMSMOOTH" (see the sketch after this table) |
| Open Source Code | No | The paper does not provide a direct link to a code repository or explicitly state that the source code for its method is available. |
| Open Datasets | Yes | "We evaluate DreamSmooth on four tasks with sparse subtask completion rewards and two common RL benchmarks. Earthmoving uses two 64 × 64 images as an observation while all other tasks use a single image. See Appendix C for environment details. RoboDesk: We use a modified version of RoboDesk (Kannan et al., 2021)... Hand: The Hand task (Plappert et al., 2018)... Earthmoving: The agent controls a wheel loader... Crafter: Crafter (Hafner, 2022)... DMC: We benchmark 7 DeepMind Control Suite continuous control tasks (Tassa et al., 2018). Atari: We benchmark 6 Atari tasks (Bellemare et al., 2013) at 100K steps." |
| Dataset Splits | No | The paper refers to "evaluation episodes" and describes its training and testing procedures, but it does not explicitly specify training/validation/test splits, nor does it give concrete details of any cross-validation scheme. |
| Hardware Specification | Yes | "Models are trained on NVIDIA A5000, V100, RTX Titan, RTX 2080, and RTX 6000 GPUs." |
| Software Dependencies | No | The paper mentions using scipy.ndimage functions but does not specify version numbers for any software dependencies, such as Python, PyTorch, or SciPy itself. |
| Experiment Setup | Yes | "Hyperparameters for DreamerV3, TD-MPC, and MBPO experiments are shown in Table 1, Table 2, and Table 3, respectively." |
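
For context on the Pseudocode and Software Dependencies rows: Algorithm 1 collects a rollout and applies temporal smoothing to its reward sequence, and the paper cites scipy.ndimage for the smoothing functions. The sketch below illustrates that idea only; the Gymnasium-style environment loop, the `policy(obs)` callable, the `replay_buffer.add(...)` interface, the choice of a Gaussian kernel, and the value of `sigma` are all assumptions made here for illustration, not the authors' exact implementation.

```python
# Minimal sketch of rollout collection with temporal reward smoothing,
# assuming a Gymnasium-style env API and a hypothetical replay_buffer.add().
import numpy as np
from scipy.ndimage import gaussian_filter1d


def collect_rollout(env, policy, replay_buffer, sigma=3.0):
    """Roll out `policy` in `env`, smooth the episode's reward sequence
    over time, and store the smoothed rewards in `replay_buffer`."""
    obs, _ = env.reset()
    observations, actions, rewards, dones = [], [], [], []
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        observations.append(obs)
        actions.append(action)
        rewards.append(reward)
        dones.append(done)
        obs = next_obs

    # Gaussian smoothing of the (possibly sparse) reward sequence along time;
    # the smoothed rewards are what gets stored for reward-model training.
    smoothed = gaussian_filter1d(
        np.asarray(rewards, dtype=np.float32), sigma=sigma, mode="nearest"
    )

    # Hypothetical replay-buffer interface: one transition per step.
    for o, a, r, d in zip(observations, actions, smoothed, dones):
        replay_buffer.add(o, a, float(r), d)
```

The paper also discusses uniform and EMA smoothing variants; in this sketch, swapping `gaussian_filter1d` for `scipy.ndimage.uniform_filter1d` would give the uniform case.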