DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing

Authors: Vint Lee, Pieter Abbeel, Youngwoon Lee

ICLR 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "4 EXPERIMENTS" |
| Researcher Affiliation | Academia | "1 University of California, Berkeley; 2 Yonsei University" |
| Pseudocode | Yes | "Algorithm 1 COLLECT ROLLOUT (π: policy, D: replay buffer) in DREAMSMOOTH" (see the sketch after this table) |
| Open Source Code | No | The paper does not provide a direct link to a code repository or explicitly state that the source code for its method is available. |
| Open Datasets | Yes | "We evaluate DreamSmooth on four tasks with sparse subtask completion rewards and two common RL benchmarks. Earthmoving uses two 64 × 64 images as an observation while all other tasks use a single image. See Appendix C for environment details. RoboDesk: We use a modified version of RoboDesk (Kannan et al., 2021)... Hand: The Hand task (Plappert et al., 2018)... Earthmoving: The agent controls a wheel loader... Crafter: Crafter (Hafner, 2022)... DMC: We benchmark 7 DeepMind Control Suite continuous control tasks (Tassa et al., 2018). Atari: We benchmark 6 Atari tasks (Bellemare et al., 2013) at 100K steps." |
| Dataset Splits | No | The paper refers to "evaluation episodes" and describes its training and testing procedures, but it does not explicitly specify training/validation/test splits, nor does it give concrete details of any cross-validation scheme. |
| Hardware Specification | Yes | "Models are trained on NVIDIA A5000, V100, RTX Titan, RTX 2080, and RTX 6000 GPUs." |
| Software Dependencies | No | The paper mentions using scipy.ndimage functions but does not specify version numbers for any software dependencies, such as Python, PyTorch, or SciPy itself. |
| Experiment Setup | Yes | "Hyperparameters for DreamerV3, TD-MPC, and MBPO experiments are shown in Table 1, Table 2, and Table 3, respectively." |
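
For context on the Pseudocode and Software Dependencies rows: Algorithm 1 collects a rollout and applies temporal smoothing to its reward sequence, and the paper cites scipy.ndimage for the smoothing functions. The sketch below illustrates that idea only; the Gymnasium-style environment loop, the `policy(obs)` callable, the `replay_buffer.add(...)` interface, the choice of a Gaussian kernel, and the value of `sigma` are all assumptions made here for illustration, not the authors' exact implementation.

```python
# Minimal sketch of rollout collection with temporal reward smoothing,
# assuming a Gymnasium-style env API and a hypothetical replay_buffer.add().
import numpy as np
from scipy.ndimage import gaussian_filter1d


def collect_rollout(env, policy, replay_buffer, sigma=3.0):
    """Roll out `policy` in `env`, smooth the episode's reward sequence
    over time, and store the smoothed rewards in `replay_buffer`."""
    obs, _ = env.reset()
    observations, actions, rewards, dones = [], [], [], []
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        observations.append(obs)
        actions.append(action)
        rewards.append(reward)
        dones.append(done)
        obs = next_obs

    # Gaussian smoothing of the (possibly sparse) reward sequence along time;
    # the smoothed rewards are what gets stored for reward-model training.
    smoothed = gaussian_filter1d(
        np.asarray(rewards, dtype=np.float32), sigma=sigma, mode="nearest"
    )

    # Hypothetical replay-buffer interface: one transition per step.
    for o, a, r, d in zip(observations, actions, smoothed, dones):
        replay_buffer.add(o, a, float(r), d)
```

The paper also discusses uniform and EMA smoothing variants; in this sketch, swapping `gaussian_filter1d` for `scipy.ndimage.uniform_filter1d` would give the uniform case.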