DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing
Authors: Vint Lee, Pieter Abbeel, Youngwoon Lee
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments) |
| Researcher Affiliation | Academia | ¹University of California, Berkeley; ²Yonsei University |
| Pseudocode | Yes | Algorithm 1 COLLECT ROLLOUT (π: policy, D: replay buffer) in DREAMSMOOTH |
| Open Source Code | No | The paper does not provide a direct link to a code repository or explicitly state that the source code for their methodology is available. |
| Open Datasets | Yes | We evaluate DreamSmooth on four tasks with sparse subtask completion rewards and two common RL benchmarks. Earthmoving uses two 64 × 64 images as an observation while all other tasks use a single image. See Appendix C for environment details. RoboDesk: We use a modified version of RoboDesk (Kannan et al., 2021)... Hand: The Hand task (Plappert et al., 2018)... Earthmoving: The agent controls a wheel loader... Crafter: Crafter (Hafner, 2022)... DMC: We benchmark 7 DeepMind Control Suite continuous control tasks (Tassa et al., 2018). Atari: We benchmark 6 Atari tasks (Bellemare et al., 2013) at 100K steps. |
| Dataset Splits | No | The paper refers to 'evaluation episodes' and describes training and testing processes, but it does not explicitly specify data splits for training, validation, and testing sets, nor does it mention cross-validation techniques with concrete details. |
| Hardware Specification | Yes | Models are trained on NVIDIA A5000, V100, RTX Titan, RTX 2080, and RTX 6000 GPUs. |
| Software Dependencies | No | The paper mentions using 'scipy.ndimage' functions but does not specify version numbers for any software dependencies, such as Python, PyTorch, or SciPy itself. |
| Experiment Setup | Yes | Hyperparameters for DreamerV3, TD-MPC, and MBPO experiments are shown in Table 1, Table 2, and Table 3, respectively. |
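
Since no official code is linked, the following is only a minimal sketch of what the table's Pseudocode and Software Dependencies rows point at: rewards from a collected episode are smoothed over time with a `scipy.ndimage` filter before the episode is written to the replay buffer. The choice of a Gaussian filter, the `sigma=3.0` value, the `mode="nearest"` padding, and the names `smooth_rewards` and `store_smoothed_episode` are all assumptions made for illustration, not details confirmed by this summary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d


def smooth_rewards(rewards, sigma=3.0):
    """Temporally smooth one episode's reward sequence with a Gaussian kernel.

    sigma and the padding mode are illustrative assumptions.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    # mode="nearest" pads the sequence by repeating its edge values.
    return gaussian_filter1d(rewards, sigma=sigma, mode="nearest")


def store_smoothed_episode(transitions, replay_buffer, sigma=3.0):
    """Hypothetical helper: replace raw rewards with smoothed ones, then store."""
    smoothed = smooth_rewards([t["reward"] for t in transitions], sigma)
    for transition, reward in zip(transitions, smoothed):
        # Copy the transition so the raw episode is left untouched.
        replay_buffer.append({**transition, "reward": float(reward)})


# Toy episode: a single sparse reward at step 50 of 101.
episode = [
    {"obs": i, "action": 0, "reward": 1.0 if i == 50 else 0.0}
    for i in range(101)
]
buffer = []
store_smoothed_episode(episode, buffer)
smoothed = np.array([t["reward"] for t in buffer])
```

In this toy case the single sparse reward is spread over neighboring timesteps while its total is preserved, because the Gaussian kernel is normalized and the reward sits far from the episode boundaries. Near the start or end of an episode, the padding mode determines how much of the reward mass is kept.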