ReDiffuser: Reliable Decision-Making Using a Diffuser with Confidence Estimation

Authors: Nantian He, Shaohui Li, Zhi Li, Yu Liu, You He

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): "We construct our experiments to evaluate the proposed confidence estimation, adaptive-horizon planning, and value-embedded planning described in Section 3." |
| Researcher Affiliation | Academia | Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China; Department of Electronics, Tsinghua University, Beijing, China. |
| Pseudocode | No | The paper contains no explicitly labeled pseudocode or algorithm blocks; the methods are described in narrative text and mathematical formulations. |
| Open Source Code | Yes | Source code is available at https://github.com/he-nantian/ReDiffuser. |
| Open Datasets | Yes | All evaluations use public benchmarks: the standard D4RL benchmark (Fu et al., 2020), on which the proposed ReDiffuser outperforms Diffuser; the KUKA block-stacking task (Schreiber et al., 2010), used to compare Diffuser with and without confidence estimation; Maze2D tasks (Fu et al., 2020) for adaptive-horizon planning; and D4RL locomotion tasks based on the MuJoCo engine (Todorov et al., 2012), commonly adopted for evaluating offline RL algorithms, for value-embedded planning. A minimal data-loading sketch follows the table. |
| Dataset Splits | No | The paper references well-known datasets but does not explicitly state the training/validation/test splits (e.g., percentages or sample counts) needed for reproducibility, nor does it mention a validation split or how one was used beyond general training settings. |
| Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper describes the model architecture and training process but does not list software dependencies with version numbers (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | "At the training stage, the model is trained with a learning rate of 1e-04 and batch size of 256." In adaptive-horizon planning, the gap between adjacent candidate horizons is set to 32; in value-embedded planning, the decision with the maximum value among 64 sampled candidate decisions is preserved. A hedged sketch of both planning procedures follows the table. |
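
For context on the Open Datasets row, here is a minimal sketch of loading the D4RL benchmark with the standard `d4rl` package. The task name is illustrative and not necessarily one the paper uses:

```python
import gym
import d4rl  # registers the D4RL environments with gym on import

# Illustrative task name; the paper evaluates Maze2D and MuJoCo locomotion tasks.
env = gym.make("maze2d-umaze-v1")

dataset = env.get_dataset()          # raw arrays: observations, actions, rewards, terminals
qdata = d4rl.qlearning_dataset(env)  # (s, a, r, s', done) transitions for offline RL

print(dataset["observations"].shape)
print(qdata["rewards"].mean())
```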
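
The Experiment Setup row describes two planning procedures only at the hyperparameter level. Below is a hedged sketch of how they might be structured; `sample_plans` and `value_fn` are hypothetical stand-ins for the diffusion sampler and the learned value estimate, and neither name comes from the paper's code:

```python
import numpy as np

def value_embedded_plan(sample_plans, value_fn, n_candidates=64):
    """Sample 64 candidate decisions and preserve the one with maximum value."""
    plans = sample_plans(n_candidates)                # e.g., shape (64, horizon, dim)
    values = np.asarray([value_fn(p) for p in plans])
    return plans[int(values.argmax())]

def candidate_horizons(h_min, h_max, gap=32):
    """Adaptive-horizon planning: candidate horizons spaced gap (= 32) apart."""
    return list(range(h_min, h_max + 1, gap))
```

For instance, `candidate_horizons(128, 384)` yields `[128, 160, 192, ..., 384]`; the endpoints here are illustrative, as the paper's quoted setup specifies only the gap of 32.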