Diffused Task-Agnostic Milestone Planner

Authors: Mineui Hong, Minjae Kang, Songhwai Oh

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and a visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving the state-of-the-art performance on the most challenging vision-based manipulation benchmark. ... Section 4 (Experiments): In this section we empirically verify that DTAMP is able to handle long-horizon, multi-task decision-making problems.
Researcher Affiliation | Academia | Mineui Hong, Minjae Kang, and Songhwai Oh, Department of Electrical and Computer Engineering and ASRI, Seoul National University. mineui.hong@rllab.snu.ac.kr, minjae.kang@rllab.snu.ac.kr, songhwai@snu.ac.kr
Pseudocode | Yes | Algorithm 1: Sequential decision-making using DTAMP
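
For orientation, the following is a minimal sketch of the kind of control loop that Algorithm 1 describes: a diffusion planner proposes a sequence of milestones in the latent goal space, and a goal-conditioned actor chases them one at a time. The `planner`, `encoder`, `actor`, and `env` interfaces are hypothetical stand-ins, not the authors' actual code; only the milestone-switching threshold δ is taken from the paper's reported setup.

```python
import numpy as np

# Sketch of a milestone-following decision loop in the style of Algorithm 1.
# `planner`, `encoder`, `actor`, and `env` are hypothetical interfaces.
def run_episode(env, planner, encoder, actor, goal, delta=0.1, max_steps=1000):
    obs = env.reset()
    # The diffusion planner proposes K milestones in the latent goal space G.
    milestones = planner.plan(encoder(obs), goal)  # shape: (K, dim(G))
    k = 0  # index of the currently targeted milestone
    for _ in range(max_steps):
        z = encoder(obs)
        # Advance to the next milestone once the current one is reached,
        # using the distance threshold delta reported in the experiment setup.
        while k < len(milestones) - 1 and np.linalg.norm(z - milestones[k]) < delta:
            k += 1
        # The goal-conditioned actor drives the agent toward the milestone.
        action = actor(obs, milestones[k])
        obs, reward, done, info = env.step(action)
        if done:
            break
    return obs
```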
Open Source Code | No | The paper references code provided by other authors (e.g., "the authors' code", "code provided by the author") but does not state that the code for their own methodology is open-source or provide a link to it.
Open Datasets | Yes | The proposed method is first evaluated on the goal-oriented environments in the D4RL benchmark [8]. ... Finally, DTAMP is demonstrated on the CALVIN benchmark [29]. We utilize the dataset provided by Mees et al. [29], which consists of trajectories performing a variety of manipulation tasks.
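
Both cited benchmarks ship with public loaders. As an illustration, the D4RL datasets can be pulled through the `d4rl` package, which registers its environments with Gym on import; the environment name below is illustrative and may not be the exact variant the authors evaluated.

```python
import gym
import d4rl  # noqa: F401 -- importing registers the offline RL environments

# Illustrative D4RL goal-oriented environment; the paper evaluates on
# Antmaze among others, but this exact dataset variant is an assumption.
env = gym.make('antmaze-large-play-v2')
dataset = env.get_dataset()  # dict of numpy arrays: 'observations', 'actions', ...
print(dataset['observations'].shape, dataset['actions'].shape)
```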
Dataset Splits | No | The paper mentions training with random seeds and evaluating with rollouts but does not explicitly describe train/validation/test dataset splits, specific percentages, or sample counts for validation.
Hardware Specification | Yes | We utilize NVIDIA GeForce RTX 3060 Ti graphics card for training our models, taking approximately 14 hours per 1.0M training steps.
Software Dependencies | No | The paper mentions the "Adam optimizer [23]" but does not specify version numbers for it or for any other software components or libraries required to reproduce the experiments.
Experiment Setup | Yes | Encoder: The encoder consists of two neural networks, one each for the actor and critic. Each network has two hidden fully-connected layers of size 512 and an output layer. ... We set the dimension of the goal space G to 16 for the Antmaze environments and 32 for the Kitchen and CALVIN environments. We use K = 30 milestones with a maximum interval of 32 for the Antmaze environments, and K = 14 milestones with a maximum interval of 16 for the Kitchen and CALVIN environments. ... We use N = 300 diffusion timesteps, a diffusion guidance coefficient β of 0.5, and a target temporal distance of 0.5 times the maximum interval for all experiments. We use a threshold δ of 0.1 to determine whether the agent has reached the targeted milestone.
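
Since the reported hyperparameters are scattered through the quoted text, the sketch below gathers them into a single Python mapping. The key names are invented shorthand rather than the authors' configuration schema, and the subscripted symbols lost in extraction (the maximum-interval and target-distance notation) are spelled out in plain names.

```python
# Hyperparameters quoted in the Experiment Setup row, collected for reference.
# Key names are invented shorthand; values are the reported ones.
DTAMP_HPARAMS = {
    # Per-domain settings
    'antmaze': {'goal_dim': 16, 'num_milestones': 30, 'max_interval': 32},
    'kitchen': {'goal_dim': 32, 'num_milestones': 14, 'max_interval': 16},
    'calvin':  {'goal_dim': 32, 'num_milestones': 14, 'max_interval': 16},
    # Shared across all experiments
    'encoder_hidden_layers': [512, 512],   # actor and critic encoder MLPs
    'diffusion_timesteps': 300,            # N
    'guidance_coefficient': 0.5,           # beta
    'target_temporal_distance_frac': 0.5,  # fraction of the maximum interval
    'milestone_reach_threshold': 0.1,      # delta
}
```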