Diffused Task-Agnostic Milestone Planner
Authors: Mineui Hong, Minjae Kang, Songhwai Oh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and a visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving state-of-the-art performance on the most challenging vision-based manipulation benchmark. ... 4 Experiments In this section, we empirically verify that DTAMP is able to handle long-horizon, multi-task decision-making problems. |
| Researcher Affiliation | Academia | Mineui Hong, Minjae Kang, and Songhwai Oh Department of Electrical and Computer Engineering and ASRI Seoul National University mineui.hong@rllab.snu.ac.kr, minjae.kang@rllab.snu.ac.kr, songhwai@snu.ac.kr |
| Pseudocode | Yes | Algorithm 1: Sequential decision-making using DTAMP |
| Open Source Code | No | The paper references code provided by other authors (e.g., "the authors' code", "code provided by the author") but does not state that the code for *their own* methodology is open-source, nor does it provide a link to it. |
| Open Datasets | Yes | The proposed method is first evaluated on the goal-oriented environments in the D4RL benchmark [8]... Finally, DTAMP is demonstrated on the CALVIN benchmark [29]... We utilize the dataset provided by Mees et al. [29], which consists of the trajectories performing a variety of manipulation tasks... |
| Dataset Splits | No | The paper mentions training with random seeds and evaluating with rollouts but does not explicitly describe train/validation/test dataset splits, specific percentages, or sample counts for validation. |
| Hardware Specification | Yes | We utilize an NVIDIA GeForce RTX 3060 Ti graphics card for training our models, taking approximately 14 hours per 1.0M training steps. |
| Software Dependencies | No | The paper mentions "Adam optimizer [23]" but does not specify version numbers for it or any other software components or libraries required to reproduce the experiments. |
| Experiment Setup | Yes | Encoder: The encoder consists of two neural networks, one each for the actor and the critic. Each network has two hidden fully-connected layers of size 512 and an output layer. ... We set the dimension of the goal space G to 16 for the Antmaze environments, and 32 for the Kitchen and CALVIN environments. We use K = 30 milestones with a maximum interval of 32 for the Antmaze environments, and K = 14 milestones with a maximum interval of 16 for the Kitchen and CALVIN environments. ... We use a diffusion timestep N of 300, a diffusion guidance coefficient β of 0.5, and a target temporal distance of 0.5 times the maximum interval for all experiments. We use a threshold δ of 0.1 to determine whether the agent has reached the targeted milestone. |
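The setup row above mentions that a threshold δ decides when the agent has reached the currently targeted milestone, and the pseudocode row names "Algorithm 1: Sequential decision-making using DTAMP". A minimal sketch of that control loop, under assumptions (the function names `encode`, `policy`, and `step`, and the distance test in latent goal space, are hypothetical stand-ins, not the authors' implementation):

```python
import numpy as np

def follow_milestones(state, milestones, encode, policy, step,
                      delta=0.1, max_steps=1000):
    """Sketch of sequential decision-making with milestones:
    pursue one milestone at a time with a goal-conditioned policy,
    advancing to the next once the encoded state is within
    distance `delta` of the current milestone in goal space."""
    k = 0  # index of the milestone currently being pursued
    for _ in range(max_steps):
        if np.linalg.norm(encode(state) - milestones[k]) < delta:
            k += 1  # milestone reached: target the next one
            if k == len(milestones):
                return state, True  # all milestones reached
        action = policy(state, milestones[k])  # goal-conditioned actor
        state = step(state, action)            # environment transition
    return state, False  # ran out of steps
```

The sketch captures only the outer loop; in the paper the milestone sequence itself is produced by a diffusion-based planner, which is not modeled here.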