Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Diffused Task-Agnostic Milestone Planner
Authors: Mineui Hong, Minjae Kang, Songhwai Oh
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and a visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving the state-of-the-art performance on the most challenging vision-based manipulation benchmark. ... 4 Experiments: In this section we empirically verify that DTAMP is able to handle long-horizon, multi-task decision-making problems. |
| Researcher Affiliation | Academia | Mineui Hong, Minjae Kang, and Songhwai Oh Department of Electrical and Computer Engineering and ASRI Seoul National University EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Sequential decision-making using DTAMP |
| Open Source Code | No | The paper references code provided by other authors (e.g., "the author's code²", "code³ provided by the author") but does not state that the code for *their own* methodology is open-source or provide a link for it. |
| Open Datasets | Yes | The proposed method is first evaluated on the goal-oriented environments in the D4RL benchmark [8]... Finally, DTAMP is demonstrated on the CALVIN benchmark [29]... We utilize the dataset provided by Mees et al. [29], which consists of the trajectories performing a variety of manipulation tasks... |
| Dataset Splits | No | The paper mentions training with random seeds and evaluating with rollouts but does not explicitly describe train/validation/test dataset splits, specific percentages, or sample counts for validation. |
| Hardware Specification | Yes | We utilize NVIDIA Geforce RTX 3060 Ti graphics card for training our models, taking approximately 14 hours per 1.0M training steps. |
| Software Dependencies | No | The paper mentions "Adam optimizer [23]" but does not specify version numbers for it or any other software components or libraries required to reproduce the experiments. |
| Experiment Setup | Yes | Encoder: The encoder consists of two neural networks, one for each actor and critic. Each neural network has two hidden fully-connected layers with a size of 512, and an output layer. ... We set dimension of goal space G to 16 for Antmaze environments, and 32 for Kitchen and CALVIN environments. We use the number of milestones K = 30 with maximum interval Δ_max = 32 for Antmaze environments, and K = 14 with maximum interval Δ_max = 16 for Kitchen and CALVIN environments. ... We use diffusion timestep N of 300, diffusion guidance coefficient β of 0.5, and target temporal distance Δ_target of 0.5 Δ_max for all the experiments. We use threshold δ of 0.1 to determine whether the agent has reached the targeted milestone. |
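
For anyone attempting a reproduction, the values quoted in the Experiment Setup row can be gathered into one configuration object. The following is a minimal sketch, not the authors' code: the class name `DTAMPSetup`, every field name, and the `SETUPS` dictionary are hypothetical, and the target temporal distance is read as half of the maximum interval, which is an interpretation of the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DTAMPSetup:
    """Hyperparameters quoted in the Experiment Setup row.

    All names here are hypothetical; only the numeric values come from
    the quoted text.
    """
    goal_dim: int          # dimension of goal space G
    num_milestones: int    # number of milestones K
    max_interval: int      # maximum interval between milestones
    diffusion_steps: int = 300           # diffusion timestep N
    guidance_coef: float = 0.5           # diffusion guidance coefficient (beta)
    target_distance_ratio: float = 0.5   # assumed: fraction of max_interval
    reach_threshold: float = 0.1         # threshold (delta) for reaching a milestone
    encoder_hidden_sizes: tuple = (512, 512)  # two hidden layers of size 512

    @property
    def target_distance(self) -> float:
        # Target temporal distance, assumed to be ratio * maximum interval.
        return self.target_distance_ratio * self.max_interval


# One entry per environment family named in the row.
SETUPS = {
    "antmaze": DTAMPSetup(goal_dim=16, num_milestones=30, max_interval=32),
    "kitchen": DTAMPSetup(goal_dim=32, num_milestones=14, max_interval=16),
    "calvin": DTAMPSetup(goal_dim=32, num_milestones=14, max_interval=16),
}

if __name__ == "__main__":
    for name, setup in SETUPS.items():
        print(name, setup.num_milestones, setup.max_interval, setup.target_distance)
```

Details the row does not state (learning rate, batch size, library versions) are deliberately left out rather than guessed.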