Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
MO-DDN: A Coarse-to-Fine Attribute-based Exploration Agent for Multi-Object Demand-driven Navigation
Authors: Hongcheng Wang, Peiqi Liu, Wenzhe Cai, Mingdong Wu, Zhengyu Qian, Hao Dong
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results illustrate that this coarse-to-fine exploration strategy capitalizes on the advantages of attributes at various decision-making levels, resulting in superior performance compared to baseline methods. Code and video can be found at https://sites.google.com/view/moddn. 5 Experiment 5.1 Experimental Settings 5.2 Baselines 5.3 Baseline Comparison 5.4 Ablation Study |
| Researcher Affiliation | Academia | Hongcheng Wang1,3 Peiqi Liu2 Wenzhe Cai4 Mingdong Wu 1,3 Zhengyu Qian 2 Hao Dong 1,3 1CFCS, School of CS, PKU 2School of EECS, PKU 3PKU-Agibot Lab 4School of Automation, Southeast University |
| Pseudocode | Yes | Algorithm 1: Losses in Attribute Training |
| Open Source Code | Yes | Code and video can be found at https://sites.google.com/view/moddn. |
| Open Datasets | Yes | We generate 300 tasks, encompassing 358 object categories from the HSSD dataset [99]. |
| Dataset Splits | Yes | HSSD splits the scenes into val scenes and train scenes (i.e., unseen scenes and seen scenes in Tab. 5.3, respectively). |
| Hardware Specification | Yes | A single RTX 4090 is enough to run the experiments. Our method and baselines can be trained on a single RTX 4090, which will take about one day for each method. |
| Software Dependencies | Yes | We use the standard transformer encoder from the official PyTorch 1.13.1 implementation |
| Experiment Setup | Yes | We use the standard transformer encoder from the official PyTorch 1.13.1 implementation, where d_model is 768, nhead is 8, num_layers is 6, and other parameters remain default. The embedding dim of action is 64. The embedding dim of GPS+Compass is 32. The input dim of LSTM is 768+64+32, its hidden_size is 1024, and its num_layers is 2. The depth model is a simple five-layer CNN model and a two-layer MLP model. Loss = λ1 Attribute Loss + λ2 Matching Loss + λ3 VQ Loss + λ4 Commit Loss + λ5 Recon Loss, where λ1 is 2.0, λ2 is 1.0, λ3 is 1.0, λ4 is 0.25, and λ5 is 1.0. We trained the model on a single RTX 4090 using imitation learning and cross-entropy loss, i.e., considering the action prediction as a classification task, consuming about 12h. |
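
The weighted loss and the LSTM input size quoted in the Experiment Setup row can be checked with a short sketch. This is an illustration only, not the authors' code: the weights and dimensions come from the quoted text, while the names `LOSS_WEIGHTS`, `LSTM_INPUT_DIM`, and `total_loss` are hypothetical, and the individual loss values are assumed to be plain numbers computed elsewhere.

```python
# Loss weights (lambda_1 .. lambda_5) as quoted in the Experiment Setup row.
# The dictionary keys are hypothetical labels chosen for this sketch.
LOSS_WEIGHTS = {
    "attribute": 2.0,
    "matching": 1.0,
    "vq": 1.0,
    "commit": 0.25,
    "recon": 1.0,
}

# Dimensions quoted in the same row.
D_MODEL = 768               # transformer encoder d_model
ACTION_EMBED_DIM = 64       # action embedding
GPS_COMPASS_EMBED_DIM = 32  # GPS+Compass embedding

# The LSTM input is described as 768+64+32, i.e. the three sizes added together.
LSTM_INPUT_DIM = D_MODEL + ACTION_EMBED_DIM + GPS_COMPASS_EMBED_DIM


def total_loss(losses):
    """Return the weighted sum of the five loss terms.

    `losses` maps each key of LOSS_WEIGHTS to a numeric loss value.
    Raises KeyError if any of the five terms is missing.
    """
    missing = set(LOSS_WEIGHTS) - set(losses)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(LOSS_WEIGHTS[name] * losses[name] for name in LOSS_WEIGHTS)
```

For example, calling `total_loss` with every term set to 1.0 returns the sum of the weights, which makes it easy to see that the attribute term is weighted most heavily and the commit term least.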