Trajectory Diffusion for ObjectGoal Navigation

Authors: Xinyao Yu, Sixian Zhang, Xinhang Song, Xiaorong Qin, Shuqiang Jiang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the Gibson and MP3D datasets demonstrate that the generated trajectories effectively guide the agent, resulting in more accurate and efficient navigation.
Researcher Affiliation | Academia | 1 Key Laboratory of Intelligent Information Processing of the Chinese Academy of Sciences (CAS), Institute of Computing Technology, Beijing; 2 University of Chinese Academy of Sciences, Beijing; 3 Institute of Intelligent Computing Technology, Suzhou, CAS
Pseudocode | No | The paper describes the implementation of T-Diff and its components (e.g., in Figure 2 and Section 4.2), but it does not include a dedicated pseudocode block or a labeled algorithm section.
Open Source Code | Yes | The code is available at https://github.com/sx-zhang/T-diff.git.
Open Datasets | Yes | We evaluate the performance of our model on standard ObjectNav datasets, including Gibson [47] and Matterport3D (MP3D) [3], in the Habitat simulator.
Dataset Splits | Yes | For Gibson, we use 25 train / 5 val scenes from the tiny split, following the settings of [31], with 1000 validation episodes containing 6 target object categories. For MP3D, we use 56 train / 11 val scenes, with 2195 validation episodes containing 21 target object categories.
Hardware Specification | No | The paper discusses computational complexity (FLOPs) in Section A.2 but does not specify the hardware used to run the experiments, such as GPU/CPU models, memory, or cloud instances.
Software Dependencies | No | The paper mentions software components such as DiT, ResNet-18, and the AdamW optimizer but does not provide version numbers for these or other software dependencies required for replication.
Experiment Setup | Yes | For the training of the trajectory diffusion model... The semantic maps are resized to 224 × 224. ... Training is performed using the AdamW optimizer [19, 24] with a base learning rate of 1e-4, warmed up for 1000 steps using linear warmup followed by a cosine schedule. After the warmup steps, the learning rate for the diffusion model is decayed by a factor of 1e-3, and the learning rate of the semantic map encoder is decayed by a factor of 1e-6. Each model is trained for 200 epochs. ... The maximum noise schedule τ_max is set to 100. The length of the predicted trajectory is k = 32 and the selected k_g-th point is set to 28. ... The agent's turn angle is fixed at 30 degrees and each Forward step covers 25 cm. The maximum timestep limit is set to 500 during navigation, and t_T-diff is set to 5.
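The linear-warmup-plus-cosine learning-rate schedule quoted above can be sketched as follows. This is a hedged illustration, not the authors' code: `TOTAL_STEPS` is a placeholder (the paper reports 200 training epochs, not a step count), and the per-component decay factors mentioned in the paper are omitted.

```python
import math

# Illustrative sketch of the reported schedule: base lr 1e-4, linear warmup
# for 1000 steps, then cosine decay. TOTAL_STEPS is an assumed placeholder.
BASE_LR = 1e-4
WARMUP_STEPS = 1000
TOTAL_STEPS = 20_000  # placeholder; not stated in the paper

def learning_rate(step: int) -> float:
    """Return the learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from BASE_LR toward 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In a PyTorch training loop this function would typically be wired in via `torch.optim.lr_scheduler.LambdaLR` wrapped around the AdamW optimizer.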