Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

C-NAV: Towards Self-Evolving Continual Object Navigation in Open World

Authors: MingMing Yu, Fei Zhu, Wenzhuo Liu, Yirong Yang, Qunbo Wang, Wenjun Wu, Jing Liu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments across multiple model architectures demonstrate that C-Nav consistently outperforms existing approaches, achieving superior performance even compared to baselines with full trajectory retention, while significantly lowering memory requirements.
Researcher Affiliation | Academia | 1 Beihang University, 2 Centre for Artificial Intelligence and Robotics, HKISI-CAS, 3 Institute of Automation, Chinese Academy of Sciences, 4 University of Chinese Academy of Sciences, 5 Hangzhou International Innovation Institute, Beihang University, 6 Beijing Jiaotong University
Pseudocode | Yes | The pseudocode for C-Nav is illustrated in Algorithm 1.
Open Source Code | No | The code will be available at https://bigtree765.github.io/C-Nav-project.
Open Datasets | Yes | We adopt two widely used object goal navigation datasets: ObjectNav (HM3D) consists of 2,000 episodes sampled from 20 validation scenes in the HM3D dataset, covering 6 object categories. ObjectNav (MP3D), introduced in the Habitat 2020 Challenge, contains 2,195 episodes from 11 MP3D validation scenes, spanning 21 object categories.
Dataset Splits | Yes | To adapt to the continual Object Navigation setting, we divide the object categories and corresponding trajectories into four incremental learning stages. The detailed splits are shown in Table 1, and a full list of object category assignments per stage can be found in the supplementary materials.
Hardware Specification | Yes | All experiments are conducted using two NVIDIA A6000 GPUs.
Software Dependencies | No | We implement our multi-modal encoder using CLIP-ResNet50 [11] for visual encoding and a PointNav-pretrained ResNet-50 [55] for depth encoding. Following previous work [23, 9], we keep both encoders frozen during training. The feature fusion is implemented by feature concatenation. We train our model using AdamW optimization with a linear warmup over 1,000 steps to reach our initial learning rate of 3e-4, followed by linear decay. We train each task stage for 25 epochs with a batch size of 32. For action prediction, the RNN-based model employs a 2-layer LSTM [56] architecture. The transformer-based and BEV-based [9] decoders utilize a 4-layer transformer [57] as the action decoder with a dropout rate of 0.1. For the LLM-based approach, we adopt Qwen2-0.5B [37] as the action decoder, incorporating six special tokens to represent atomic actions.
Experiment Setup | Yes | We train our model using AdamW optimization with a linear warmup over 1,000 steps to reach our initial learning rate of 3e-4, followed by linear decay. We train each task stage for 25 epochs with a batch size of 32. For action prediction, the RNN-based model employs a 2-layer LSTM [56] architecture. The transformer-based and BEV-based [9] decoders utilize a 4-layer transformer [57] as the action decoder with a dropout rate of 0.1... We set the inflection weight γ to 3.48 and configure our loss balance weights λ_KD and λ_FP to 5.
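
The optimization settings quoted under Experiment Setup (linear warmup over 1,000 steps to a learning rate of 3e-4, then linear decay, and an inflection weight γ of 3.48) can be illustrated with a minimal pure-Python sketch. This is not the authors' code. The function names, the `total_steps` argument, decay to exactly zero at the final step, and the choice to give the first timestep a weight of 1.0 are all assumptions made for illustration; the quoted text does not specify them.

```python
from typing import List, Sequence


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 3e-4, warmup_steps: int = 1000) -> float:
    """Learning rate under linear warmup followed by linear decay.

    Assumption: the rate decays to zero at `total_steps`; the source
    only says "followed by linear decay".
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: ramp linearly from peak_lr down to 0.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)


def inflection_weights(actions: Sequence[int], gamma: float = 3.48) -> List[float]:
    """Per-timestep loss weights that up-weight action changes.

    A timestep is treated as an inflection when its action differs from
    the previous one. Assumption: the first timestep is not an inflection.
    """
    weights = []
    for t, action in enumerate(actions):
        is_inflection = t > 0 and action != actions[t - 1]
        weights.append(gamma if is_inflection else 1.0)
    return weights
```

In a training loop, a function like `lr_at_step` would typically be wrapped in the framework's scheduler mechanism, and the per-step weights would multiply the per-timestep action loss before averaging.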