Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning from Demonstrations via Capability-Aware Goal Sampling

Authors: Yuanlin Duan, Yuning Wang, Wenjie Qiu, He Zhu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate Cago across several sparse-reward environments and demonstrate substantial improvements in both sample efficiency and final task performance over existing imitation-based baselines. We evaluate Cago across a diverse set of challenging robotic manipulation environments to address the following research questions: (Q1) Does Cago outperform existing imitation learning baselines that leverage demonstrations in alternative ways? (Q2) Can Cago effectively realize capability-aware goal sampling that aligns with the agent's learning progress? (Q3) How essential are the proposed capability-aware goal sampling and BC-Explorer components to the overall performance of Cago?
Researcher Affiliation | Academia | Yuanlin Duan (Rutgers University, EMAIL); Yuning Wang (Rutgers University, EMAIL); Wenjie Qiu (Rutgers University, EMAIL); He Zhu (Rutgers University, EMAIL)
Pseudocode | Yes | Algorithm 1 Capability-Aware Goal Sampling (Cago) and Algorithm 2 The main training framework of Cago
Open Source Code | Yes | The code for Cago is available at https://github.com/RU-Automated-Reasoning-Group/Cago.
Open Datasets | Yes | For our experiments, we evaluate and compare Cago against several baselines across three robot environment suites with sparse rewards: Meta World (Yu et al., 2020), Adroit (Rajeswaran et al., 2017), and Maniskill (Gu et al., 2023; Tao et al., 2025).
Dataset Splits | Yes | During training, we used only 10 demonstration trajectories per task for the Meta World and Adroit environments, and 20 demonstration trajectories per task for the Mani Skill environments. Each method is evaluated on 100 held-out seeds, and we report the average success rate over these episodes. To evaluate the generalization capability, we tested Cago on 500 unseen initial states generated from random seeds, each differing from those in the demonstrations.
Hardware Specification | Yes | We clearly specifies the computer resources (8 Nvidia A100 GPU) and the amount of GPU memory required (approximately 2.4GB).
Software Dependencies | No | We adopt the default hyperparameters from the LEXA backbone model-based RL (MBRL) agent such as the learning rate, optimizer, and network architecture and maintain them consistently across all environments.
Experiment Setup | Yes | We adopt the default hyperparameters from the LEXA backbone model-based RL (MBRL) agent such as the learning rate, optimizer, and network architecture and maintain them consistently across all environments. The primary hyperparameter tuning for Cago focuses on the following aspects: (1) the episode length Lτ; (2) the proportion of Lτ allocated to the goal-directed phase Tgo; (3) the number of demonstrations Ndemo used for both dictionary construction and environment resetting; (4) the visit frequency threshold λvisit used in Algorithm 1 for filtering goal candidates; and (5) the similarity calculate metrics in Equation 2; (6) the similarity threshold ϵ in Equation 2. (Table 9: Hyperparameters of Cago)
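
The six tuned quantities named in the Experiment Setup row can be gathered into one configuration object. The sketch below is illustrative only and is not taken from the Cago repository: the class name, field names, field types, and the floor-rounding rule are all assumptions, and every value in the example is a placeholder except the 10 demonstrations per task reported for the Meta World and Adroit environments. The actual values are in Table 9 of the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CagoTuningConfig:
    """Hypothetical container for the six tuned quantities (names are assumptions)."""

    episode_length: int          # L_tau: episode length in environment steps
    go_phase_fraction: float     # share of L_tau given to the goal-directed phase T_go
    num_demos: int               # N_demo: demonstrations used for the dictionary and resets
    visit_threshold: float       # lambda_visit: visit-frequency filter (type assumed)
    similarity_metric: str       # metric used in Equation 2 (name is a placeholder)
    similarity_threshold: float  # epsilon in Equation 2

    def __post_init__(self) -> None:
        # Basic sanity checks; the paper does not state these constraints explicitly.
        if self.episode_length <= 0:
            raise ValueError("episode_length must be positive")
        if not 0.0 <= self.go_phase_fraction <= 1.0:
            raise ValueError("go_phase_fraction must lie in [0, 1]")
        if self.num_demos <= 0:
            raise ValueError("num_demos must be positive")

    def go_phase_steps(self) -> int:
        """Steps spent in the goal-directed phase, rounded down (rounding is an assumption)."""
        return int(self.episode_length * self.go_phase_fraction)


# Placeholder values for illustration; only num_demos=10 comes from the summary above.
example = CagoTuningConfig(
    episode_length=100,
    go_phase_fraction=0.5,
    num_demos=10,
    visit_threshold=1.0,
    similarity_metric="l2",
    similarity_threshold=0.1,
)
print(example.go_phase_steps())
```

Freezing the dataclass keeps a run's settings immutable once constructed, which makes it easier to log one exact configuration per environment when reproducing results.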