Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

$\textit{Hyper-GoalNet}$: Goal-Conditioned Manipulation Policy Learning with HyperNetworks

Authors: Pei Zhou, Wanting Yao, Qian Luo, Xunzhe Zhou, Yanchao Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate our method on a comprehensive suite of manipulation tasks with varying environmental randomization. Results demonstrate significant performance improvements over state-of-the-art methods, particularly in high-variability conditions. Real-world robotic experiments further validate our method's robustness to sensor noise and physical uncertainties.
Researcher Affiliation	Academia	¹InfoBodied AI Lab, The University of Hong Kong; ²University of Pennsylvania. EMAIL EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 Hyper-GoalNet: Test-Time Task Evaluation
Open Source Code	Yes	Code is available at: https://github.com/wantingyao/hyper-goalnet.
Open Datasets	Yes	We evaluate our approach using Robosuite, a comprehensive robotics benchmark designed for both short and long-horizon manipulation tasks [31, 58]. This framework provides a standardized suite of environments, from which we select multiple contact-rich tabletop manipulation tasks: coffee manipulation, threading, mug cleanup, nut assembly, three-piece assembly, and several long-horizon tasks including coffee preparation and kitchen manipulation. Our approach follows the behavior cloning paradigm, utilizing a dataset based on MimicGen [31].
Dataset Splits	Yes	Using a fixed random seed, we partition the dataset into 950 training and 50 validation demonstrations across all tasks.
Hardware Specification	Yes	Table 10 presents the average inference latency per step across different methods, measured over 40,000 steps on a single NVIDIA RTX 3090 GPU.
Software Dependencies	No	The training procedure employs the Adam optimizer [26] with a cosine learning rate schedule [28].
Experiment Setup	Yes	The training procedure employs the Adam optimizer [26] with a cosine learning rate schedule [28]. We initialize the learning rate at $5 \times 10^{-4}$ and maintain uniform loss balancing coefficients ($\lambda_i = 1$ for all components). Our model is trained for 500 epochs with a batch size of 256, by default.
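
The Dataset Splits and Experiment Setup rows can be illustrated with a minimal Python sketch. It is not the authors' code. The table gives only the split sizes (950 training, 50 validation), the initial learning rate, and the epoch count. Everything else here is an assumption made for illustration: the seed value, the pooling of demonstrations into one list of 1000, per-epoch stepping, a minimum learning rate of zero, and the absence of warmup or restarts. The helper names `split_demos` and `cosine_lr` are hypothetical.

```python
import math
import random

BASE_LR = 5e-4                # initial learning rate reported in the table
NUM_EPOCHS = 500              # number of training epochs reported in the table
NUM_TRAIN, NUM_VAL = 950, 50  # split sizes reported in the table


def split_demos(num_demos, num_val, seed):
    """Shuffle demonstration indices with a fixed seed and hold out num_val."""
    indices = list(range(num_demos))
    random.Random(seed).shuffle(indices)  # local RNG, so the split is repeatable
    return sorted(indices[num_val:]), sorted(indices[:num_val])


def cosine_lr(epoch, base_lr=BASE_LR, total_epochs=NUM_EPOCHS, min_lr=0.0):
    """Cosine annealing from base_lr to min_lr (assumed: no warmup, no restarts)."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# seed=0 is a placeholder; the paper's actual seed is not given in the table.
train_ids, val_ids = split_demos(NUM_TRAIN + NUM_VAL, NUM_VAL, seed=0)
```

Under these assumptions the learning rate starts at 5e-4, reaches half that value at epoch 250, and decays to zero at epoch 500. Rerunning `split_demos` with the same seed returns the same partition.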