Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

$\textit{Hyper-GoalNet}$: Goal-Conditioned Manipulation Policy Learning with HyperNetworks

Authors: Pei Zhou, Wanting Yao, Qian Luo, Xunzhe Zhou, Yanchao Yang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our method on a comprehensive suite of manipulation tasks with varying environmental randomization. Results demonstrate significant performance improvements over state-of-the-art methods, particularly in high-variability conditions. Real-world robotic experiments further validate our method's robustness to sensor noise and physical uncertainties.
Researcher Affiliation Academia 1 InfoBodied AI Lab, The University of Hong Kong; 2 University of Pennsylvania
Pseudocode Yes Algorithm 1 Hyper-GoalNet: Test-Time Task Evaluation
Open Source Code Yes Code is available at: https://github.com/wantingyao/hyper-goalnet.
Open Datasets Yes We evaluate our approach using Robosuite, a comprehensive robotics benchmark designed for both short and long-horizon manipulation tasks [31, 58]. This framework provides a standardized suite of environments, from which we select multiple contact-rich tabletop manipulation tasks: coffee manipulation, threading, mug cleanup, nut assembly, three-piece assembly, and several long-horizon tasks including coffee preparation and kitchen manipulation. Our approach follows the behavior cloning paradigm, utilizing a dataset based on MimicGen [31].
Dataset Splits Yes Using a fixed random seed, we partition the dataset into 950 training and 50 validation demonstrations across all tasks.
Hardware Specification Yes Table 10 presents the average inference latency per step across different methods, measured over 40,000 steps on a single NVIDIA RTX 3090 GPU.
Software Dependencies No The training procedure employs the Adam optimizer [26] with a cosine learning rate schedule [28].
Experiment Setup Yes The training procedure employs the Adam optimizer [26] with a cosine learning rate schedule [28]. We initialize the learning rate at $5 \times 10^{-4}$ and maintain uniform loss balancing coefficients ($\lambda_i = 1$ for all components). Our model is trained for 500 epochs with a batch size of 256, by default.
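
The Dataset Splits and Experiment Setup rows above can be illustrated with a minimal sketch. It is not the authors' code: it only shows one way a fixed-seed 950/50 split and a cosine learning rate schedule starting at $5 \times 10^{-4}$ over 500 epochs might be computed. The seed value, the pool of 1000 demonstrations (taken as 950 + 50), the per-epoch stepping, the schedule floor of zero, and the helper names `split_indices` and `cosine_lr` are all assumptions made for illustration.

```python
import math
import random

# Values quoted in the table rows above.
NUM_TRAIN = 950
NUM_VAL = 50
BASE_LR = 5e-4
NUM_EPOCHS = 500
BATCH_SIZE = 256

# Assumptions for this sketch only; none of these are stated in the source.
SEED = 0        # hypothetical seed value
MIN_LR = 0.0    # hypothetical floor of the cosine schedule


def split_indices(num_train=NUM_TRAIN, num_val=NUM_VAL, seed=SEED):
    """Shuffle demonstration indices with a fixed seed, then split them."""
    rng = random.Random(seed)  # local generator, so the split is repeatable
    indices = list(range(num_train + num_val))
    rng.shuffle(indices)
    return indices[num_val:], indices[:num_val]


def cosine_lr(epoch, base_lr=BASE_LR, num_epochs=NUM_EPOCHS, min_lr=MIN_LR):
    """Cosine-annealed learning rate for a 0-indexed epoch (no warmup)."""
    progress = epoch / num_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


train_idx, val_idx = split_indices()
steps_per_epoch = math.ceil(len(train_idx) / BATCH_SIZE)

if __name__ == "__main__":
    print(len(train_idx), len(val_idx), steps_per_epoch)
    for epoch in (0, 125, 250, 375, 499):
        print(epoch, cosine_lr(epoch))
```

In a real training run, the schedule would typically come from the deep learning framework's own scheduler rather than a hand-written function. The closed form here is only meant to make the decay from the initial learning rate explicit.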