Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

MonoLift: Learning 3D Manipulation Policies from Monocular RGB via Distillation

Authors: Ziru Wang, Mengmeng Wang, Guang Dai, Yongliu Long, Jingdong Wang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on both simulated and real-world manipulation tasks show that MonoLift not only outperforms existing monocular approaches but even surpasses several methods that rely on explicit 3D input, offering a resource-efficient and effective solution for vision-based robotic control. The video demonstration is available on our project page: https://robotasy.github.io/MonoLift/. We evaluate our method on several simulated benchmarks and real-world robotic manipulation tasks, covering a diverse range of challenges: (i) LIBERO-90 [40] for visually ambiguous tasks involving subtle structural differences; (ii) Meta-World [41] for fine-grained manipulation; and (iii) LIBERO-LONG [40] for long-horizon tasks.
Researcher Affiliation | Collaboration | Ziru Wang¹, Mengmeng Wang²,¹, Guang Dai¹, Yongliu Long³, Jingdong Wang⁴. ¹SGIT AI Lab, State Grid Corporation of China; ²Zhejiang University of Technology; ³Zhejiang University; ⁴Baidu.
Pseudocode | No | The paper describes the model architecture and distillation mechanism in Sections 3 and 4, and visualizes it in Figure 2, but does not include structured pseudocode or an algorithm block.
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We plan to release them after proper organization and preparation.
Open Datasets | Yes | We evaluate our method on several simulated benchmarks and real-world robotic manipulation tasks, covering a diverse range of challenges: (i) LIBERO-90 [40] for visually ambiguous tasks involving subtle structural differences; (ii) Meta-World [41] for fine-grained manipulation; and (iii) LIBERO-LONG [40] for long-horizon tasks.
Dataset Splits | No | The paper states the number of expert demonstration trajectories used for training per task (e.g., '20 expert demonstration trajectories' for LIBERO-90 and Meta-World, '10 demonstration trajectories' for real-world tasks). For evaluation, it mentions 'each task evaluated over 10 trials' for real-world experiments. However, it does not describe how these demonstrations are split into training, validation, or test sets; it reports only the number of demonstrations used for training and the number of evaluation trials.
Hardware Specification | No | The paper does not explicitly specify the type of GPU or other hardware used for the experiments in the main text or supplementary materials. The NeurIPS checklist indicates the GPU type is specified in Section 5, but this information is not found in the main body or directly verifiable in the provided text.
Software Dependencies | No | The paper mentions several models and architectures used (e.g., ResNet-18, MiniLM provided in [35], Depth Anything V2 [14], Transformer decoder), but it does not provide specific version numbers for programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | No | While the paper describes model architecture details (e.g., historical window length H, number of Transformer blocks L) and loss functions, it does not explicitly provide concrete hyperparameters such as learning rates, batch sizes, number of epochs, or specific optimizer settings in the main body. The paper indicates that these details are provided in the supplementary material.