Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery
Authors: Ming Hu, Zhengdi Yu, Feilong Tang, Kaiwen Chen, Yulong Li, Imran Razzak, Junjun He, Tolga Birdal, Kai-Jing Zhou, Zongyuan Ge
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we introduce OphNet-3D, the first extensive RGB-D dynamic 3D reconstruction dataset for ophthalmic surgery, comprising 41 sequences from 40 surgeons and totaling 7.1 million frames, with fine-grained annotations of 12 surgical phases, 10 instrument categories, dense MANO hand meshes, and full 6-DoF instrument poses. ... Building upon OphNet-3D, we establish two challenging benchmarks, bimanual hand pose estimation and hand-instrument interaction reconstruction, and propose two dedicated architectures: H-Net for dual-hand mesh recovery and OH-Net for joint reconstruction of two-hand two-instrument interactions. These models leverage a novel spatial reasoning module with weak-perspective camera modeling and collision-aware center-based representation. Both architectures outperform existing methods by substantial margins, achieving improvements of over 2 mm in Mean Per Joint Position Error (MPJPE) and up to 23% in ADD-S metrics for hand and instrument reconstruction, respectively. |
| Researcher Affiliation | Academia | 1 Monash University; 2 Shanghai AI Laboratory; 3 MBZUAI; 4 Imperial College London; 5 Eye Hospital, Wenzhou Medical University. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the 'Automatic Annotation Method' in Section 3 and provides a diagram in Figure 2, but it does not include a structured pseudocode or algorithm block. |
| Open Source Code | No | Once the paper is accepted, we will open-source all data, code, and model weights. |
| Open Datasets | Yes | In this work, we introduce OphNet-3D, the first extensive RGB-D dynamic 3D reconstruction dataset for ophthalmic surgery... EMAIL, EMAIL https://ophnet-3d.github.io/ ... Once the paper is accepted, we will open-source all data, code, and model weights. |
| Dataset Splits | Yes | Data Split. To ensure each phase has balanced samples, we split our dataset into training, validation, and test sets by subjects, which have 30, 3, 8 subjects separately. Based on the data split, [we establish two benchmarks:] (1) bimanual hand pose estimation and (2) hand-instrument interactions. We provide more details regarding the data distribution and data quality analysis in the Appendix. Note that in our experiments on both benchmarks, we train the model on the monocular training images from all 8 views, including both egocentric and allocentric for rich supervision. |
| Hardware Specification | Yes | On an NVIDIA A100 GPU the optimization pipeline can take 15 minutes for a video with 1000 frames... We train our network using 1 A100 GPU with batchsize of 64. |
| Software Dependencies | No | We implement the annotation pipeline with PyTorch [60]. |
| Experiment Setup | Yes | During the optimization of stage II and stage III (Sec. 3.2), we use L-BFGS algorithm with lr = 1 and optimizing the loss functions using below weights: For stage II, we have: λ2d = 0.001, λsmooth = 10, λθ = 0.04, λβ = 0.05. For stage III, we have: λz = 200, λϕ = 2, λγ = 10, λpen = 10, λβ = 0.05, λja = 1, λpalm = 1, λbl = 1. ... We train our network using 1 A100 GPU with batchsize of 64. The size of our backbone feature is 128 × 128 and the size of our 4 pixel-aligned output maps is 64 × 64. We applied random scale, rotation, flip, and colour jitter augmentation during training. ... We use the following weights in all experiments: λfocal = 80, λpj2d = 400, λ3d = 300, λsil = 50, λθ = 80, λβ = 10, λseg = 160. |
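
Two of the settings quoted in the table, the 30/3/8 subject-level split and the network training loss weights, can be written down as a small sketch. This is an illustration only, not the authors' code, which was not released at the time of classification. The subject identifiers, the function names `split_by_subject` and `total_loss`, and the seeded random shuffle are assumptions; in particular, the paper's phase-balancing procedure is not described in the table and is not reproduced here.

```python
import random

# Split sizes quoted in the "Dataset Splits" row (30 + 3 + 8 = 41 subjects).
SPLIT_SIZES = {"train": 30, "val": 3, "test": 8}

# Network training loss weights quoted in the "Experiment Setup" row.
LOSS_WEIGHTS = {
    "focal": 80.0,
    "pj2d": 400.0,
    "3d": 300.0,
    "sil": 50.0,
    "theta": 80.0,
    "beta": 10.0,
    "seg": 160.0,
}


def split_by_subject(subject_ids, sizes=SPLIT_SIZES, seed=0):
    """Assign whole subjects to splits so no subject appears in two splits.

    Assumption: a seeded random shuffle stands in for the paper's
    phase-balanced assignment, which the table does not specify.
    """
    ids = sorted(set(subject_ids))
    if len(ids) != sum(sizes.values()):
        raise ValueError("number of subjects must match the split sizes")
    random.Random(seed).shuffle(ids)
    splits, start = {}, 0
    for name, count in sizes.items():
        splits[name] = ids[start:start + count]
        start += count
    return splits


def total_loss(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of per-term loss values (plain floats in this sketch)."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


# Hypothetical subject identifiers, for illustration only.
subjects = [f"subject_{i:02d}" for i in range(41)]
splits = split_by_subject(subjects)
```

Splitting by subject rather than by frame keeps every frame from a given surgeon in a single split, so test performance is measured on subjects the model has not seen during training.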