Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

DualFocus: Depth from Focus with Spatio-Focal Dual Variational Constraints

Authors: Sungmin Woo, Sangyoun Lee

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on four public datasets demonstrate that DualFocus consistently outperforms state-of-the-art methods in both depth accuracy and perceptual quality. Extensive experiments on four public datasets, including NYU Depth v2 [25], FoD500 [15], DDFF 12-Scene [9], and ARKitScenes [2], demonstrate that DualFocus surpasses state-of-the-art methods in both depth accuracy and perceptual quality. (Section 4, Experiments)
Researcher Affiliation | Academia | Sungmin Woo, Yonsei University, EMAIL; Sangyoun Lee, Yonsei University, EMAIL
Pseudocode | No | The paper describes the methodology using prose and mathematical formulations in sections like "3.3 Spatial Variational Constraints" and "3.4 Focal Variational Constraints", but it does not include a distinct block labeled "Pseudocode" or "Algorithm".
Open Source Code | No | We plan to release the code and pretrained models in the future, but they are not available at the time of camera-ready submission.
Open Datasets | Yes | Extensive experiments on four public datasets, including NYU Depth v2 [25], FoD500 [15], DDFF 12-Scene [9], and ARKitScenes [2], demonstrate that DualFocus surpasses state-of-the-art methods in both depth accuracy and perceptual quality.
Dataset Splits | Yes | For FoD500 and DDFF 12-Scene datasets, we follow the training protocol used in DFV [29]. (1) NYU Depth v2 [25] is a comprehensive indoor dataset with over 24,000 RGB-depth pairs for training and 654 for testing. [...] (2) FoD500 [15] is a synthetic dataset originally designed for DFD, featuring 400 training and 100 test samples. Each includes a 5-frame focal stack and a ground truth depth map. The image resolution is 256 × 256, which is randomly cropped into 224 × 224. (3) DDFF 12-Scene [9] [...] We adopt the split from DFV [29], using six scenes for training and validation (e.g., kitchen, seminaroom) and six for testing (e.g., cafeteria, library). Each sample provides a 10-frame focal stack, though we use randomly selected 5 frames for consistency. Training uses 224 × 224 random crops and flips, while evaluation is performed at the original resolution of 383 × 552, consistent with prior works [29, 6, 9]. (4) ARKitScenes [2] is a large-scale mobile AR dataset. We use a subset of 5,600 images for zero-shot evaluation to assess the model's ability to generalize to unseen real-world environments without fine-tuning.
Hardware Specification | Yes | We train our model on two Titan RTX GPUs using PyTorch. All measurements were conducted on a single NVIDIA RTX A6000 GPU.
Software Dependencies | No | We train our model on two Titan RTX GPUs using PyTorch. The encoder is based on a ResNet-18 FPN [13] and the decoder employs 3D-ResNet blocks [8]. For optimization, we use the Adam optimizer (β1 = 0.9, β2 = 0.999). [...] to solve the regularized normal equation in closed form using torch.linalg.solve as in [14].
Experiment Setup | Yes | We train our model on two Titan RTX GPUs using PyTorch. The encoder is based on a ResNet-18 FPN [13] and the decoder employs 3D-ResNet blocks [8]. For optimization, we use the Adam optimizer (β1 = 0.9, β2 = 0.999) with an initial learning rate of 1 × 10^-4, which is reduced to 1 × 10^-5 via a cosine annealing scheduler. The model is trained for 40 epochs on the NYU Depth v2 dataset with a batch size of 16, and for 2000 epochs on the FoD500 and DDFF 12-Scene datasets with a batch size of 20. Table 6: Model and training hyperparameters. Focal stack size N: 5; Encoder: ResNet-18 FPN [13]; Decoder: 3D-ResNet blocks [8]; Feature channel C1: 32; Feature channel C2: 16; λsv: 20; λfv: 100; Optimizer: Adam (β1 = 0.9, β2 = 0.999); Scheduler: Cosine annealing; Initial learning rate: 0.001; Batch size: 16 / 20 (NYU dataset / Others); Training epochs: 40 / 2000 (NYU dataset / Others).
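
The learning-rate schedule quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code, which has not been released. It assumes a single-cycle cosine annealing curve stepped once per epoch, the 40-epoch NYU Depth v2 setting, and the initial rate of 1e-4 given in the quoted text (Table 6 lists 0.001 instead). The function name and its parameters are hypothetical. The formula is the standard closed form for cosine annealing, the same one that PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` follows when `T_max` equals the number of epochs and `eta_min` is the final rate.

```python
import math


def cosine_annealing_lr(epoch, total_epochs, lr_init=1e-4, lr_min=1e-5):
    """Learning rate at `epoch` for a single-cycle cosine annealing schedule.

    Hypothetical helper: the per-epoch stepping and the default values are
    assumptions taken from the quoted setup, not from released code.
    """
    progress = epoch / total_epochs  # 0.0 at the start, 1.0 at the end
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * progress))


# Assumed setting: 40 epochs, as reported for NYU Depth v2.
TOTAL_EPOCHS = 40
schedule = [cosine_annealing_lr(e, TOTAL_EPOCHS) for e in range(TOTAL_EPOCHS + 1)]

if __name__ == "__main__":
    for epoch in (0, 10, 20, 30, 40):
        print(f"epoch {epoch:2d}: lr = {schedule[epoch]:.3e}")
```

Under these assumptions the rate starts at 1e-4, passes through the midpoint of the two rates halfway through training, and ends at 1e-5. For the 2000-epoch FoD500 and DDFF 12-Scene runs, only `total_epochs` would change.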