Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions

Authors: Ayan Chakrabarti, Jingyu Shao, Greg Shakhnarovich

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on the NYUv2 depth data set [11], and find that it achieves state-of-the-art performance. We train and evaluate our method on the NYU v2 depth dataset [11]. To construct our training and validation sets, we adopt the standard practice of using the raw videos corresponding to the training images from the official train/test split. We randomly select 10% of these videos for validation, and use the rest for training our network. Table 2 reports the quantitative performance of our method, along with other state-of-the-art approaches over the entire test set, and we find that the proposed method yields superior performance on most metrics.
Researcher Affiliation | Academia | Ayan Chakrabarti TTI-Chicago Chicago, IL EMAIL Jingyu Shao Dept. of Statistics, UCLA Los Angeles, CA EMAIL Gregory Shakhnarovich TTI-Chicago Chicago, IL EMAIL
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code for implementation, along with a pre-trained network model, are available at http://www.ttic.edu/chakrabarti/mdepth.
Open Datasets | Yes | We train and evaluate our method on the NYU v2 depth dataset [11].
Dataset Splits | Yes | To construct our training and validation sets, we adopt the standard practice of using the raw videos corresponding to the training images from the official train/test split. We randomly select 10% of these videos for validation, and use the rest for training our network. Our training set is formed by sub-sampling video frames uniformly, and consists of roughly 56,000 color image-depth map pairs.
Hardware Specification | Yes | Our overall inference method (network predictions and globalization) takes 24 seconds per-image when using an NVIDIA Titan X GPU. AC and GS thank NVIDIA Corporation for donations of Titan X GPUs used in this research.
Software Dependencies | No | The paper mentions using the VGG-19 network and ReLU activations, but does not provide specific version numbers for programming languages, libraries, or other software dependencies.
Experiment Setup | Yes | We use a fully convolutional version of our architecture during training with a stride of 8 pixels, yielding nearly 4000 training patches per image. We train the network using SGD for a total of 14 epochs, using a batch size of only one image and a momentum value of 0.9. We begin with a learning rate of 0.01, and reduce it after the 4th, 8th, 10th, 12th, and 13th epochs, each time by a factor of two.
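
The learning-rate schedule quoted under Experiment Setup can be written out as a short Python sketch. This is an illustration only, not the authors' code: the names `BASE_LR`, `HALVE_AFTER`, `TOTAL_EPOCHS`, and `learning_rate` are hypothetical, and the sketch assumes that "after the 4th epoch" means the reduced rate applies from epoch 5 onward, with epochs counted from 1.

```python
# Values taken from the quoted experiment setup; all names are hypothetical.
BASE_LR = 0.01                    # initial learning rate
HALVE_AFTER = (4, 8, 10, 12, 13)  # epochs after which the rate is halved
TOTAL_EPOCHS = 14                 # total number of training epochs


def learning_rate(epoch: int) -> float:
    """Return the learning rate in effect during a 1-indexed epoch.

    Assumption: a reduction "after" epoch k first applies in epoch k + 1.
    """
    if not 1 <= epoch <= TOTAL_EPOCHS:
        raise ValueError(f"epoch must be in 1..{TOTAL_EPOCHS}, got {epoch}")
    # Count how many halving points have already been passed.
    halvings = sum(1 for boundary in HALVE_AFTER if epoch > boundary)
    return BASE_LR / (2 ** halvings)


# Full schedule as a mapping from epoch number to learning rate.
schedule = {epoch: learning_rate(epoch) for epoch in range(1, TOTAL_EPOCHS + 1)}

if __name__ == "__main__":
    for epoch, lr in schedule.items():
        print(f"epoch {epoch:2d}: lr = {lr:.7f}")
```

Under these assumptions the rate is halved five times in total, so the final epoch runs at 1/32 of the initial rate.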