DISCO Nets: DISsimilarity COefficients Networks

Authors: Diane Bouchacourt, Pawan K. Mudigonda, Sebastian Nowozin

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically show that (i) by modeling uncertainty on the output value, DISCO Nets outperform equivalent non-probabilistic predictive networks and (ii) DISCO Nets accurately model the uncertainty of the output, outperforming existing probabilistic models based on deep neural networks. ... We use the NYU Hand Pose dataset of Tompson et al. [27] to estimate the efficiency of DISCO Nets for this task.
Researcher Affiliation | Collaboration | Diane Bouchacourt, University of Oxford (diane@robots.ox.ac.uk); M. Pawan Kumar, University of Oxford (pawan@robots.ox.ac.uk); Sebastian Nowozin, Microsoft Research Cambridge (sebastian.nowozin@microsoft.com)
Pseudocode | Yes | Algorithm 1: DISCO Nets Training algorithm. (A hedged code sketch of this training step follows the table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include a specific link to a code repository.
Open Datasets | Yes | We use the NYU Hand Pose dataset of Tompson et al. [27] to estimate the efficiency of DISCO Nets for this task.
Dataset Splits | Yes | We use 10,000 examples from the 72,757 training frames to construct a validation dataset and train only on 62,757 examples. (See the split sketch after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It generally discusses neural networks but does not specify the computational environment.
Software Dependencies | No | The paper describes the algorithms and optimization methods used (e.g., Stochastic Gradient Descent, L2-regularisation), but it does not specify software library names with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x) that would be needed for replication.
Experiment Setup | Yes | Back-propagation is used with Stochastic Gradient Descent with a batchsize of 256. The learning rate is fixed to λ = 0.01 and we use a momentum of m = 0.9 (see Polyak [20]). We also add L2-regularisation controlled by the parameter C. We use C = [0.0001, 0.001, 0.01] which is a relevant range as the comparative model BASEβ is best performing for C = 0.001. ... We train all models for 400 epochs... (See the optimiser/training-loop sketch after the table.)
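
The "Pseudocode" row above cites Algorithm 1, the DISCO Nets training algorithm. The following is a minimal sketch of the objective that algorithm optimises: for each input, the network is evaluated on K sampled noise vectors, and the loss is the mean dissimilarity between the ground truth and the K candidate outputs minus γ times the mean pairwise dissimilarity among the candidates. The use of PyTorch, the Euclidean dissimilarity Δ, the noise distribution and dimension, and the exact normalisation constants are assumptions made for illustration, not the authors' implementation.

```python
import torch

def disco_loss(net, x, y, K=2, gamma=0.5, noise_dim=200):
    """Monte-Carlo estimate of a dissimilarity-coefficient loss for one mini-batch.

    net(x, z) is assumed to map an input batch and a noise batch to candidate outputs.
    """
    B = x.shape[0]
    # Draw K noise vectors per input and produce K candidate outputs (shape: K x B x output_dim).
    z = torch.rand(K, B, noise_dim, device=x.device)   # noise distribution/dimension are assumptions
    y_hat = torch.stack([net(x, z[k]) for k in range(K)])

    # Dissimilarity Delta between output vectors; Euclidean distance is one plausible choice.
    def delta(a, b):
        return torch.norm(a - b, dim=-1)

    # Term 1: average dissimilarity between the ground truth and each candidate.
    data_term = delta(y_hat, y.unsqueeze(0)).mean()

    # Term 2: average pairwise dissimilarity among the candidates (rewards diverse samples).
    div = 0.0
    for k in range(K):
        for kp in range(K):
            if k != kp:
                div = div + delta(y_hat[k], y_hat[kp]).mean()
    diversity_term = div / (K * (K - 1))

    # Dissimilarity-coefficient objective: data term minus gamma-weighted diversity term.
    return data_term - gamma * diversity_term
```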
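
The "Dataset Splits" row quotes a hold-out of 10,000 of the 72,757 NYU Hand Pose training frames for validation, leaving 62,757 frames for training. A sketch of such a split is below; whether the authors selected the held-out frames randomly or by position is not stated, so the random permutation (and the fixed seed) is an assumption.

```python
import numpy as np

# 72,757 training frames in total; hold out 10,000 for validation.
rng = np.random.default_rng(0)
indices = rng.permutation(72_757)
val_idx = indices[:10_000]      # 10,000 validation frames
train_idx = indices[10_000:]    # 62,757 training frames
```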
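
The "Experiment Setup" row quotes SGD with a batch size of 256, learning rate 0.01, momentum 0.9, L2-regularisation weight C chosen from {0.0001, 0.001, 0.01}, and 400 training epochs. The sketch below wires those quoted values into a training loop, reusing the disco_loss sketch above. Mapping the paper's C onto a weight-decay term, and the use of PyTorch at all, are assumptions; the original experiments predate PyTorch.

```python
import torch

def make_optimizer(net, C=0.001):
    # Quoted settings: lr = 0.01, momentum = 0.9; C is the L2-regularisation weight
    # cross-validated over {0.0001, 0.001, 0.01} (treated here as weight decay).
    return torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=C)

def train(net, loader, C=0.001, epochs=400, K=2, gamma=0.5):
    # loader is assumed to yield mini-batches of 256 (x, y) pairs.
    opt = make_optimizer(net, C)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = disco_loss(net, x, y, K=K, gamma=gamma)
            loss.backward()
            opt.step()
```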