Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles
Authors: Stefan Lee, Senthil Purushwalkam Shiva Prakash, Michael Cogswell, Viresh Ranjan, David Crandall, Dhruv Batra
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach achieves lower oracle error compared to existing methods on a wide range of tasks and deep architectures. We also show qualitatively that the diverse solutions produced often provide interpretable representations of task ambiguity. |
| Researcher Affiliation | Academia | Stefan Lee (Virginia Tech, steflee@vt.edu); Senthil Purushwalkam (Carnegie Mellon University, spurushw@andrew.cmu.edu); Michael Cogswell (Virginia Tech, cogswell@vt.edu); Viresh Ranjan (Virginia Tech, rviresh@vt.edu); David Crandall (Indiana University, djcran@indiana.edu); Dhruv Batra (Virginia Tech, dbatra@vt.edu) |
| Pseudocode | Yes | Figure 2: The MCL approach of [8] (Alg. 1) requires costly retraining while our sMCL method (Alg. 2) works within standard SGD solvers, training all ensemble members under a joint loss. (Algorithms 1 and 2 are present in the paper; see the loss sketch after the table.) |
| Open Source Code | No | The paper mentions utilizing publicly available implementations of other models (e.g., neuraltalk2, Caffe) and describes sMCL as a layer to introduce, but it does not explicitly state that the authors' implementation code for sMCL is open source or provide a link to it. |
| Open Datasets | Yes | We begin our experiments with sMCL on the CIFAR10 [17] dataset...We use the fully convolutional network (FCN) architecture presented by Long et al. [20]...We train on the Pascal VOC 2011 training set...We adopt the model and training procedure of Karpathy et al. [14], utilizing their publicly available implementation neuraltalk2. The model...We train and test on the MSCOCO dataset [18], using the same splits as [14]. |
| Dataset Splits | Yes | We train on the Pascal VOC 2011 training set augmented with extra segmentations provided in [10] and we test on a subset of the VOC 2011 validation set...We train and test on the MSCOCO dataset [18], using the same splits as [14]. |
| Hardware Specification | No | The acknowledgments mention 'NVIDIA GPU donation' and 'Computing resources used by this work are supported', but they do not specify exact GPU models, CPU details, or other specific hardware configurations used for the experiments. |
| Software Dependencies | No | The paper mentions using 'Caffe deep learning framework [13]' and the 'publicly available implementation neuraltalk2' but does not specify version numbers for these or any other software libraries or dependencies. |
| Experiment Setup | Yes | For these experiments, the reference model is trained using a batch size of 350 for 5,000 iterations with a momentum of 0.9, weight decay of 0.004, and an initial learning rate of 0.001 which drops to 0.0001 after 4000 iterations...We initialize our sMCL models from a standard ensemble trained for 50 epochs at a learning rate of 10^-3. The sMCL ensemble is then fine-tuned for another 15 epochs at a reduced learning rate of 10^-5...We train each ensemble for 70k iterations with the parameters of the CNN fixed. (See the solver sketch after the table for a concrete rendering of this schedule.) |
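
The joint loss referenced in the Pseudocode row (Alg. 2) is simple to express in a modern autodiff framework: each example contributes only the loss of its best-performing ensemble member, so gradients flow to that member alone and specialization emerges during ordinary SGD. Below is a minimal sketch of this winner-take-gradient "oracle" loss. The paper's implementation is a Caffe layer, so the use of PyTorch here, along with the function name `smcl_loss`, is our illustrative assumption.

```python
import torch
import torch.nn.functional as F

def smcl_loss(logits_per_member, targets):
    """Sketch of the sMCL oracle loss (assumed PyTorch rendering).

    logits_per_member: list of [batch, num_classes] tensors, one per
        ensemble member.
    targets: [batch] tensor of ground-truth class indices.
    """
    # Per-member, per-example cross-entropy losses: shape [k, batch].
    losses = torch.stack([
        F.cross_entropy(logits, targets, reduction="none")
        for logits in logits_per_member
    ])
    # Oracle assignment: each example is charged only to its lowest-loss
    # member, so the backward pass updates that member alone.
    oracle_loss, _ = losses.min(dim=0)
    return oracle_loss.mean()
```

Because the minimum is taken inside the loss, a single standard solver trains all members jointly, which is exactly the property the Pseudocode row contrasts with the costly retraining loop of MCL [8].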
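
For concreteness, the CIFAR10 reference-model schedule quoted in the Experiment Setup row maps onto a standard solver configuration roughly as follows. This is a hedged sketch: the paper trained with Caffe, and the PyTorch optimizer, scheduler, and the placeholder `model` below are our assumptions; only the numeric hyperparameters come from the paper.

```python
import torch

# Placeholder network standing in for the paper's CIFAR10 CNN.
model = torch.nn.Linear(32 * 32 * 3, 10)

# Quoted schedule: batch size 350 for 5,000 iterations, momentum 0.9,
# weight decay 0.004, lr 0.001 dropping to 0.0001 after iteration 4,000.
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.001,
    momentum=0.9,
    weight_decay=0.004,
)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[4000], gamma=0.1  # 0.001 -> 0.0001
)

# Training-loop skeleton: step the scheduler once per iteration so the
# milestone is counted in iterations, matching the quoted schedule.
for iteration in range(5000):
    optimizer.step()   # (forward pass and loss.backward() would precede this)
    scheduler.step()
```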