Diversity Networks

Authors: Zelda Mariet, Suvrit Sra

Venue: ICLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experimental results to corroborate our claims: for pruning neural networks, Divnet is seen to be notably superior to competing approaches.
Researcher Affiliation | Academia | Zelda Mariet and Suvrit Sra, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; zelda@csail.mit.edu, suvrit@mit.edu
Pseudocode | No | The paper describes the methodology in prose and mathematical equations but does not present formal pseudocode or an algorithm block.
Open Source Code | No | The paper states it was 'Run in MATLAB, based on the code from DeepLearnToolbox (https://github.com/rasmusbergpalm/DeepLearnToolbox) and Alex Kulesza’s code for DPPs (http://web.eecs.umich.edu/~kulesza/)'. This confirms the use of third-party code, but there is no explicit statement or link indicating that the authors release their own Divnet code.
Open Datasets | Yes | To quantify the performance of our algorithm, we present below the results of experiments on common datasets for neural network evaluation: MNIST (LeCun and Cortes, 2010), MNIST-ROT (Larochelle et al., 2007) and CIFAR-10 (Krizhevsky, 2009).
Dataset Splits | No | The paper mentions training and test data but does not define a validation split or give the percentages used to partition the data, so the splits cannot be reproduced exactly.
Hardware Specification | Yes | Run in MATLAB, based on the code from DeepLearnToolbox... on a Linux Mint system with 16GB of RAM and an i7-4710HQ CPU @ 2.50GHz.
Software Dependencies | No | The paper mentions MATLAB and third-party code (DeepLearnToolbox, Alex Kulesza’s DPP code) but gives no version numbers for these components, making exact replication difficult.
Experiment Setup | Yes | All networks were trained up until a certain training error threshold, using softmax activation on the output layer and sigmoids on other layers; see Table 1 for more details. We train each class of networks until the first iteration of backprop for which the training error reaches a predefined threshold.

Table 1: Overview of the sets of networks used in the experiments.
Dataset   | Instances | Trained up until | Architecture
MNIST     | 5         | < 1% error       | 784-500-500-10
MNIST-ROT | 5         | < 1% error       | 784-500-500-10
CIFAR-10  | 5         | < 50% error      | 3072-1000-1000-1000-10

To ensure strict positive definiteness of the kernel matrix L, we add a small diagonal perturbation εI to L (ε = 0.01). ...we use the fixed choice β = 10/|T|, which was experimentally seen to work well.
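To make the quoted setup concrete, below is a minimal NumPy sketch of how the perturbed kernel L over one layer's neurons could be assembled. The ε = 0.01 diagonal perturbation and β = 10/|T| come directly from the quote above; the RBF form of the kernel on per-neuron activation vectors, and the names divnet_layer_kernel and V, are illustrative assumptions rather than the authors' released code.

    import numpy as np

    def divnet_layer_kernel(V, eps=0.01):
        # V: (n_neurons, n_train) array; row i holds neuron i's
        # activations over the training set T (assumed layout).
        n_neurons, n_train = V.shape
        beta = 10.0 / n_train  # fixed choice beta = 10/|T| from the paper

        # Pairwise squared distances via the Gram-matrix identity,
        # clipped at zero to absorb floating-point round-off.
        G = V @ V.T
        d = np.diag(G)
        sq = np.maximum(d[:, None] + d[None, :] - 2.0 * G, 0.0)

        L = np.exp(-beta * sq)              # RBF similarity (assumed kernel form)
        return L + eps * np.eye(n_neurons)  # eps*I for strict positive definiteness

    # Example: a 500-neuron hidden layer evaluated on |T| = 1000 training points.
    rng = np.random.default_rng(0)
    V = rng.standard_normal((500, 1000))
    L = divnet_layer_kernel(V)

Sampling a subset of neurons from a DPP with this L (the step for which the authors report using Alex Kulesza's MATLAB DPP code) would then select the diverse set of units retained after pruning.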