Analysis of Deep Neural Networks with Extended Data Jacobian Matrix

Authors: Shengjie Wang, Abdel-rahman Mohamed, Rich Caruana, Jeff Bilmes, Matthai Philipose, Matthew Richardson, Krzysztof Geras, Gregor Urban, Ozlem Aslan

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in this paper are conducted on three different datasets: MNIST for hand-written digit recognition, CIFAR-10 for image recognition, and TIMIT for phone recognition.
Researcher Affiliation | Collaboration | Shengjie Wang (WANGSJ@CS.WASHINGTON.EDU), Abdel-rahman Mohamed (ASAMIR@MICROSOFT.COM), Rich Caruana (RCARUANA@MICROSOFT.COM), Jeff Bilmes (BILMES@UW.EDU), Matthai Philipose (MATTHAIP@MICROSOFT.COM), Matthew Richardson (MATTRI@MICROSOFT.COM), Krzysztof Geras (K.J.GERAS@SMS.ED.AC.UK), Gregor Urban (GURBAN@UCI.EDU), Ozlem Aslan (OZLEM@CS.UALBERTA.CA)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper does not provide access to source code for the described methodology.
Open Datasets | Yes | All three datasets used in the experiments are publicly available benchmarks: MNIST, CIFAR-10, and TIMIT.
Dataset Splits | Yes | MNIST consists of 60000 training data points, out of which we randomly extract 10000 data points as the validation set... Similar to MNIST, we extract 10000 out of the CIFAR-10 training dataset as a validation set. The TIMIT corpus consists of a 462-speaker training set, a 50-speaker validation set, and a 24-speaker test set. (A sketch of this hold-out split appears after the table.)
Hardware Specification | No | The paper does not specify the hardware used to run its experiments.
Software Dependencies | No | The paper does not list ancillary software dependencies with version numbers.
Experiment Setup | Yes | We use stochastic gradient descent with momentum for training all the reported models. The learning rate is halved if performance does not improve over a succession of 5 epochs on the validation set. No regularization or batch normalization is applied unless otherwise specified. The reported models are all selected by grid search, covering a broad range for each parameter, to ensure a fair comparison between models. (A training-loop sketch implementing this schedule appears after the table.)
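
The MNIST and CIFAR-10 splits quoted above amount to randomly holding out 10000 of the training points as a validation set. Below is a minimal sketch of such a hold-out split, assuming the data are already loaded as NumPy arrays; the function name and fixed seed are illustrative choices, not from the paper.

```python
import numpy as np

def holdout_split(X, y, valid_size=10000, seed=0):
    """Randomly hold out `valid_size` examples for validation,
    e.g. 10000 of MNIST's 60000 training points."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(X))
    valid_idx, train_idx = perm[:valid_size], perm[valid_size:]
    return (X[train_idx], y[train_idx]), (X[valid_idx], y[valid_idx])

# Usage (arrays X, y loaded elsewhere):
# (X_train, y_train), (X_valid, y_valid) = holdout_split(X, y)
```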
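
The experiment setup pairs SGD with momentum with a plateau-based halving schedule. The paper names no framework, so the following is a sketch in PyTorch, where ReduceLROnPlateau with factor=0.5 and patience=5 reproduces the quoted rule; the model architecture, learning rate, momentum, epoch budget, and stand-in data are all assumptions for illustration.

```python
import torch

# Stand-in MNIST-sized model; the paper's architectures vary and are not reproduced here.
model = torch.nn.Sequential(torch.nn.Linear(784, 1024),
                            torch.nn.ReLU(),
                            torch.nn.Linear(1024, 10))
# SGD with momentum, as stated in the setup; lr and momentum values are assumptions.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# Halve the learning rate when the validation metric fails to improve
# for 5 consecutive epochs, matching the quoted schedule.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=5)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(50):                      # epoch budget is an assumption
    x = torch.randn(128, 784)                # stand-in batch; real MNIST in practice
    y = torch.randint(0, 10, (128,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    with torch.no_grad():                    # placeholder validation metric
        val_acc = (model(x).argmax(1) == y).float().mean().item()
    scheduler.step(val_acc)                  # drives the LR-halving schedule
```

ReduceLROnPlateau's patience argument counts epochs without improvement, so factor=0.5 with patience=5 maps directly onto "halved if the performance does not improve over a succession of 5 epochs".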