Analysis of Deep Neural Networks with Extended Data Jacobian Matrix
Authors: Shengjie Wang, Abdel-rahman Mohamed, Rich Caruana, Jeff Bilmes, Matthai Philipose, Matthew Richardson, Krzysztof Geras, Gregor Urban, Ozlem Aslan
ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in this paper are conducted on three different datasets: MNIST for hand-written digit recognition, CIFAR-10 for image recognition, and TIMIT for phone recognition. |
| Researcher Affiliation | Collaboration | Shengjie Wang (wangsj@cs.washington.edu), Abdel-rahman Mohamed (asamir@microsoft.com), Rich Caruana (rcaruana@microsoft.com), Jeff Bilmes (bilmes@uw.edu), Matthai Philipose (matthaip@microsoft.com), Matthew Richardson (mattri@microsoft.com), Krzysztof Geras (k.j.geras@sms.ed.ac.uk), Gregor Urban (gurban@uci.edu), Ozlem Aslan (ozlem@cs.ualberta.ca) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | Experiments in this paper are conducted on three different datasets: MNIST for hand-written digit recognition, CIFAR-10 for image recognition, and TIMIT for phone recognition. |
| Dataset Splits | Yes | MNIST consists of 60000 training data points, out of which we randomly extract 10000 data points as the validation set... Similar to MNIST, we extract 10000 out of the training dataset as a validation set. The TIMIT corpus consists of a 462 speaker training set, a 50 speaker validation set, and a 24 speaker test set. (A sketch of the MNIST-style validation split follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | We use stochastic gradient descent with momentum for training all the reported models. The learning rate is halved if performance does not improve over a succession of 5 epochs on the validation set. No regularization or batch normalization is applied unless specified. The reported models are all selected by grid search over a broad range for each parameter to ensure a fair comparison between models. (A sketch of this training schedule follows the table.) |
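
The validation split quoted in the Dataset Splits row (10000 examples held out at random from the 60000-example MNIST training set) is straightforward to reproduce. The sketch below is a minimal illustration, not the authors' code: it assumes `torchvision` for data access, and the fixed seed is an arbitrary choice since the paper does not specify one.

```python
# Hedged sketch: random 10,000-example validation holdout from the MNIST
# training set, as described in the paper. torchvision and the seed value
# are assumptions; the authors do not state their tooling.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_full = datasets.MNIST(root="./data", train=True, download=True,
                            transform=transforms.ToTensor())
train_set, val_set = random_split(
    train_full, [50_000, 10_000],
    generator=torch.Generator().manual_seed(0))  # seed chosen arbitrarily
print(len(train_set), len(val_set))  # 50000 10000
```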
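
The optimization recipe in the Experiment Setup row (SGD with momentum, learning rate halved after 5 epochs without validation improvement) maps naturally onto a plateau-based learning-rate scheduler. The sketch below is an illustration under assumed settings: the architecture, learning rate, momentum value, batch size, epoch count, and the synthetic stand-in data are not from the paper, which selects such hyperparameters by grid search.

```python
# Hedged sketch of the reported training setup: SGD with momentum plus a
# "halve the learning rate after 5 stagnant validation epochs" rule.
# All hyperparameters and the random stand-in data are illustrative only.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

train_ds = TensorDataset(torch.randn(1024, 28 * 28), torch.randint(0, 10, (1024,)))
val_ds = TensorDataset(torch.randn(256, 28 * 28), torch.randint(0, 10, (256,)))
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=256)

model = nn.Sequential(nn.Linear(28 * 28, 512), nn.ReLU(), nn.Linear(512, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# factor=0.5 halves the LR; patience=5 matches "5 epochs without improvement".
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(20):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        val_loss = sum(loss_fn(model(x), y).item() for x, y in val_loader)
    scheduler.step(val_loss)  # validation loss drives the halving schedule
```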