Towards Robust, Locally Linear Deep Networks
Authors: Guang-He Lee, David Alvarez-Melis, Tommi S. Jaakkola
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we examine our inference and learning algorithms with fully-connected (FC), residual (ResNet) (He et al., 2016), and recurrent (RNN) networks on image and time-series datasets with quantitative and qualitative experiments. In this section, we compare our approach (ROLL) with a baseline model with the same training procedure except the regularization (vanilla) in several scenarios. All the reported quantities are computed on a testing set. |
| Researcher Affiliation | Academia | Guang-He Lee, David Alvarez-Melis & Tommi S. Jaakkola, Computer Science and Artificial Intelligence Lab, MIT, {guanghe,davidam,tommi}@csail.mit.edu |
| Pseudocode | No | The paper describes algorithmic procedures in narrative text (e.g., in Section 4.1 'PARALLEL COMPUTATION OF GRADIENTS' and Appendix C), but it does not include any clearly labeled pseudocode blocks or algorithm figures. |
| Open Source Code | No | The paper lists a project page ('Project page: http://people.csail.mit.edu/guanghe/locally_linear') but does not explicitly state that the source code for the described methodology is available there, nor does it provide a direct link to a code repository or indicate code in supplementary materials. |
| Open Datasets | Yes | We use a 55,000/5,000/10,000 split of MNIST dataset for training/validation/testing. Experiments are conducted on a 4-layer FC model with ReLU activations. We train RNNs for speaker identification on a Japanese Vowel dataset from the UCI machine learning repository (Dheeru & Karra Taniskidou, 2017). We conduct experiments on Caltech-256 (Griffin et al., 2007). |
| Dataset Splits | Yes | We use a 55,000/5,000/10,000 split of MNIST dataset for training/validation/testing. We randomly select 5 and 15 samples in each class as the validation and testing set, respectively, and put the remaining data into the training set. |
| Hardware Specification | No | The paper states 'Experiments are run on single GPU with 12G memory.' While it mentions the type of hardware and its memory, it does not provide specific details such as the GPU model (e.g., NVIDIA A100) or any CPU specifications, which are necessary for full reproducibility. |
| Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al., 2017)' for the implementation but does not specify its version number, and no other software components are listed with the version information needed for a reproducible setup. |
| Experiment Setup | Yes | For MNIST: 'The number of epochs is 20... We use stochastic gradient descent with Nesterov momentum. The learning rate is 0.01, the momentum is 0.5, and the batch size is 64.' For Japanese Vowel: 'We use AMSGrad optimizer... The learning rate is 0.001, and the batch size is 32 (sequences).' For Caltech-256: 'We train the model with stochastic gradient descent with Nesterov momentum for 20 epochs. The initial learning rate is 0.005, which is adjusted to 0.0005 after the first 10 epochs. The momentum is 0.5. The batch size is 32.' |
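The 'Experiment Setup' row above quotes concrete MNIST hyperparameters (SGD with Nesterov momentum, learning rate 0.01, momentum 0.5, batch size 64, 20 epochs, a 4-layer FC ReLU network, and a 55,000/5,000/10,000 split). The following is a minimal PyTorch sketch of the vanilla baseline under those quoted settings; the hidden-layer width of 300 and the input transform are assumptions not taken from the paper, and the ROLL regularizer itself is omitted.

```python
# Hedged sketch of the quoted MNIST setup (vanilla baseline only).
# Hidden widths and normalization are assumptions; the ROLL regularizer
# described in the paper is NOT implemented here.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

# 55,000/5,000/10,000 train/validation/test split, as reported in the paper.
full_train = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
train_set, val_set = random_split(full_train, [55_000, 5_000])
test_set = datasets.MNIST("data", train=False, transform=transforms.ToTensor())

# 4-layer fully-connected ReLU network; the width of 300 is an assumption.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 300), nn.ReLU(),
    nn.Linear(300, 300), nn.ReLU(),
    nn.Linear(300, 300), nn.ReLU(),
    nn.Linear(300, 10),
)

# Optimizer settings quoted in the table: SGD with Nesterov momentum,
# learning rate 0.01, momentum 0.5, batch size 64, 20 epochs.
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5, nesterov=True)
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(train_set, batch_size=64, shuffle=True)

for epoch in range(20):
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)  # ROLL would add its regularization term here
        loss.backward()
        opt.step()
```

The Japanese Vowel (AMSGrad, lr 0.001, batch 32) and Caltech-256 (SGD with Nesterov momentum, lr 0.005 decayed to 0.0005 after 10 of 20 epochs, momentum 0.5, batch 32) configurations quoted above would follow the same pattern with the optimizer and scheduler swapped accordingly.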