Towards Robust, Locally Linear Deep Networks

Authors: Guang-He Lee, David Alvarez-Melis, Tommi S. Jaakkola

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we examine our inference and learning algorithms with fully-connected (FC), residual (ResNet) (He et al., 2016), and recurrent (RNN) networks on image and time-series datasets with quantitative and qualitative experiments. In this section, we compare our approach (ROLL) with a baseline model with the same training procedure except the regularization (vanilla) in several scenarios. All the reported quantities are computed on a testing set.
Researcher Affiliation | Academia | Guang-He Lee, David Alvarez-Melis & Tommi S. Jaakkola, Computer Science and Artificial Intelligence Lab, MIT ({guanghe,davidam,tommi}@csail.mit.edu)
Pseudocode | No | The paper describes algorithmic procedures in narrative text (e.g., Section 4.1 'PARALLEL COMPUTATION OF GRADIENTS' and Appendix C), but it does not include any clearly labeled pseudocode blocks or algorithm figures.
Open Source Code | No | The paper mentions a 'Project page: http://people.csail.mit.edu/guanghe/locally_linear' but does not explicitly state that the source code for the described methodology is available there, nor does it provide a direct link to a code repository or indicate that code is included in supplementary materials.
Open Datasets | Yes | We use a 55,000/5,000/10,000 split of MNIST dataset for training/validation/testing. Experiments are conducted on a 4-layer FC model with ReLU activations. We train RNNs for speaker identification on a Japanese Vowel dataset from the UCI machine learning repository (Dheeru & Karra Taniskidou, 2017). We conduct experiments on Caltech-256 (Griffin et al., 2007).
Dataset Splits | Yes | We use a 55,000/5,000/10,000 split of MNIST dataset for training/validation/testing. We randomly select 5 and 15 samples in each class as the validation and testing set, respectively, and put the remaining data into the training set. (See the split sketch after the table.)
Hardware Specification | No | The paper states 'Experiments are run on single GPU with 12G memory.' While it mentions the type of hardware and its memory, it does not provide specific details such as the GPU model or any CPU specifications, which are necessary for full reproducibility.
Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al., 2017)' for the implementation but does not specify its version number. No other key software components are listed with the version numbers required for a reproducible setup.
Experiment Setup | Yes | For MNIST: 'The number of epochs is 20... We use stochastic gradient descent with Nesterov momentum. The learning rate is 0.01, the momentum is 0.5, and the batch size is 64.' For Japanese Vowel: 'We use AMSGrad optimizer... The learning rate is 0.001, and the batch size is 32 (sequences).' For Caltech-256: 'We train the model with stochastic gradient descent with Nesterov momentum for 20 epochs. The initial learning rate is 0.005, which is adjusted to 0.0005 after the first 10 epochs. The momentum is 0.5. The batch size is 32.' (See the optimizer sketch after the table.)
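
The Dataset Splits row quotes a 55,000/5,000/10,000 MNIST partition. Below is a minimal PyTorch/torchvision sketch of such a split; the paper does not say how the 5,000 validation images are selected, so the seeded random carve-out (and the seed value) are assumptions.

```python
# Hedged sketch: a 55,000/5,000/10,000 MNIST split as quoted above.
# The paper does not specify the splitting procedure; a seeded random split is assumed.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()

# Standard MNIST ships as 60,000 training images and 10,000 test images.
full_train = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)

# Carve the 60,000 training images into 55,000 for training and 5,000 for validation.
train_set, val_set = random_split(
    full_train, [55_000, 5_000], generator=torch.Generator().manual_seed(0)
)

print(len(train_set), len(val_set), len(test_set))  # 55000 5000 10000
```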
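
The quoted Experiment Setup maps directly onto standard PyTorch optimizer calls. The sketch below is an illustration under stated assumptions, not the authors' code: `model` is a stand-in module, AMSGrad is expressed as `Adam(..., amsgrad=True)`, and the Caltech-256 learning-rate drop is implemented with a `MultiStepLR` scheduler.

```python
# Hedged sketch of the quoted optimizer settings in PyTorch; not the authors' code.
import torch

# Stand-in module; the paper uses FC, RNN, and ResNet architectures instead.
model = torch.nn.Linear(784, 10)

# MNIST (4-layer FC): SGD with Nesterov momentum, lr 0.01, momentum 0.5,
# batch size 64, 20 epochs.
mnist_opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5, nesterov=True)

# Japanese Vowel (RNN): AMSGrad, lr 0.001, batch size 32 (sequences).
vowel_opt = torch.optim.Adam(model.parameters(), lr=0.001, amsgrad=True)

# Caltech-256 (ResNet): SGD with Nesterov momentum, momentum 0.5, batch size 32;
# lr 0.005 for the first 10 epochs, then 0.0005 for the remaining 10
# (step the scheduler once per epoch).
caltech_opt = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.5, nesterov=True)
scheduler = torch.optim.lr_scheduler.MultiStepLR(caltech_opt, milestones=[10], gamma=0.1)
```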