Learning Latent Space Models with Angular Constraints

Authors: Pengtao Xie, Yuntian Deng, Yi Zhou, Abhimanu Kumar, Yaoliang Yu, James Zou, Eric P. Xing

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5 (Experiments); Table 1: Classification accuracy (%) on three datasets; Table 2: Phone error rate (%) on the TIMIT test set; Table 3: Classification error (%) on CIFAR-10 test set; Table 4: Accuracy (%) on the two QA datasets.
Researcher Affiliation | Collaboration | (1) Machine Learning Department, Carnegie Mellon University; (2) Petuum Inc.; (3) School of Engineering and Applied Sciences, Harvard University; (4) College of Engineering and Computer Science, Syracuse University; (5) Groupon Inc.; (6) School of Computer Science, University of Waterloo; (7) Department of Biomedical Data Science, Stanford University.
Pseudocode | No | The paper describes the algorithmic steps in text (e.g., alternating updates that solve for W and then for v_1^(r), v_2^(r)), but it does not contain a structured pseudocode or algorithm block labeled as such.
Open Source Code | No | The paper does not state that source code for the described methodology is available, nor does it link to a repository.
Open Datasets | Yes | Scenes-15 (Lazebnik et al., 2006), Caltech256 (Griffin et al., 2007), and UIUC-Sport (Li & Fei-Fei, 2007). The TIMIT dataset... (https://catalog.ldc.upenn.edu/LDC93S1). The CIFAR-10 dataset... (https://www.cs.toronto.edu/~kriz/cifar.html). CNN and Daily Mail (Hermann et al., 2015).
Dataset Splits | Yes | We use 5-fold cross validation to tune τ in {0.3, 0.4, ..., 1} and the number of basis vectors in {50, 100, 200, ..., 500}. We used 5000 training images as the validation set to tune hyperparameters. (A hedged sketch of this tuning grid appears after this table.)
Hardware Specification | Yes | Table 5 shows the total runtime of FNNs on TIMIT and CNNs on CIFAR-10 with a single GTX TITAN X GPU, and the runtime of LSTM networks on the CNN dataset with 2 TITAN X GPUs.
Software Dependencies | No | The paper mentions using the 'Kaldi (Povey et al., 2011) toolkit' and 'Ada Delta (Zeiler, 2012)' but does not provide specific version numbers for these or any other key software dependencies.
Experiment Setup | Yes | The number of hidden layers is 4. Each layer has 1024 hidden units. Stochastic gradient descent (SGD) is used to train the network. The learning rate is set to 0.008. ... depth is set to 28 and the width is set to 10. SGD is used for training, with epoch number 200, initial learning rate 0.1, minibatch size 128, Nesterov momentum 0.9, dropout probability 0.3 and weight decay 0.0005. The learning rate is dropped by 0.2 at 60, 120 and 160 epochs. ... the size of hidden state is set to 100. Optimization is based on Ada Delta (Zeiler, 2012), where the minibatch size and initial learning rate are set to 48 and 0.5. The model is trained for 8 epochs. Dropout (Srivastava et al., 2014) with probability 0.2 is applied. (These settings are collected into hedged configuration sketches below.)
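
The Dataset Splits row reports a 5-fold cross-validation search over τ and the number of basis vectors. Below is a minimal sketch of such a tuning loop; only the stated grid endpoints and the 5-fold protocol come from the paper, while the `train_and_score` helper and the spacing of the basis-vector grid beyond 200 are assumptions.

```python
# Hypothetical sketch of the 5-fold cross-validation grid described in the paper.
# Only the tau range and basis-vector endpoints come from the source; the step
# beyond 200 and the train_and_score() helper are placeholders, not released code.
import numpy as np
from itertools import product
from sklearn.model_selection import KFold

taus = np.arange(0.3, 1.0 + 1e-9, 0.1)            # {0.3, 0.4, ..., 1}
num_basis_vectors = [50, 100, 200, 300, 400, 500]  # {50, 100, 200, ..., 500} (assumed step of 100 after 200)

def train_and_score(train_idx, val_idx, tau, m):
    """Placeholder: train the angularly constrained latent space model with the
    given tau and number of basis vectors m, and return validation accuracy."""
    raise NotImplementedError

def tune(X, y, n_splits=5, seed=0):
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    best_params, best_score = None, -np.inf
    for tau, m in product(taus, num_basis_vectors):
        scores = [train_and_score(tr, va, tau, m) for tr, va in kf.split(X)]
        if np.mean(scores) > best_score:
            best_params, best_score = (tau, m), np.mean(scores)
    return best_params, best_score
```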
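
The Experiment Setup row quotes the optimizer and architecture hyperparameters; the sketch below merely collects them as plain configuration dictionaries. The grouping into TIMIT/FNN, CIFAR-10/wide-ResNet, and QA/LSTM configs is inferred from the Hardware Specification row, and the key names are illustrative rather than taken from any released code.

```python
# Hyperparameter values as reported in the Experiment Setup row; the grouping by
# experiment and the dictionary keys are assumptions, not from released code.
timit_fnn = {
    "num_hidden_layers": 4,
    "hidden_units_per_layer": 1024,
    "optimizer": "sgd",
    "learning_rate": 0.008,
}

cifar10_wide_resnet = {
    "depth": 28,
    "width": 10,
    "optimizer": "sgd",
    "epochs": 200,
    "initial_learning_rate": 0.1,
    "minibatch_size": 128,
    "nesterov_momentum": 0.9,
    "dropout_probability": 0.3,
    "weight_decay": 0.0005,
    "lr_drop_epochs": [60, 120, 160],  # learning rate dropped by 0.2 at these epochs
    "lr_drop_factor": 0.2,
}

qa_lstm = {
    "hidden_state_size": 100,
    "optimizer": "adadelta",
    "minibatch_size": 48,
    "initial_learning_rate": 0.5,
    "epochs": 8,
    "dropout_probability": 0.2,
}
```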