Learning Latent Space Models with Angular Constraints
Authors: Pengtao Xie, Yuntian Deng, Yi Zhou, Abhimanu Kumar, Yaoliang Yu, James Zou, Eric P. Xing
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experiments); Table 1: Classification accuracy (%) on three datasets; Table 2: Phone error rate (%) on the TIMIT test set; Table 3: Classification error (%) on the CIFAR-10 test set; Table 4: Accuracy (%) on the two QA datasets |
| Researcher Affiliation | Collaboration | ¹Machine Learning Department, Carnegie Mellon University; ²Petuum Inc.; ³School of Engineering and Applied Sciences, Harvard University; ⁴College of Engineering and Computer Science, Syracuse University; ⁵Groupon Inc.; ⁶School of Computer Science, University of Waterloo; ⁷Department of Biomedical Data Science, Stanford University |
| Pseudocode | No | The paper describes the algorithmic steps in text (e.g., 'Solve W̃', 'Solve v_1^(r), v_2^(r)') but does not contain a structured pseudocode or algorithm block labeled as such. |
| Open Source Code | No | The paper makes no statement that source code for the described method is available and provides no link to a repository. |
| Open Datasets | Yes | Scenes-15 (Lazebnik et al., 2006), Caltech256 (Griffin et al., 2007) and UIUC-Sport (Li & Fei-Fei, 2007). The TIMIT dataset... (https://catalog.ldc.upenn.edu/LDC93S1). The CIFAR-10 dataset... (https://www.cs.toronto.edu/~kriz/cifar.html). CNN and Daily Mail (Hermann et al., 2015). |
| Dataset Splits | Yes | We use 5-fold cross validation to tune τ in {0.3, 0.4, ..., 1} and the number of basis vectors in {50, 100, 200, ..., 500}. We used 5000 training images as the validation set to tune hyperparameters. (A cross-validation sketch follows the table.) |
| Hardware Specification | Yes | Table 5 shows the total runtime of FNNs on TIMIT and CNNs on CIFAR-10 with a single GTX TITAN X GPU, and the runtime of LSTM networks on the CNN dataset with 2 TITAN X GPUs. |
| Software Dependencies | No | The paper mentions using the 'Kaldi (Povey et al., 2011) toolkit' and 'Ada Delta (Zeiler, 2012)' but does not provide specific version numbers for these or any other key software dependencies. |
| Experiment Setup | Yes | TIMIT FNN: "The number of hidden layers is 4. Each layer has 1024 hidden units. Stochastic gradient descent (SGD) is used to train the network. The learning rate is set to 0.008." CIFAR-10 CNN: "depth is set to 28 and the width is set to 10. SGD is used for training, with epoch number 200, initial learning rate 0.1, minibatch size 128, Nesterov momentum 0.9, dropout probability 0.3 and weight decay 0.0005. The learning rate is dropped by 0.2 at 60, 120 and 160 epochs." QA LSTM: "the size of hidden state is set to 100. Optimization is based on Ada Delta (Zeiler, 2012), where the minibatch size and initial learning rate are set to 48 and 0.5. The model is trained for 8 epochs. Dropout (Srivastava et al., 2014) with probability 0.2 is applied." (A training-schedule sketch for the CIFAR-10 setup follows the table.) |
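The hyperparameter search quoted in the Dataset Splits row can be made concrete with a short sketch. This is a minimal Python illustration under stated assumptions, not the authors' code: `fit_and_score` is a hypothetical stand-in for fitting the angular-constrained latent space model with angle threshold `tau` and `n_basis` basis vectors and returning held-out accuracy, and the expansion of `{50, 100, 200, ..., 500}` to six values is one reading of the paper's grid; only the grids themselves are taken from the quote above.

```python
import itertools
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-in: the real routine would fit the latent space model
# under angular constraint `tau` with `n_basis` basis vectors and return
# accuracy on the held-out fold. Here it returns a deterministic dummy score.
def fit_and_score(X_tr, y_tr, X_va, y_va, tau, n_basis):
    rng = np.random.default_rng(abs(hash((round(float(tau), 1), n_basis))) % 2**32)
    return rng.uniform(0.5, 1.0)

X = np.random.randn(200, 64)             # toy features
y = np.random.randint(0, 15, size=200)   # toy labels (e.g., 15 scene classes)

tau_grid = np.arange(0.3, 1.01, 0.1)        # tau in {0.3, 0.4, ..., 1}
basis_grid = [50, 100, 200, 300, 400, 500]  # one reading of {50, 100, 200, ..., 500}

best = None
kfold = KFold(n_splits=5, shuffle=True, random_state=0)  # 5-fold cross validation
for tau, n_basis in itertools.product(tau_grid, basis_grid):
    scores = [fit_and_score(X[tr], y[tr], X[va], y[va], tau, n_basis)
              for tr, va in kfold.split(X)]
    if best is None or np.mean(scores) > best[0]:
        best = (float(np.mean(scores)), float(tau), n_basis)

print(f"best CV accuracy {best[0]:.3f} at tau={best[1]:.1f}, n_basis={best[2]}")
```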
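Similarly, the CIFAR-10 portion of the Experiment Setup row maps onto a standard optimizer configuration. Below is a hedged PyTorch sketch, not the paper's code: the `nn.Sequential` model is a trivial placeholder for the actual WRN-28-10, and "dropped by 0.2" is read as multiplying the learning rate by 0.2 at the stated epochs.

```python
import torch
import torch.nn as nn

# Trivial placeholder model; the paper uses a wide residual network with
# depth 28, width 10, and dropout probability 0.3.
model = nn.Sequential(nn.Flatten(),
                      nn.Linear(3 * 32 * 32, 512),
                      nn.ReLU(),
                      nn.Dropout(p=0.3),
                      nn.Linear(512, 10))

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,             # initial learning rate 0.1
                            momentum=0.9,       # Nesterov momentum 0.9
                            nesterov=True,
                            weight_decay=5e-4)  # weight decay 0.0005

# "The learning rate is dropped by 0.2 at 60, 120 and 160 epochs" -- read
# here as lr <- 0.2 * lr at each milestone.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[60, 120, 160], gamma=0.2)

for epoch in range(200):  # epoch number 200
    # ... one pass over CIFAR-10 in minibatches of size 128, calling
    # optimizer.step() per batch ...
    scheduler.step()
```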