Randomized Nonlinear Component Analysis
Authors: David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schoelkopf
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our algorithms through experiments on real-world data, on which we compare against the state-of-the-art. A simple R implementation of the presented algorithms is provided. (...) We demonstrate the effectiveness of the proposed randomized methods by experimenting with several real-world data and comparing against the state-of-the-art Deep Canonical Correlation Analysis (Andrew et al., 2013). (...) Section 5. Experiments: We investigate the performance of RCCA in multiple experiments with real-world data against state-of-the-art algorithms. |
| Researcher Affiliation | Collaboration | David Lopez-Paz DLOPEZ@TUE.MPG.DE Max-Planck-Institute for Intelligent Systems, University of Cambridge; Suvrit Sra SUVRIT@TUE.MPG.DE Max-Planck-Institute for Intelligent Systems, Carnegie Mellon University; Alexander J. Smola ALEX@SMOLA.ORG Carnegie Mellon University, Google Research; Zoubin Ghahramani ZOUBIN@ENG.CAM.AC.UK University of Cambridge; Bernhard Schölkopf BS@TUE.MPG.DE Max-Planck-Institute for Intelligent Systems |
| Pseudocode | No | The paper describes the algorithms mathematically and textually but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Lastly, the presented methods are very simple to implement; we provide R source code at: http://lopezpaz.org/code/rca.r |
| Open Datasets | Yes | MNIST Handwritten Digits (LeCun & Cortes, 1998); X-Ray Microbeam Speech Data (Westbury, 1994); Animals-with-Attributes dataset (Lampert et al., 2009); CIFAR-10 (mentioned in Figure 3). All are well-known and cited datasets. |
| Dataset Splits | Yes | MNIST: 54000 random samples are used for training, 10000 for testing and 6000 to cross-validate the parameters of (D)CCA. (...) XRMB: 30000 random samples are used for training, 10000 for testing and 10000 to cross-validate the parameters of (D)CCA. (...) We perform 14 random training/test partitions of 1000 samples each. (...) The cost parameter of the linear SVM is cross-validated on the grid $[10^{-4}, \ldots, 10^{4}]$. |
| Hardware Specification | Yes | Table 1 caption states 'running times (minutes, single 1.8GHz core)'. This provides a specific speed for a processing unit used in the experiments. |
| Software Dependencies | No | The paper mentions an 'R implementation' but does not specify version numbers for R or any specific libraries or packages used. |
| Experiment Setup | Yes | Gaussian kernel widths {sx, sy} are set using the median heuristic. (...) CCA regularization is implicitly provided by the use of randomness (thus set to $10^{-8}$). (...) The number of random projections was set to m = 2000. The number of latent dimensions was set to d = 20 for MNIST, and d = 40 (first row) or d = 100 (second row) for CIFAR10. (...) The cost parameter of the linear SVM is cross-validated on the grid $[10^{-4}, \ldots, 10^{4}]$. |
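
The experiment setup quoted above (kernel widths from the median heuristic, m random projections, a CCA regularizer of $10^{-8}$, d latent dimensions) can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not the authors' R implementation. It assumes that the random projections are random Fourier (cosine) features for a Gaussian kernel and that the median heuristic means the median pairwise Euclidean distance on a subsample. All function names (`median_heuristic`, `random_fourier_features`, `linear_cca`, `rcca`) are hypothetical.

```python
import numpy as np


def median_heuristic(X, max_points=500, seed=0):
    """Median pairwise Euclidean distance on a subsample (one common variant; assumed here)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(max_points, len(X)), replace=False)
    S = X[idx]
    sq = np.sum(S ** 2, axis=1)
    # Squared distances via the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * S @ S.T, 0.0)
    upper = np.triu_indices(len(S), k=1)  # each unordered pair once, no diagonal
    return np.sqrt(np.median(d2[upper]))


def random_fourier_features(X, m, sigma, rng):
    """Map X (n x p) to m random cosine features approximating a Gaussian kernel of width sigma."""
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], m))
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)


def linear_cca(A, B, d, reg=1e-8):
    """Regularized linear CCA: projection matrices and the top-d canonical correlations."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    n = A.shape[0]
    Caa = A.T @ A / (n - 1) + reg * np.eye(A.shape[1])
    Cbb = B.T @ B / (n - 1) + reg * np.eye(B.shape[1])
    Cab = A.T @ B / (n - 1)
    # Whiten both views with Cholesky factors: Caa = La La^T, Cbb = Lb Lb^T
    La = np.linalg.cholesky(Caa)
    Lb = np.linalg.cholesky(Cbb)
    T = np.linalg.solve(La, Cab)        # La^{-1} Cab
    T = np.linalg.solve(Lb, T.T).T      # La^{-1} Cab Lb^{-T}
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    Wa = np.linalg.solve(La.T, U[:, :d])     # map back from whitened coordinates
    Wb = np.linalg.solve(Lb.T, Vt.T[:, :d])
    return Wa, Wb, s[:d]


def rcca(X, Y, m=2000, d=20, reg=1e-8, seed=0):
    """Randomized CCA sketch: random features for each view, then linear CCA on the features."""
    rng = np.random.default_rng(seed)
    Fx = random_fourier_features(X, m, median_heuristic(X), rng)
    Fy = random_fourier_features(Y, m, median_heuristic(Y), rng)
    return linear_cca(Fx, Fy, d, reg)
```

The defaults `m=2000`, `d=20`, and `reg=1e-8` mirror the values quoted for MNIST in the table; the subsample size of 500 in `median_heuristic` is an arbitrary choice made here to keep the pairwise distance computation cheap.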