Randomized Nonlinear Component Analysis

Authors: David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schoelkopf

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate our algorithms through experiments on real-world data, on which we compare against the state-of-the-art. A simple R implementation of the presented algorithms is provided. (...) We demonstrate the effectiveness of the proposed randomized methods by experimenting with several real-world data and comparing against the state-of-the-art Deep Canonical Correlation Analysis (Andrew et al., 2013). (...) Section 5. Experiments: We investigate the performance of RCCA in multiple experiments with real-world data against state-of-the-art algorithms.
Researcher Affiliation | Collaboration | David Lopez-Paz DLOPEZ@TUE.MPG.DE Max-Planck-Institute for Intelligent Systems, University of Cambridge; Suvrit Sra SUVRIT@TUE.MPG.DE Max-Planck-Institute for Intelligent Systems, Carnegie Mellon University; Alexander J. Smola ALEX@SMOLA.ORG Carnegie Mellon University, Google Research; Zoubin Ghahramani ZOUBIN@ENG.CAM.AC.UK University of Cambridge; Bernhard Schölkopf BS@TUE.MPG.DE Max-Planck-Institute for Intelligent Systems
Pseudocode | No | The paper describes the algorithms mathematically and textually but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Lastly, the presented methods are very simple to implement; we provide R source code at: http://lopezpaz.org/code/rca.r
Open Datasets | Yes | MNIST Handwritten Digits (LeCun & Cortes, 1998); X-Ray Microbeam Speech Data (Westbury, 1994); Animals-with-Attributes dataset (Lampert et al., 2009); CIFAR-10 (mentioned in Figure 3). All are well-known and cited datasets.
Dataset Splits | Yes | MNIST: 54000 random samples are used for training, 10000 for testing and 6000 to cross-validate the parameters of (D)CCA. (...) XRMB: 30000 random samples are used for training, 10000 for testing and 10000 to cross-validate the parameters of (D)CCA. (...) We perform 14 random training/test partitions of 1000 samples each. (...) The cost parameter of the linear SVM is cross-validated on the grid [10^-4, ..., 10^4].
Hardware Specification | Yes | Table 1 caption states 'running times (minutes, single 1.8GHz core)'. This provides a specific speed for a processing unit used in the experiments.
Software Dependencies | No | The paper mentions an 'R implementation' but does not specify version numbers for R or any specific libraries or packages used.
Experiment Setup | Yes | Gaussian kernel widths {sx, sy} are set using the median heuristic. (...) CCA regularization is implicitly provided by the use of randomness (thus set to 10^-8). (...) The number random projections was set to m = 2000. The number of latent dimensions was set to d = 20 for MNIST, and d = 40 (first row) or d = 100 (second row) for CIFAR10. (...) The cost parameter of the linear SVM is cross-validated on the grid [10^-4, ..., 10^4].
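
The setup values quoted in the table (random train/test/validation splits, the median heuristic for the Gaussian kernel width, m random projections, and a logarithmic SVM cost grid) can be illustrated with a minimal Python sketch. This is not the authors' R code from rca.r. All function names here are hypothetical, and several details are assumptions: the median heuristic is taken to be the median pairwise Euclidean distance, the random features are assumed to be random Fourier features for the kernel exp(-||x - y||^2 / (2 s^2)), and the cost grid is assumed to contain integer powers of ten only.

```python
import math
import random
from statistics import median


def split_indices(n, sizes, seed=0):
    """Shuffle range(n) and cut it into disjoint parts of the given sizes."""
    assert sum(sizes) <= n
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    parts, start = [], 0
    for size in sizes:
        parts.append(idx[start:start + size])
        start += size
    return parts


def median_heuristic(X, max_points=200, seed=0):
    """Assumed variant: width s = median pairwise Euclidean distance,
    computed on a random subsample to keep the cost manageable."""
    rng = random.Random(seed)
    sample = X if len(X) <= max_points else rng.sample(X, max_points)
    dists = [math.dist(a, b)
             for i, a in enumerate(sample) for b in sample[i + 1:]]
    return median(dists)


def random_fourier_features(X, m, s, seed=0):
    """Map each row x to sqrt(2/m) * cos(W x + b), where the entries of W
    are N(0, 1/s^2) and b is uniform on [0, 2*pi] (assumed feature map)."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.gauss(0.0, 1.0 / s) for _ in range(d)] for _ in range(m)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(m)]
    scale = math.sqrt(2.0 / m)
    return [[scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + bj)
             for w, bj in zip(W, b)]
            for x in X]


def svm_cost_grid(low=-4, high=4):
    """Assumed grid spacing: one candidate cost per integer power of ten."""
    return [10.0 ** k for k in range(low, high + 1)]
```

For instance, split_indices(70000, [54000, 10000, 6000]) would give MNIST-sized training, test, and validation index sets, assuming the samples are pooled before splitting. Inner products of the rows returned by random_fourier_features(X, 2000, s) approximate the Gaussian kernel, which is what lets a linear method such as CCA stand in for its kernel counterpart.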