Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Randomized Nonlinear Component Analysis

Authors: David Lopez-Paz, Suvrit Sra, Alex Smola, Zoubin Ghahramani, Bernhard Schoelkopf

ICML 2014

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate our algorithms through experiments on real-world data, on which we compare against the state-of-the-art. A simple R implementation of the presented algorithms is provided. (...) We demonstrate the effectiveness of the proposed randomized methods by experimenting with several real-world data and comparing against the state-of-the-art Deep Canonical Correlation Analysis (Andrew et al., 2013). (...) Section 5. Experiments: We investigate the performance of RCCA in multiple experiments with real-world data against state-of-the-art algorithms.
Researcher Affiliation	Collaboration	David Lopez-Paz EMAIL Max-Planck-Institute for Intelligent Systems, University of Cambridge; Suvrit Sra EMAIL Max-Planck-Institute for Intelligent Systems, Carnegie Mellon University; Alexander J. Smola EMAIL Carnegie Mellon University, Google Research; Zoubin Ghahramani EMAIL University of Cambridge; Bernhard Schölkopf EMAIL Max-Planck-Institute for Intelligent Systems
Pseudocode	No	The paper describes the algorithms mathematically and textually but does not provide any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Lastly, the presented methods are very simple to implement; we provide R source code at: http://lopezpaz.org/code/rca.r
Open Datasets	Yes	MNIST Handwritten Digits. (Le Cun & Cortes, 1998); X-Ray Microbeam Speech Data. (Westbury, 1994); Animals-with-Attributes dataset (Lampert et al., 2009); CIFAR-10 (mentioned in Figure 3). All are well-known and cited datasets.
Dataset Splits	Yes	MNIST: 54000 random samples are used for training, 10000 for testing and 6000 to cross-validate the parameters of (D)CCA. (...) XRMB: 30000 random samples are used for training, 10000 for testing and 10000 to cross-validate the parameters of (D)CCA. (...) We perform 14 random training/test partitions of 1000 samples each. (...) The cost parameter of the linear SVM is cross-validated on the grid [10^-4, ..., 10^4].
Hardware Specification	Yes	Table 1 caption states 'running times (minutes, single 1.8GHz core)'. This gives the clock speed and core count used for the reported timings, though not the processor model.
Software Dependencies	No	The paper mentions an 'R implementation' but does not specify version numbers for R or any specific libraries or packages used.
Experiment Setup	Yes	Gaussian kernel widths {sx, sy} are set using the median heuristic. (...) CCA regularization is implicitly provided by the use of randomness (thus set to 10^-8). (...) The number of random projections was set to m = 2000. The number of latent dimensions was set to d = 20 for MNIST, and d = 40 (first row) or d = 100 (second row) for CIFAR10. (...) The cost parameter of the linear SVM is cross-validated on the grid [10^-4, ..., 10^4].
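The Experiment Setup entry lists the main ingredients of randomized CCA (RCCA): Gaussian kernel widths chosen by the median heuristic, m random projections per view, and linear CCA with a tiny regularizer (10^-8). The following Python/NumPy sketch shows how such a pipeline might be assembled. It is not the authors' R code linked above. It assumes random Fourier features for the Gaussian kernel, takes the median pairwise Euclidean distance as the width (one of several median-heuristic variants), and solves regularized CCA by whitening followed by an SVD. All function names are hypothetical.

```python
import numpy as np


def median_heuristic(X, max_points=1000, seed=0):
    # Assumed variant: kernel width = median pairwise Euclidean distance,
    # computed on a random subsample of at most max_points rows.
    rng = np.random.default_rng(seed)
    if X.shape[0] > max_points:
        X = X[rng.choice(X.shape[0], size=max_points, replace=False)]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    iu = np.triu_indices(X.shape[0], k=1)
    return float(np.median(np.sqrt(d2[iu])))


def random_fourier_features(X, m, s, seed=0):
    # z(x) = sqrt(2/m) * cos(x W + b) with W ~ N(0, 1/s^2), b ~ U[0, 2*pi],
    # so that z(x) . z(x') approximates exp(-||x - x'||^2 / (2 s^2)).
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / s, size=(X.shape[1], m))
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)


def inv_sqrt(C):
    # Inverse square root of a symmetric positive-definite matrix.
    vals, vecs = np.linalg.eigh(C)
    return (vecs / np.sqrt(vals)) @ vecs.T


def linear_cca(Zx, Zy, d, reg=1e-8):
    # Regularized linear CCA: the canonical correlations are the singular values
    # of Cxx^{-1/2} Cxy Cyy^{-1/2}, with reg * I added to each auto-covariance.
    Zx = Zx - Zx.mean(axis=0)
    Zy = Zy - Zy.mean(axis=0)
    n = Zx.shape[0]
    Cxx = Zx.T @ Zx / n + reg * np.eye(Zx.shape[1])
    Cyy = Zy.T @ Zy / n + reg * np.eye(Zy.shape[1])
    Cxy = Zx.T @ Zy / n
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, corr, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    A = Kx @ U[:, :d]      # projection directions for the x-view features
    B = Ky @ Vt[:d].T      # projection directions for the y-view features
    return corr[:d], A, B


def rcca(X, Y, m=2000, d=20, reg=1e-8, seed=0):
    # Hypothetical end-to-end helper: median-heuristic widths, independent
    # random features for each view, then regularized linear CCA.
    sx, sy = median_heuristic(X, seed=seed), median_heuristic(Y, seed=seed)
    Zx = random_fourier_features(X, m, sx, seed=seed)
    Zy = random_fourier_features(Y, m, sy, seed=seed + 1)
    return linear_cca(Zx, Zy, d, reg)
```

With the MNIST settings reported above, a call would look like `rcca(X, Y, m=2000, d=20, reg=1e-8)`. Because the random features are drawn inside `rcca` and not returned, this minimal version cannot project held-out data; a fuller implementation would keep `W` and `b` so that the test split passes through the same feature map.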