Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Nonparametric Estimation of Multi-View Latent Variable Models

Authors: Le Song, Animashree Anandkumar, Bo Dai, Bo Xie

ICML 2014 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In both synthetic and real world datasets, the nonparametric tensor power method compares favorably to EM algorithm and other spectral algorithms. Experimentally, we corroborate our theoretical results by comparing our algorithm to the EM algorithm and previous spectral algorithms.
Researcher Affiliation	Academia	Le Song EMAIL Georgia Institute of Technology, Atlanta, GA 30345 USA Animashree Anandkumar EMAIL University of California, Irvine, CA 92697, USA Bo Dai, Bo Xie BODAI,EMAIL Georgia Institute of Technology, Atlanta, GA 30345 USA
Pseudocode	Yes	The overall kernel algorithm is summarized in Algorithm 1. The tensor power method is provided in the Appendix in Algorithm 2 for completeness.
Open Source Code	No	The paper does not contain any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	We use the DLBCL Lymphoma dataset collection from (Aghaeepour et al., 2013) to compare our kernel algorithm with the four alternatives. This collection contains 24 datasets with two or three clusters, and each dataset consists of tens of thousands of cell measurements in 5 dimensions.
Dataset Splits	Yes	For each dataset, we select the best kernel bandwidth by 5-fold cross validation using log-likelihood.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or cloud instance specifications).
Software Dependencies	No	The paper discusses various algorithms (e.g., EM algorithm, k-means, spectral algorithms) but does not provide specific version numbers for any software libraries, programming languages, or solvers used in its implementation.
Experiment Setup	Yes	For each dataset, we select the best kernel bandwidth by 5-fold cross validation using log-likelihood. The mixture proportion for the h-th component is set to be $\pi_h = \frac{2h}{k(k+1)}$, $\forall h \in [k]$ (unbalanced). The EM algorithm is not guaranteed to find the global solution in each trial. Thus we randomly initialize it 10 times.
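
The experiment setup quoted in the table mentions two concrete procedures: unbalanced mixture proportions that grow linearly with the component index, and bandwidth selection by 5-fold cross validation on log-likelihood. The following is a minimal sketch of both, not the paper's implementation. It assumes the proportions take the form 2h / (k(k+1)), which sums to one, and it swaps in a plain one-dimensional Gaussian kernel density estimate as the model being scored. The function names, the candidate bandwidths, and the synthetic data are all hypothetical.

```python
import math
import random


def unbalanced_proportions(k):
    """Assumed form pi_h = 2h / (k(k+1)) for h = 1..k; the values sum to 1."""
    return [2 * h / (k * (k + 1)) for h in range(1, k + 1)]


def kde_log_likelihood(train, held_out, bandwidth):
    """Held-out log-likelihood under a 1-D Gaussian kernel density estimate.

    This is a stand-in model for illustration, not the paper's kernel algorithm.
    """
    norm = len(train) * bandwidth * math.sqrt(2 * math.pi)
    total = 0.0
    for x in held_out:
        density = sum(
            math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in train
        ) / norm
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


def select_bandwidth(data, candidates, n_folds=5, seed=0):
    """Return the candidate with the highest summed held-out log-likelihood."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    # Interleaved slicing gives n_folds disjoint folds of near-equal size.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = {}
    for bw in candidates:
        score = 0.0
        for i, held_out in enumerate(folds):
            train = [x for j, fold in enumerate(folds) if j != i for x in fold]
            score += kde_log_likelihood(train, held_out, bw)
        scores[bw] = score
    best = max(scores, key=scores.get)
    return best, scores


def sample_mixture(n, k, seed=0):
    """Draw n points from k unit-variance Gaussians with unbalanced weights."""
    rng = random.Random(seed)
    labels = rng.choices(range(k), weights=unbalanced_proportions(k), k=n)
    return [rng.gauss(4.0 * h, 1.0) for h in labels]


if __name__ == "__main__":
    data = sample_mixture(n=200, k=3)
    best, scores = select_bandwidth(data, candidates=[0.05, 0.5, 50.0])
    print("proportions:", unbalanced_proportions(3))
    print("selected bandwidth:", best)
```

Summing the held-out log-likelihood over all five folds scores every data point exactly once per candidate, so candidates are compared on the same points. The 10 random EM restarts mentioned in the same table row would follow the same keep-the-best pattern, with the final log-likelihood as the selection score.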