Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Condition for Perfect Dimensionality Recovery by Variational Bayesian PCA

Authors: Shinichi Nakajima, Ryota Tomioka, Masashi Sugiyama, S. Derin Babacan

JMLR 2015 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our result theoretically guarantees the performance of VB-PCA. At the same time, it also reveals the conservative nature of VB learning: it offers a low false positive rate at the expense of low sensitivity... Figure 6 shows numerical simulation results for M = 200 and L = 20, 100, 200... Figure 7 shows numerical simulation results that compare EVB and OL
Researcher Affiliation	Collaboration	Shinichi Nakajima EMAIL Berlin Big Data Center Technische Universität Berlin Berlin 10587 Germany Ryota Tomioka EMAIL Toyota Technological Institute at Chicago Chicago, IL 60637 USA Masashi Sugiyama EMAIL Department of Complexity Science and Engineering The University of Tokyo Tokyo 113-0033 Japan S. Derin Babacan EMAIL Google Inc. Mountain View, CA 94043 USA
Pseudocode	Yes	Algorithm 1 Global EVB-PCA algorithm. 1: Transpose $V \to V^\top$ if $L > M$. 2: Refer to the table of $\tau(\alpha)$ at $\alpha = L/M$ (or use a simple approximation $\tau \approx 2.5129\sqrt{\alpha}$). 3: Set $H$ ($\le L$) to a sufficiently large value, and compute the SVD of $V = \sum_{h=1}^{H} \gamma_h \omega_{b_h} \omega_{a_h}^\top$. 4: Locally search the minimizer $\widehat{\sigma}^{2\,\mathrm{EVB}}$ of Eq.(40), which lies in the range (44). 5: Discard the components such that $\sigma_h^2 < \widehat{\sigma}^{2\,\mathrm{EVB}}$, where $\sigma_h^2$ is defined by Eq.(45).
Open Source Code	No	The MATLAB® code will be available at http://sites.google.com/site/shinnkj23/.
Open Datasets	No	We assume that the observed matrix V is generated from the spiked covariance model (Johnstone, 2001): $V = U + E$, where $U \in \mathbb{R}^{L \times M}$ is a true signal matrix with rank $H$ and singular values $\{\gamma_h\}_{h=1}^{H}$, and $E \in \mathbb{R}^{L \times M}$ is a random matrix such that each element is independently drawn from a distribution with mean zero and variance $\sigma^2$ (not necessarily Gaussian)... E was drawn from the independent Gaussian distribution with variance $\sigma^2 = 1$, and true signal singular values $\{\gamma_h\}_{h=1}^{H}$ were drawn from the uniform distribution on [z Mσ ] for different z
Dataset Splits	No	The paper uses synthetic data generated according to specific distributions and parameters for its numerical simulations, rather than a pre-existing public dataset with standard splits. Therefore, information about training/test/validation splits is not applicable or provided.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud resources) used for running the numerical simulations.
Software Dependencies	No	The paper mentions that MATLAB code will be available, implying MATLAB is used. However, it does not specify a version number for MATLAB or any other software libraries or dependencies with their respective versions, which is required for reproducible software dependency information.
Experiment Setup	No	The paper describes theoretical derivations and numerical simulations using generated data with specific parameters (e.g., M=200, L values, distributions for generating E and singular values). However, it does not provide details on typical experimental setup elements like hyperparameters (learning rate, batch size, epochs), optimizer settings, or model initialization as it primarily focuses on an analytic solution rather than a model that is trained iteratively.
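
For readers who want to approximate the simulation setup quoted in the table, the following is a minimal sketch in Python with NumPy, not the authors' MATLAB code. It generates a matrix from the spiked covariance model V = U + E with Gaussian noise and then performs only steps 1 to 3 of the quoted Algorithm 1 (transpose, compute α and the approximate τ, take the SVD). Several things here are assumptions: the function names, the interval bounds `gamma_low` and `gamma_high` (the quoted interval is incomplete, so the caller must supply them), the random orthonormal singular vectors, and the reading of the approximation as τ ≈ 2.5129·√α. Steps 4 and 5 depend on Eqs. (40), (44), and (45) of the paper, which are not reproduced on this page, so they are left out.

```python
import numpy as np


def simulate_spiked_matrix(L, M, H, gamma_low, gamma_high, sigma=1.0, seed=0):
    """Draw V = U + E from a spiked covariance model (illustrative sketch).

    gamma_low and gamma_high are hypothetical bounds for the uniform
    distribution of the true singular values; the source does not give
    the full interval. Requires H <= min(L, M).
    """
    rng = np.random.default_rng(seed)
    gammas = rng.uniform(gamma_low, gamma_high, size=H)
    # Assumption: random orthonormal singular vectors obtained via QR.
    left, _ = np.linalg.qr(rng.standard_normal((L, H)))
    right, _ = np.linalg.qr(rng.standard_normal((M, H)))
    U = (left * gammas) @ right.T               # rank-H signal matrix
    E = rng.normal(0.0, sigma, size=(L, M))     # i.i.d. Gaussian noise
    return U + E, np.sort(gammas)[::-1]


def prepare_for_evb(V):
    """Steps 1 to 3 of the quoted Algorithm 1 (sketch, not the full method)."""
    L, M = V.shape
    if L > M:                                   # step 1: ensure L <= M
        V = V.T
        L, M = M, L
    alpha = L / M                               # step 2: aspect ratio
    tau_approx = 2.5129 * np.sqrt(alpha)        # assumed form of the approximation
    gamma = np.linalg.svd(V, compute_uv=False)  # step 3: singular values, descending
    return V, alpha, tau_approx, gamma


# Example with M = 200 and L = 20, as in the quoted Figure 6 settings;
# H and the interval bounds are arbitrary illustrative choices.
V, true_gammas = simulate_spiked_matrix(L=20, M=200, H=5,
                                        gamma_low=30.0, gamma_high=140.0)
V_prepared, alpha, tau_approx, gamma = prepare_for_evb(V)
```

The returned `gamma` array holds the observed singular values that steps 4 and 5 of the algorithm would then threshold; implementing those steps requires the equations from the paper itself.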