Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Condition for Perfect Dimensionality Recovery by Variational Bayesian PCA
Authors: Shinichi Nakajima, Ryota Tomioka, Masashi Sugiyama, S. Derin Babacan
JMLR 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our result theoretically guarantees the performance of VB-PCA. At the same time, it also reveals the conservative nature of VB learning: it offers a low false positive rate at the expense of low sensitivity ... Figure 6 shows numerical simulation results for M = 200 and L = 20, 100, 200 ... Figure 7 shows numerical simulation results that compare EVB and OL |
| Researcher Affiliation | Collaboration | Shinichi Nakajima, EMAIL, Berlin Big Data Center, Technische Universität Berlin, Berlin 10587, Germany; Ryota Tomioka, EMAIL, Toyota Technological Institute at Chicago, Chicago, IL 60637, USA; Masashi Sugiyama, EMAIL, Department of Complexity Science and Engineering, The University of Tokyo, Tokyo 113-0033, Japan; S. Derin Babacan, EMAIL, Google Inc., Mountain View, CA 94043, USA |
| Pseudocode | Yes | Algorithm 1: Global EVB-PCA algorithm. 1: Transpose $V \leftarrow V^\top$ if $L > M$. 2: Refer to the table of $\tau(\alpha)$ at $\alpha = L/M$ (or use a simple approximation $\tau \approx 2.5129\sqrt{\alpha}$). 3: Set $H$ ($\le L$) to a sufficiently large value, and compute the SVD of $V = \sum_{h=1}^{H} \gamma_h \boldsymbol{\omega}_{b_h} \boldsymbol{\omega}_{a_h}^\top$. 4: Locally search the minimizer $\hat{\sigma}^{2\,\mathrm{EVB}}$ of Eq.(40), which lies in the range (44). 5: Discard the components such that $\sigma_h^2 < \hat{\sigma}^{2\,\mathrm{EVB}}$, where $\sigma_h^2$ is defined by Eq.(45). |
| Open Source Code | No | The MATLAB® code will be available at http://sites.google.com/site/shinnkj23/. |
| Open Datasets | No | We assume that the observed matrix V is generated from the spiked covariance model (Johnstone, 2001): $V = U + E$, where $U \in \mathbb{R}^{L \times M}$ is a true signal matrix with rank $H$ and singular values $\{\gamma_h\}_{h=1}^{H}$, and $E \in \mathbb{R}^{L \times M}$ is a random matrix such that each element is independently drawn from a distribution with mean zero and variance $\sigma^2$ (not necessarily Gaussian) ... E was drawn from the independent Gaussian distribution with variance $\sigma^2 = 1$, and true signal singular values $\{\gamma_h\}_{h=1}^{H}$ were drawn from the uniform distribution on [z Mσ ] for different z |
| Dataset Splits | No | The paper uses synthetic data generated according to specific distributions and parameters for its numerical simulations, rather than a pre-existing public dataset with standard splits. Therefore, information about training/test/validation splits is not applicable or provided. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cloud resources) used for running the numerical simulations. |
| Software Dependencies | No | The paper mentions that MATLAB code will be available, implying MATLAB is used. However, it does not specify a version number for MATLAB or any other software libraries or dependencies with their respective versions, which is required for reproducible software dependency information. |
| Experiment Setup | No | The paper describes theoretical derivations and numerical simulations using generated data with specific parameters (e.g., M=200, L values, distributions for generating E and singular values). However, it does not provide details on typical experimental setup elements like hyperparameters (learning rate, batch size, epochs), optimizer settings, or model initialization as it primarily focuses on an analytic solution rather than a model that is trained iteratively. |
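
The simulation setting quoted under Open Datasets and the first three steps of the quoted Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: the function names, the seed, and the interval passed for the signal singular values are hypothetical, and steps 4 and 5 are left out because Eqs. (40), (44), and (45) are not reproduced on this page.

```python
import numpy as np


def make_spiked_matrix(L, M, H, sv_low, sv_high, noise_var=1.0, seed=0):
    """Draw V = U + E with a rank-H signal U and i.i.d. Gaussian noise E.

    Assumes H <= min(L, M). The interval [sv_low, sv_high] for the signal
    singular values is a placeholder chosen by the caller, not the paper's.
    """
    rng = np.random.default_rng(seed)
    # Orthonormal singular vectors from the QR factors of Gaussian matrices.
    left, _ = np.linalg.qr(rng.standard_normal((L, H)))
    right, _ = np.linalg.qr(rng.standard_normal((M, H)))
    # Signal singular values drawn uniformly from the given interval.
    gammas = rng.uniform(sv_low, sv_high, size=H)
    U = (left * gammas) @ right.T
    # Zero-mean Gaussian noise with the requested variance.
    E = rng.normal(0.0, np.sqrt(noise_var), size=(L, M))
    return U + E, U


def preprocess(V, H=None):
    """Steps 1-3 of the quoted Algorithm 1: transpose, approximate tau, SVD."""
    # Step 1: make sure L <= M by transposing when needed.
    if V.shape[0] > V.shape[1]:
        V = V.T
    L, M = V.shape
    # Step 2: simple approximation in place of the tau(alpha) table.
    alpha = L / M
    tau = 2.5129 * np.sqrt(alpha)
    # Step 3: keep the H largest singular values (H <= L).
    H = L if H is None else min(H, L)
    singular_values = np.linalg.svd(V, compute_uv=False)[:H]
    return alpha, tau, singular_values


if __name__ == "__main__":
    V, U = make_spiked_matrix(L=20, M=200, H=5, sv_low=30.0, sv_high=60.0)
    alpha, tau, gammas = preprocess(V)
    print(alpha, tau, gammas[:5])
```

The sizes M = 200 and L = 20 and the noise variance of 1 follow the quoted simulation settings; everything else in the usage example is illustrative.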