A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate
Authors: Ohad Shamir
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3. Experiments: We now turn to present some experiments, which demonstrate the performance of the VR-PCA algorithm. |
| Researcher Affiliation | Academia | Ohad Shamir (OHAD.SHAMIR@WEIZMANN.AC.IL), Weizmann Institute of Science, Rehovot, Israel |
| Pseudocode | Yes | Algorithm 1 VR-PCA |
| Open Source Code | No | The paper does not contain any statement about making the source code available or provide a link to a code repository. |
| Open Datasets | Yes | Next, we performed a similar experiment using the training data of the well-known MNIST and CCAT datasets. The MNIST data matrix is of size 784 × 70000, and was preprocessed by centering the data and dividing each coordinate by its standard deviation times the square root of the dimension. The CCAT data matrix is sparse (only 0.16% of entries are non-zero), of size 23149 × 781265, and was used as-is. (A sketch of this preprocessing follows the table.) |
| Dataset Splits | No | The paper mentions using 'training data' for MNIST and CCAT datasets, but it does not specify any training/validation/test splits (e.g., percentages, sample counts, or specific predefined splits) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., 'Python 3.8', 'PyTorch 1.9') used in the experiments. |
| Experiment Setup | Yes | Rather than tuning its parameters, we used the following fixed heuristic: The epoch length m was set to n (the number of data points, or columns in the data matrix), and η was set to η = 1/(r̄√n), where r̄ = (1/n) Σᵢ₌₁ⁿ ‖xᵢ‖² is the average squared norm of the data. The choice of m = n ensures that at each epoch, the runtime is about equally divided between the stochastic updates and the computation of ũ. The choice of η is motivated by our theoretical analysis, which requires η on the order of 1/(maxᵢ ‖xᵢ‖² √n) in the regime where m should be on the order of n. All algorithms were initialized from the same random vector, chosen uniformly at random from the unit ball. (A code sketch of this heuristic follows the table.) |
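
For reference, the following is a minimal sketch of the MNIST preprocessing quoted in the Open Datasets row, assuming the data is held as a dense d × n NumPy array with one example per column; the function name `preprocess_mnist` and the `eps` guard against zero-variance coordinates are illustrative assumptions, not details from the paper.

```python
import numpy as np

def preprocess_mnist(X, eps=1e-12):
    """Center each coordinate and divide it by its standard deviation
    times the square root of the dimension, as described in the paper.

    X is assumed to be a d x n array with one example per column.
    The eps term guards against zero-variance coordinates (an added
    safeguard, not mentioned in the paper).
    """
    d = X.shape[0]
    X = X - X.mean(axis=1, keepdims=True)
    X = X / ((X.std(axis=1, keepdims=True) + eps) * np.sqrt(d))
    return X
```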
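
The Experiment Setup row fixes the parameters of Algorithm 1 (VR-PCA) rather than tuning them. Below is a minimal sketch of the leading-component (k = 1) case with that heuristic, again assuming a dense d × n NumPy array of examples stored as columns; the function name, number of epochs, and seeding are assumptions made for illustration.

```python
import numpy as np

def vr_pca(X, epochs=10, seed=0):
    """Sketch of VR-PCA (k = 1) with the paper's fixed heuristic:
    epoch length m = n and step size eta = 1 / (r_bar * sqrt(n)),
    where r_bar is the average squared norm of the data columns."""
    d, n = X.shape
    rng = np.random.default_rng(seed)

    m = n                                    # epoch length m = n
    r_bar = np.mean(np.sum(X ** 2, axis=0))  # average squared column norm
    eta = 1.0 / (r_bar * np.sqrt(n))         # eta = 1 / (r_bar * sqrt(n))

    # Random unit-norm starting point (stand-in for the paper's random init).
    w_tilde = rng.standard_normal(d)
    w_tilde /= np.linalg.norm(w_tilde)

    for _ in range(epochs):
        # Full pass: exact update direction at the reference point w_tilde.
        u = X @ (X.T @ w_tilde) / n
        w = w_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            x = X[:, i]
            # Variance-reduced stochastic step, then projection to the sphere.
            w = w + eta * (x * (x @ w - x @ w_tilde) + u)
            w /= np.linalg.norm(w)
        w_tilde = w
    return w_tilde
```

A hypothetical end-to-end call on the preprocessed data would be `w = vr_pca(preprocess_mnist(X))`, with progress measured against the leading eigenvector of (1/n) X Xᵀ.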