Stochastic PCA with ℓ2 and ℓ1 Regularization

Authors: Poorya Mianjy, Raman Arora

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide empirical results for our proposed algorithms ℓ2-RMSG, ℓ1-RMSG, and ℓ2,1-RMSG, compared to vanilla MSG, Oja's algorithm, and the Follow The Leader (FTL) algorithm, on both synthetic and real datasets. The synthetic data is drawn from a d = 100 dimensional zero-mean multivariate Gaussian distribution with an exponential decay in the spectrum of the covariance matrix. The synthetic set consists of n = 30K samples, of which 20K are used for training and 5K each for tuning and testing. For comparisons on a real dataset, we choose MNIST, which consists of n = 60K samples each of size d = 784. (See the data-generation sketch below.)
Researcher Affiliation | Academia | Department of Computer Science, Johns Hopkins University, Baltimore, USA. Correspondence to: Raman Arora <arora@cs.jhu.edu>.
Pseudocode | Yes | Algorithm 1 ℓ2-Regularized MSG (ℓ2-RMSG); Algorithm 2 ℓ1-Regularized MSG (ℓ1-RMSG); Algorithm 3 ℓ2 + ℓ1-Regularized MSG (ℓ2,1-RMSG). (See the MSG-style update sketch below.)
Open Source Code | No | The paper does not provide any links to source code or explicitly state that the code is publicly available.
Open Datasets | Yes | For comparisons on a real dataset, we choose MNIST, which consists of n = 60K samples each of size d = 784.
Dataset Splits | Yes | The synthetic set consists of n = 30K samples, of which 20K are used for training and 5K each for tuning and testing.
Hardware Specification | No | The paper states 'The runtime is captured in a controlled setting; each run for every algorithm was on a dedicated, identical compute node.' but does not provide specific hardware details such as CPU or GPU models.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | For MSG and ℓ1-RMSG, the learning rate is set to η0/√t, and for ℓ2-RMSG, ℓ2,1-RMSG, and Oja's algorithm it is set to η0/t, as suggested by theory. We choose η0 (the initial learning rate), λ, and µ by tuning each over the set {10⁻³, 10⁻², 10⁻¹, 1, 10, 10², 10³} on held-out data, for k = 40. (See the schedule and tuning-grid sketch below.)
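The synthetic-data description above (d = 100, zero-mean Gaussian, exponentially decaying covariance spectrum, 20K/5K/5K split) can be made concrete with a small sketch. The decay rate of the spectrum and the random seed are not given in the report, so the values below are illustrative assumptions rather than the authors' exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)       # seed is an arbitrary choice
d, n = 100, 30_000                   # dimension and sample count quoted in the report

# Covariance with an exponentially decaying spectrum; the decay rate 0.9 is an
# assumption -- the report only says "exponential decay in the spectrum".
eigvals = 0.9 ** np.arange(d)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthonormal eigenbasis
cov = (Q * eigvals) @ Q.T

# Zero-mean multivariate Gaussian samples, one per row.
X = rng.multivariate_normal(np.zeros(d), cov, size=n)

# 20K / 5K / 5K split for training, tuning (held-out), and testing.
X_train, X_tune, X_test = X[:20_000], X[20_000:25_000], X[25_000:]
```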
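The pseudocode row lists three regularized variants of the Matrix Stochastic Gradient (MSG) method. As a rough illustration of the family, and not a transcription of the paper's Algorithms 1-3, the sketch below shows a hedged ℓ2-regularized MSG-style step: shrink the current iterate (the effect one would expect from an ℓ2 penalty), take a rank-one stochastic gradient step, and project back onto the standard MSG feasible set {M : 0 ⪯ M ⪯ I, tr(M) = k}. The shrinkage form (1 − ηλ)M is an assumption.

```python
import numpy as np

def project_msg(M, k):
    """Project a symmetric matrix onto {M : 0 <= M <= I, tr(M) = k} by shifting
    eigenvalues and clipping them to [0, 1] (the standard MSG projection)."""
    vals, vecs = np.linalg.eigh(M)
    lo, hi = -vals.max() - 1.0, 1.0 - vals.min()   # bracket the shift nu
    for _ in range(60):                            # bisection on the shift
        nu = 0.5 * (lo + hi)
        if np.clip(vals + nu, 0.0, 1.0).sum() < k:
            lo = nu
        else:
            hi = nu
    new_vals = np.clip(vals + 0.5 * (lo + hi), 0.0, 1.0)
    return (vecs * new_vals) @ vecs.T

def l2_rmsg_step(M, x, eta, lam, k):
    """One hypothetical ℓ2-RMSG-style update (illustrative, not the paper's exact rule)."""
    M = (1.0 - eta * lam) * M + eta * np.outer(x, x)   # shrink, then stochastic gradient step
    return project_msg(M, k)
```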
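The experiment-setup row fully specifies the learning-rate schedules and the tuning grid, which the short sketch below reproduces. The report does not say how λ and µ enter each objective, so the sketch stops at the schedules and a generic grid search; `evaluate_on_heldout` is a hypothetical stand-in for training with one hyperparameter setting and scoring it on the held-out split.

```python
import itertools
import numpy as np

# Step-size schedules quoted in the experiment-setup row:
# eta0 / sqrt(t) for MSG and ℓ1-RMSG; eta0 / t for ℓ2-RMSG, ℓ2,1-RMSG, and Oja.
def lr_sqrt(eta0, t):
    return eta0 / np.sqrt(t)

def lr_linear(eta0, t):
    return eta0 / t

# eta0, lambda, and mu are each tuned over this grid (for k = 40).
grid = [10.0 ** p for p in range(-3, 4)]   # {1e-3, 1e-2, ..., 1e2, 1e3}

def tune(evaluate_on_heldout):
    """Pick (eta0, lam, mu) maximizing a held-out score; the callback is hypothetical."""
    return max(itertools.product(grid, grid, grid),
               key=lambda p: evaluate_on_heldout(*p))
```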