Stochastic PCA with ℓ2 and ℓ1 Regularization

Authors: Poorya Mianjy, Raman Arora

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide empirical results for our proposed algorithms ℓ2-RMSG, ℓ1-RMSG, and ℓ2,1-RMSG, compared to vanilla MSG, Oja's algorithm, and the Follow The Leader (FTL) algorithm, on both synthetic and real datasets. The synthetic data is drawn from a d = 100 dimensional zero-mean multivariate Gaussian distribution with an exponential decay in the spectrum of the covariance matrix. The synthetic set consists of n = 30K samples, of which 20K are used for training and 5K each for tuning and testing. For comparisons on a real dataset, we choose MNIST, which consists of n = 60K samples each of size d = 784. (See the data-generation sketch below.)
Researcher Affiliation | Academia | Department of Computer Science, Johns Hopkins University, Baltimore, USA. Correspondence to: Raman Arora <arora@cs.jhu.edu>.
Pseudocode | Yes | Algorithm 1 ℓ2-Regularized MSG (ℓ2-RMSG); Algorithm 2 ℓ1-Regularized MSG (ℓ1-RMSG); Algorithm 3 ℓ2 + ℓ1-Regularized MSG (ℓ2,1-RMSG). (See the MSG-style update sketch below.)
Open Source Code | No | The paper does not provide any links to source code or explicitly state that the code is publicly available.
Open Datasets | Yes | For comparisons on a real dataset, we choose MNIST, which consists of n = 60K samples each of size d = 784.
Dataset Splits | Yes | The synthetic set consists of n = 30K samples, of which 20K are used for training and 5K each for tuning and testing.
Hardware Specification | No | The paper states 'The runtime is captured in a controlled setting; each run for every algorithm was on a dedicated, identical compute node.' but does not provide specific hardware details such as CPU or GPU models.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | For MSG and ℓ1-RMSG, the learning rate is set to η0/√t, and for ℓ2-RMSG, ℓ2,1-RMSG, and Oja's algorithm it is set to η0/t, as suggested by theory. We choose η0 (the initial learning rate), λ, and µ by tuning each over the set {10⁻³, 10⁻², 10⁻¹, 1, 10, 10², 10³} on held-out data, for k = 40. (See the schedule and tuning-grid sketch below.)
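The synthetic-data description above (d = 100, zero-mean Gaussian, exponentially decaying covariance spectrum, 20K/5K/5K split) can be made concrete with a small sketch. The decay rate of the spectrum and the random seed are not given in the report, so the values below are illustrative assumptions rather than the authors' exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)       # seed is an arbitrary choice
d, n = 100, 30_000                   # dimension and sample count quoted in the report

# Covariance with an exponentially decaying spectrum; the decay rate 0.9 is an
# assumption -- the report only says "exponential decay in the spectrum".
eigvals = 0.9 ** np.arange(d)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthonormal eigenbasis
cov = (Q * eigvals) @ Q.T

# Zero-mean multivariate Gaussian samples, one per row.
X = rng.multivariate_normal(np.zeros(d), cov, size=n)

# 20K / 5K / 5K split for training, tuning (held-out), and testing.
X_train, X_tune, X_test = X[:20_000], X[20_000:25_000], X[25_000:]
```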
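The pseudocode row lists three regularized variants of the Matrix Stochastic Gradient (MSG) method. As a rough illustration of the family, and not a transcription of the paper's Algorithms 1-3, the sketch below shows a hedged ℓ2-regularized MSG-style step: shrink the current iterate (the effect one would expect from an ℓ2 penalty), take a rank-one stochastic gradient step, and project back onto the standard MSG feasible set {M : 0 ⪯ M ⪯ I, tr(M) = k}. The shrinkage form (1 − ηλ)M is an assumption.

```python
import numpy as np

def project_msg(M, k):
    """Project a symmetric matrix onto {M : 0 <= M <= I, tr(M) = k} by shifting
    eigenvalues and clipping them to [0, 1] (the standard MSG projection)."""
    vals, vecs = np.linalg.eigh(M)
    lo, hi = -vals.max() - 1.0, 1.0 - vals.min()   # bracket the shift nu
    for _ in range(60):                            # bisection on the shift
        nu = 0.5 * (lo + hi)
        if np.clip(vals + nu, 0.0, 1.0).sum() < k:
            lo = nu
        else:
            hi = nu
    new_vals = np.clip(vals + 0.5 * (lo + hi), 0.0, 1.0)
    return (vecs * new_vals) @ vecs.T

def l2_rmsg_step(M, x, eta, lam, k):
    """One hypothetical ℓ2-RMSG-style update (illustrative, not the paper's exact rule)."""
    M = (1.0 - eta * lam) * M + eta * np.outer(x, x)   # shrink, then stochastic gradient step
    return project_msg(M, k)
```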
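The experiment-setup row fully specifies the learning-rate schedules and the tuning grid, which the short sketch below reproduces. The report does not say how λ and µ enter each objective, so the sketch stops at the schedules and a generic grid search; `evaluate_on_heldout` is a hypothetical stand-in for training with one hyperparameter setting and scoring it on the held-out split.

```python
import itertools
import numpy as np

# Step-size schedules quoted in the experiment-setup row:
# eta0 / sqrt(t) for MSG and ℓ1-RMSG; eta0 / t for ℓ2-RMSG, ℓ2,1-RMSG, and Oja.
def lr_sqrt(eta0, t):
    return eta0 / np.sqrt(t)

def lr_linear(eta0, t):
    return eta0 / t

# eta0, lambda, and mu are each tuned over this grid (for k = 40).
grid = [10.0 ** p for p in range(-3, 4)]   # {1e-3, 1e-2, ..., 1e2, 1e3}

def tune(evaluate_on_heldout):
    """Pick (eta0, lam, mu) maximizing a held-out score; the callback is hypothetical."""
    return max(itertools.product(grid, grid, grid),
               key=lambda p: evaluate_on_heldout(*p))
```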