Stochastic PCA with $\ell_2$ and $\ell_1$ Regularization
Authors: Poorya Mianjy, Raman Arora
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical results for our proposed algorithms ℓ2-RMSG, ℓ1-RMSG, and ℓ2,1-RMSG, compared to vanilla MSG, Oja's algorithm, and the Follow The Leader (FTL) algorithm, on both synthetic and real datasets. The synthetic data is drawn from a d = 100 dimensional zero-mean multivariate Gaussian distribution with an exponential decay in the spectrum of the covariance matrix. The synthetic dataset consists of n = 30K samples, of which 20K are used for training and 5K each for tuning and testing. For comparisons on a real dataset, we choose MNIST, which consists of n = 60K samples, each of size d = 784. (A hedged data-generation sketch appears after this table.) |
| Researcher Affiliation | Academia | Department of Computer Science, Johns Hopkins University, Baltimore, USA. Correspondence to: Raman Arora <arora@cs.jhu.edu>. |
| Pseudocode | Yes | Algorithm 1 ℓ2-Regularized MSG (ℓ2-RMSG); Algorithm 2 ℓ1-Regularized MSG (ℓ1-RMSG); Algorithm 3 ℓ2 + ℓ1-Regularized MSG (ℓ2,1-RMSG) |
| Open Source Code | No | The paper does not provide any links to source code or explicitly state that the code is publicly available. |
| Open Datasets | Yes | For comparisons on a real dataset, we choose MNIST, which consists of n = 60K samples, each of size d = 784. |
| Dataset Splits | Yes | The synthetic dataset consists of n = 30K samples, of which 20K are used for training and 5K each for tuning and testing. |
| Hardware Specification | No | The paper states 'The runtime is captured in a controlled setting; each run for every algorithm was on a dedicated, identical compute node.' but does not provide specific hardware details such as CPU or GPU models. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | For MSG and ℓ1-RMSG, the learning rate is set to η₀/√t, and for ℓ2-RMSG, ℓ2,1-RMSG, and Oja's algorithm, the learning rate is set to η₀/t, as suggested by theory. We choose η₀ (the initial learning rate), λ, and µ by tuning each over the set {10⁻³, 10⁻², 10⁻¹, 1, 10, 10², 10³} on held-out data, for k = 40. (A hedged sketch of the schedules and tuning loop appears after this table.) |
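
The synthetic setup described in the Research Type and Dataset Splits rows can be reproduced approximately with the sketch below: n = 30K zero-mean Gaussian samples in d = 100 dimensions with an exponentially decaying covariance spectrum, split 20K/5K/5K into train/tune/test. The decay rate, the random orthogonal basis, and the seed are our assumptions; the paper does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed is an arbitrary choice
d, n = 100, 30_000
decay = 0.9                               # assumed decay rate; not stated in the paper

# Covariance with an exponentially decaying spectrum in a random orthogonal basis.
eigvals = decay ** np.arange(d)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
cov = (Q * eigvals) @ Q.T                 # Q diag(eigvals) Q^T

# Zero-mean Gaussian samples, split 20K / 5K / 5K for train / tune / test.
X = rng.multivariate_normal(np.zeros(d), cov, size=n)
X_train, X_tune, X_test = X[:20_000], X[20_000:25_000], X[25_000:]
```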
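The Pseudocode row names Algorithms 1–3 (ℓ2-RMSG, ℓ1-RMSG, ℓ2,1-RMSG), but their listings are not reproduced here. As a rough orientation only, the sketch below implements a generic MSG-style iterate for k-PCA: a rank-one gradient step followed by projecting the eigenvalues onto {0 ⪯ M ⪯ I, tr(M) = k}. The shrinkage factor standing in for ℓ2 regularization and the bisection-based projection are our assumptions, not the paper's exact updates.

```python
import numpy as np

def project_capped_trace(eigvals, k, iters=60):
    """Shift and clip eigenvalues so each lies in [0, 1] and they sum to k (bisection on the shift)."""
    lo, hi = -eigvals.max(), 1.0 - eigvals.min()
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        if np.clip(eigvals + nu, 0.0, 1.0).sum() < k:
            lo = nu
        else:
            hi = nu
    return np.clip(eigvals + 0.5 * (lo + hi), 0.0, 1.0)

def msg_step(M, x, eta, k, l2=0.0):
    """One MSG-style update: optional shrinkage, rank-one gradient step, projection."""
    M = (1.0 - eta * l2) * M + eta * np.outer(x, x)   # l2 shrinkage is an assumed stand-in
    vals, vecs = np.linalg.eigh(0.5 * (M + M.T))      # symmetrize for numerical safety
    vals = project_capped_trace(vals, k)
    return (vecs * vals) @ vecs.T                     # V diag(vals) V^T
```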
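The Experiment Setup row translates into a small grid search. The sketch below encodes the two learning-rate schedules and tunes η₀, λ, and µ on the held-out split by maximizing the variance it captures; `run_algorithm` and the selection objective are hypothetical placeholders, since the paper only states the grid and the schedules.

```python
import itertools
import numpy as np

GRID = [1e-3, 1e-2, 1e-1, 1, 10, 1e2, 1e3]             # tuning set from the paper

def lr_sqrt(eta0, t):     # schedule for MSG and l1-RMSG
    return eta0 / np.sqrt(t)

def lr_linear(eta0, t):   # schedule for l2-RMSG, l2,1-RMSG, and Oja
    return eta0 / t

def tune(run_algorithm, X_train, X_tune, k=40):
    """Grid-search eta0, lam, mu on the tuning split.

    run_algorithm(X, k, eta0, lam, mu) is a hypothetical callable that trains one
    of the algorithms (using lr_sqrt or lr_linear internally) and returns a
    d x k orthonormal basis U.
    """
    C_tune = X_tune.T @ X_tune / len(X_tune)            # empirical covariance of tuning data
    best, best_score = None, -np.inf
    for eta0, lam, mu in itertools.product(GRID, repeat=3):
        U = run_algorithm(X_train, k, eta0, lam, mu)
        score = np.trace(U.T @ C_tune @ U)              # variance captured by the subspace
        if score > best_score:
            best, best_score = (eta0, lam, mu), score
    return best
```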