A Comprehensively Tight Analysis of Gradient Descent for PCA

Authors: Zhiqiang Xu, Ping Li

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The paper states "Experiments are conducted to confirm our findings as well." and "The purpose of the experimental study for corroborating our findings in above sections is twofold."
Researcher Affiliation | Industry | Zhiqiang Xu, Ping Li; Cognitive Computing Lab, Baidu Research; No. 10 Xibeiwang East Road, Beijing 100193, China; 10900 NE 8th St., Bellevue, Washington 98004, USA; {xuzhiqiang04,liping11}@baidu.com
Pseudocode | Yes | Algorithm 1: VR-PCA (Oja); Algorithm 2: VR-PCA (Krasulina). (An illustrative sketch of the Oja-style variance-reduced update appears at the end of this section.)
Open Source Code | No | The paper states "We implemented the PGD with η = 1/ρ, η_t = 0.6/(x_tᵀAx_t), 1/(x_tᵀAx_t), 1.6/(x_tᵀAx_t), and RGD with step-size schemes η = 1/(λ_1 - λ_n), η_t = 0.6/(x_tᵀAx_t), 1/(x_tᵀAx_t), 1.6/(x_tᵀAx_t), in MATLAB." but does not provide a public link or an explicit statement about releasing the code for the described methodology.
Open Datasets | Yes | The paper experiments on two real datasets, Schenk (used in [19, 5]) and GHS_indef, from https://sparse.tamu.edu/ (footnote 4), as well as the common PCA datasets summarized in Table 4 (MMILL, JW11, MNIST).
Dataset Splits | No | The paper does not provide training/validation/test dataset splits; no split percentages or counts are reported in the text.
Hardware Specification | Yes | Experiments were done on a laptop (dual-core 2.30GHz CPU and 8GB RAM).
Software Dependencies | No | The paper states "We implemented the PGD... in MATLAB." but does not provide a specific version number for MATLAB or any other software dependencies.
Experiment Setup | Yes | "We implemented the PGD with η = 1/ρ, η_t = 0.6/(x_tᵀAx_t), 1/(x_tᵀAx_t), 1.6/(x_tᵀAx_t), and RGD with step-size schemes η = 1/(λ_1 - λ_n), η_t = 0.6/(x_tᵀAx_t), 1/(x_tᵀAx_t), 1.6/(x_tᵀAx_t), in MATLAB. All the methods start from the same random initial point x_0 and run for T = 100 iterations. We use b = 100. Note that we only update the learning rate at the epoch level and keep it unchanged within each epoch, similar to the case of the computation of the full gradient." (An illustrative sketch of the adaptive step-size scheme appears right after the table.)
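
As a rough illustration of the step-size schemes quoted in the Experiment Setup row, the sketch below runs projected gradient descent (PGD) on the unit sphere to maximize xᵀAx, using the adaptive step size η_t = c/(x_tᵀAx_t) with c in {0.6, 1, 1.6}. It is a minimal NumPy sketch under assumed choices (the matrix A, the constant c, the random seed, and the usage example are not from the paper); the authors' implementation is in MATLAB and its details may differ.

```python
import numpy as np

def pgd_pca(A, c=1.0, T=100, seed=0):
    """Projected gradient ascent for the leading eigenvector of a symmetric
    matrix A, i.e. maximizing x^T A x over the unit sphere, with the adaptive
    step size eta_t = c / (x_t^T A x_t) (c in {0.6, 1, 1.6} in the quoted
    setup). Initialization and iteration count are illustrative choices."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)          # random unit-norm start x_0
    for _ in range(T):
        Ax = A @ x
        eta = c / (x @ Ax)          # eta_t = c / (x_t^T A x_t)
        x = x + eta * Ax            # ascent step along A x_t (gradient of x^T A x up to a factor of 2)
        x /= np.linalg.norm(x)      # project back onto the unit sphere
    return x

# Illustrative usage on a small synthetic covariance matrix.
rng = np.random.default_rng(1)
M = rng.standard_normal((200, 50))
A = M.T @ M / 200
x = pgd_pca(A, c=1.0, T=100)
print("Rayleigh quotient:", x @ A @ x, "top eigenvalue:", np.linalg.eigvalsh(A)[-1])
```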
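
For the pseudocode row ("Algorithm 1 VR-PCA (Oja)"), the following is a minimal sketch of an Oja-style VR-PCA epoch structure, i.e. SVRG-like variance reduction for the leading eigenvector of the sample covariance A = (1/n) Σ_i a_i a_iᵀ. The step size eta, the epoch length m, the number of epochs, and the data layout are assumptions for illustration; the paper's Algorithm 1 may differ in its exact update and parameter choices.

```python
import numpy as np

def vr_pca_oja(data, eta, epochs=10, m=None, seed=0):
    """Oja-style VR-PCA sketch: data is an (n, d) array whose rows a_i define
    the covariance A = (1/n) data^T data. The full product A @ w_tilde is
    recomputed once per epoch; inner steps use a variance-reduced stochastic
    direction and renormalize after every update."""
    n, d = data.shape
    m = m or n                              # inner iterations per epoch (assumption)
    rng = np.random.default_rng(seed)
    w_tilde = rng.standard_normal(d)
    w_tilde /= np.linalg.norm(w_tilde)      # random unit-norm snapshot
    for _ in range(epochs):
        u = data.T @ (data @ w_tilde) / n   # full "gradient" A @ w_tilde, once per epoch
        w = w_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            a = data[i]
            # variance-reduced direction: a_i a_i^T (w - w_tilde) + A w_tilde
            g = a * (a @ (w - w_tilde)) + u
            w = w + eta * g                 # Oja-style step
            w /= np.linalg.norm(w)          # renormalize to the unit sphere
        w_tilde = w                         # snapshot for the next epoch
    return w_tilde
```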