Rich Component Analysis

Authors: Rong Ge, James Zou

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show how to integrate RCA with stochastic gradient descent into a meta-algorithm for learning general models, and demonstrate substantial improvement in accuracy on several synthetic and real datasets in both supervised and unsupervised tasks." From Section 5 (Experiments): "In the experiments, we focus on the contrastive learning setting where we are given observations of U = S1 + S2 and V = A S2 + S3. The goal is to estimate the parameters for the S1 distribution. Our approach can also learn the shared component S2 as well as S3. We tested our method in five settings, where S1 corresponds to: low-rank Gaussian (PCA), linear regression, mixture of Gaussians (GMM), logistic regression and the Ising model." (See the data-generation sketch after the table.)
Researcher Affiliation | Collaboration | "Rong Ge, RONGGE@CS.DUKE.EDU, Duke University, Computer Science Department, 308 Research Dr, Durham NC 27708; James Zou, JAMESYZOU@GMAIL.COM, Microsoft Research, One Memorial Dr, Cambridge MA 01239"
Pseudocode | Yes | "Algorithm 1 Find Linear" (see the linear-transform sketch after the table)
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We applied RCA to a real dataset of DNA methylation biomarkers. Twenty biomarkers (10 test and 10 control) measured the DNA methylation level (a real number between 0 and 1) at twenty genomic loci across 686 individuals (Zou et al., 2014)."
Dataset Splits | No | The paper describes the datasets used and some experimental settings (e.g., "10 dimensional logistic model", "5-by-5 Ising model") but does not specify explicit training, validation, or test dataset splits (e.g., percentages or exact counts) for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications, or cloud computing instance types) used for running the experiments.
Software Dependencies | No | The paper mentions general algorithms and methods like "stochastic gradient descent" and "EM" but does not specify any software dependencies or libraries with version numbers required for reproduction.
Experiment Setup | Yes | "In all five settings, we let S3 be sampled uniformly from [−1, 1]^d, where d is the dimension of S3. ... S1 was set to have a principal component along direction v1, i.e. s1 ∼ N(0, v1 v1^T + σ^2 I). S2 was sampled from Unif([−1, 1]^d) + v2 v2^T ... S1 is a mixture of d spherical Gaussians in R^d ... We use the 4th-order Chebyshev polynomial approximation to the SGD of logistic regression as in Section 4.2." (See the Chebyshev sketch after the table.)
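As context for the Research Type and Experiment Setup rows above, here is a minimal sketch of the quoted contrastive data-generation setup (U = S1 + S2, V = A S2 + S3) in NumPy. The dimension d, sample size n, noise level sigma, and the random mixing matrix A are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 10, 5000, 0.5  # illustrative sizes, not from the paper

# Target component S1: Gaussian with one principal direction v1,
# i.e. s1 ~ N(0, v1 v1^T + sigma^2 I), as quoted in the setup row.
v1 = rng.standard_normal(d)
v1 /= np.linalg.norm(v1)
cov_s1 = np.outer(v1, v1) + sigma**2 * np.eye(d)
S1 = rng.multivariate_normal(np.zeros(d), cov_s1, size=n)

# Shared component S2 and independent noise S3; both drawn from
# Unif([-1, 1]^d) here (the paper also gives S2 a principal direction v2,
# which this sketch omits for brevity).
S2 = rng.uniform(-1.0, 1.0, size=(n, d))
S3 = rng.uniform(-1.0, 1.0, size=(n, d))

# Unknown linear transform A applied to the shared component in the
# second view (a hypothetical random A for illustration).
A = rng.standard_normal((d, d))

U = S1 + S2        # first view: target signal plus shared component
V = S2 @ A.T + S3  # second view: transformed shared component plus noise
```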
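The Pseudocode row cites "Algorithm 1 Find Linear", whose details the table does not reproduce. The sketch below is not the paper's algorithm; it only illustrates the simpler observation that, with independent zero-mean components, Cov(U, V) = Cov(S2) A^T, so A is recoverable when Cov(S2) is assumed known. The paper's actual procedure does not rely on that assumption.

```python
import numpy as np

def estimate_linear_transform(U, V, cov_s2):
    """Recover A from the cross-covariance of the two views.

    With U = S1 + S2 and V = A S2 + S3 and mutually independent,
    zero-mean components, Cov(U, V) = Cov(S2) @ A.T.  This is an
    illustration under the assumption that Cov(S2) is known, which
    the paper's Find Linear does not require.
    """
    n = U.shape[0]
    Uc = U - U.mean(axis=0)
    Vc = V - V.mean(axis=0)
    cross_cov = Uc.T @ Vc / n            # empirical Cov(U, V)
    A_T = np.linalg.solve(cov_s2, cross_cov)  # solve Cov(S2) A.T = Cov(U, V)
    return A_T.T

# Usage with the synthetic views above: S2 ~ Unif([-1, 1]^d) has
# covariance I/3, so:
#   A_hat = estimate_linear_transform(U, V, np.eye(d) / 3.0)
```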
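The Experiment Setup row mentions a 4th-order Chebyshev polynomial approximation to the SGD of logistic regression. Below is a sketch of that idea under assumed details: the sigmoid is fit by a degree-4 Chebyshev polynomial on an assumed working interval [−4, 4], and the polynomial surrogate replaces the sigmoid in the gradient. The interval, learning rate, and helper names are hypothetical, not from the paper.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Degree-4 Chebyshev least-squares fit of the sigmoid on an assumed
# working interval [-4, 4] (illustrative choice).
ts = np.linspace(-4.0, 4.0, 401)
cheb_sigmoid = C.Chebyshev.fit(ts, sigmoid(ts), deg=4)

def sgd_step(w, x, y, lr=0.1):
    """One logistic-regression SGD step using the polynomial surrogate.

    Because the surrogate is polynomial in w @ x, its expected gradient
    involves only low-order moments of x, which is what makes a
    moment-based correction of the gradient possible.
    """
    margin = w @ x
    grad = (cheb_sigmoid(margin) - y) * x  # surrogate of (sigmoid - y) * x
    return w - lr * grad
```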