Alternating Minimization for Mixed Linear Regression
Authors: Xinyang Yi, Constantine Caramanis, Sujay Sanghavi
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present the empirical performance of our algorithm on a synthetic data set. The results highlight two important features. First, the simulations corroborate our theoretical results given in Section 4, which show that our algorithm is nearly optimal (unimprovable) in terms of sample complexity. Indeed, we show here that EM+SVD succeeds when given about as many samples as dimensions (in the absence of additional structure, e.g., sparsity, it is not possible to do better). Second, our results show that the SVD initialization seems to be critical: without it, EM's performance is significantly degraded. Experiment Settings. Each input vector x_i is generated independently from the standard Gaussian distribution with mean 0 and covariance matrix I. We then choose the mixture labels for each sample with equal probability, i.e., we set p1 = p2 = 0.5. Also, in each trial, we generate β1* and β2* randomly but keep ⟨β1*, β2*⟩ = 1.73; this constant is arbitrarily chosen. In this case, β1* and β2* are non-orthogonal, and it is impossible to recover them from the SVD step due to ambiguity. We run Algorithm 2 with a fairly coarse grid: δ = 0.3. We also test Algorithm 3 using p1 = p2. The following metric, which measures global optimality, is used: err(t) := max{‖β1^(t) − β1*‖2, ‖β2^(t) − β2*‖2}, where t is the iteration index. (A code sketch of this setup appears after the table.) |
| Researcher Affiliation | Academia | Xinyang Yi (YIXY@UTEXAS.EDU), Constantine Caramanis (CONSTANTINE@MAIL.UTEXAS.EDU), Sujay Sanghavi (SANGHAVI@MAIL.UTEXAS.EDU); Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 |
| Pseudocode | Yes | Algorithm 1: EM (noiseless case); Algorithm 2: Initialization; Algorithm 3: Init with proportion information; Algorithm 4: EM with resampling; Algorithm 5: Initialization with resampling. (Hedged sketches of Algorithms 1 and 2 appear after the table.) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a code repository. |
| Open Datasets | No | In this section, we present the empirical performance of our algorithm on a synthetic data set. Experiment Settings. Each input vector x_i is generated independently from the standard Gaussian distribution with mean 0 and covariance matrix I. We then choose the mixture labels for each sample with equal probability, i.e., we set p1 = p2 = 0.5. Also, in each trial, we generate β1* and β2* randomly but keep ⟨β1*, β2*⟩ = 1.73. |
| Dataset Splits | No | The paper discusses 'samples' and 'synthetic data set' but does not explicitly mention 'train', 'validation', or 'test' splits in terms of percentages or counts for reproducing the experiment. It refers to partitioning samples into disjoint sets for resampling, but this is not defined as standard train/validation/test splits. |
| Hardware Specification | No | The paper mentions that 'the 10^-35 error is the precision of Matlab', which refers to software, but no specific hardware details (GPU models, CPU types, memory specifications) used for running the experiments are provided. |
| Software Dependencies | No | The paper mentions 'Matlab' in the context of numerical precision, but it does not specify a version number for Matlab or any other software dependencies crucial for reproducibility. |
| Experiment Setup | Yes | Experiment Settings. Each input vector x_i is generated independently from the standard Gaussian distribution with mean 0 and covariance matrix I. We then choose the mixture labels for each sample with equal probability, i.e., we set p1 = p2 = 0.5. Also, in each trial, we generate β1* and β2* randomly but keep ⟨β1*, β2*⟩ = 1.73; this constant is arbitrarily chosen. In this case, β1* and β2* are non-orthogonal, and it is impossible to recover them from the SVD step due to ambiguity. We run Algorithm 2 with a fairly coarse grid: δ = 0.3. We also test Algorithm 3 using p1 = p2. (See the code sketch after the table.) |
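
To make the quoted experiment settings concrete, here is a minimal sketch of the synthetic data generation and the error metric, assuming the noiseless model y_i = ⟨x_i, β_{z_i}*⟩ described in the paper. The helper names (`make_planted_betas`, `make_mixed_data`, `err`) and the construction used to pin ⟨β1*, β2*⟩ = 1.73 are illustrative choices, not the authors' code.

```python
import numpy as np

def make_planted_betas(d, inner_product=1.73, rng=None):
    # Two random vectors with <b1, b2> pinned to `inner_product`, so the
    # pair is non-orthogonal as described in the paper. The authors' exact
    # construction is not specified; this is one simple way to enforce it.
    rng = np.random.default_rng(rng)
    b1 = rng.standard_normal(d)
    w = rng.standard_normal(d)
    w -= (w @ b1) / (b1 @ b1) * b1           # remove component along b1
    b2 = inner_product / (b1 @ b1) * b1 + w  # now <b1, b2> == inner_product
    return b1, b2

def make_mixed_data(n, b1, b2, rng=None):
    # x_i ~ N(0, I); labels chosen with equal probability (p1 = p2 = 0.5);
    # noiseless responses y_i = <x_i, beta_{z_i}>.
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((n, b1.shape[0]))
    z = rng.integers(0, 2, size=n)           # 0 -> beta_1, 1 -> beta_2
    y = np.where(z == 0, X @ b1, X @ b2)
    return X, y, z

def err(b1_t, b2_t, b1_star, b2_star):
    # err(t) = max(||b1^(t) - b1*||_2, ||b2^(t) - b2*||_2)
    return max(np.linalg.norm(b1_t - b1_star),
               np.linalg.norm(b2_t - b2_star))
```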
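The grid-search initialization (Algorithm 2) with resolution δ = 0.3 can be sketched as follows. This reflects one reading of the method: take the top-2 eigenvectors of the second-moment matrix M = (1/n) Σ y_i² x_i x_iᵀ and grid-search candidate pairs on a circle in that subspace. The function name `svd_initialize`, the grid-radius heuristic, and the pair-selection loss are assumptions, not details confirmed by the excerpts above.

```python
def svd_initialize(X, y, delta=0.3, scale=None):
    # Assumed reading of Algorithm 2: candidates live on a circle in the
    # top-2 eigenspace of M = (1/n) * sum_i y_i^2 x_i x_i^T, spaced at
    # angular resolution `delta`; return the pair minimizing the
    # mixed-regression loss. The radius heuristic is an assumption.
    n = X.shape[0]
    M = (X * (y ** 2)[:, None]).T @ X / n
    _, eigvecs = np.linalg.eigh(M)           # eigenvalues ascending
    V = eigvecs[:, -2:]                      # top-2 eigenspace, shape (d, 2)
    if scale is None:
        scale = np.sqrt(np.mean(y ** 2))     # heuristic radius (assumption)
    thetas = np.arange(0.0, 2 * np.pi, delta)
    grid = [scale * (np.cos(t) * V[:, 0] + np.sin(t) * V[:, 1])
            for t in thetas]
    best_pair, best_loss = None, np.inf
    for i, u in enumerate(grid):
        for v in grid[i + 1:]:
            # Each sample is scored by whichever candidate fits it better.
            loss = np.minimum((y - X @ u) ** 2, (y - X @ v) ** 2).mean()
            if loss < best_loss:
                best_pair, best_loss = (u, v), loss
    return best_pair
```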
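Finally, a minimal sketch of one noiseless EM / alternating-minimization iteration in the spirit of Algorithm 1: assign each sample to the parameter vector with the smaller residual, then re-fit each group by least squares. The resampled variant the paper analyzes (Algorithm 4) would draw a fresh batch of samples at every iteration; that bookkeeping is omitted here.

```python
def em_step(X, y, b1, b2):
    # E-like step: assign each sample to the component that better
    # explains it (smaller absolute residual).
    mask = np.abs(y - X @ b1) <= np.abs(y - X @ b2)
    # M-like step: ordinary least squares on each group. (Assumes both
    # groups are non-empty, which holds for any reasonable initialization
    # on this synthetic data.)
    b1_new, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    b2_new, *_ = np.linalg.lstsq(X[~mask], y[~mask], rcond=None)
    return b1_new, b2_new
```

A full trial chains the pieces together. Since component labels are arbitrary, the swapped pairing is also evaluated when tracking err(t); the dimensions below are illustrative, chosen so that n is within a small factor of d as in the paper's sample-complexity claim.

```python
b1s, b2s = make_planted_betas(d=50)
X, y, _ = make_mixed_data(n=500, b1=b1s, b2=b2s)
b1, b2 = svd_initialize(X, y, delta=0.3)
for t in range(25):
    b1, b2 = em_step(X, y, b1, b2)
    print(t, min(err(b1, b2, b1s, b2s), err(b2, b1, b1s, b2s)))
```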