Alternating Minimization for Mixed Linear Regression
Authors: Xinyang Yi, Constantine Caramanis, Sujay Sanghavi
ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present the empirical performance of our algorithm on a synthetic data set. The results highlight two important features. First, the simulations corroborate our theoretical results given in Section 4, which show that our algorithm is nearly optimal (unimprovable) in terms of sample complexity. Indeed, we show here that EM+SVD succeeds when given about as many samples as dimensions (in the absence of additional structure, e.g., sparsity, it is not possible to do better). Second, our results show that the SVD initialization seems to be critical: without it, EM's performance is significantly degraded. Experiment Settings. Each input vector x_i is generated independently from the standard Gaussian distribution with mean 0 and covariance matrix I. We then choose the mixture labels for each sample with equal probability, i.e., we set p1 = p2 = 0.5. Also, in each trial, we generate β1* and β2* randomly but keep ⟨β1*, β2*⟩ = 1.73; this constant is arbitrarily chosen. In this case, β1* and β2* are non-orthogonal, and it is impossible to recover them from the SVD step due to ambiguity. We run Algorithm 2 with a fairly coarse grid: δ = 0.3. We also test Algorithm 3 using p1 = p2. The following metric, which measures global optimality, is used: err(t) := max{‖β1^(t) − β1*‖2, ‖β2^(t) − β2*‖2}, where t is the iteration index. (A code sketch of this setup appears after the table.) |
| Researcher Affiliation | Academia | Xinyang Yi (YIXY@UTEXAS.EDU), Constantine Caramanis (CONSTANTINE@MAIL.UTEXAS.EDU), Sujay Sanghavi (SANGHAVI@MAIL.UTEXAS.EDU); Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, TX 78712 |
| Pseudocode | Yes | Algorithm 1: EM (noiseless case); Algorithm 2: Initialization; Algorithm 3: Init with proportion information; Algorithm 4: EM with resampling; Algorithm 5: Initialization with resampling. (Hedged sketches of Algorithms 1 and 2 appear after the table.) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a code repository. |
| Open Datasets | No | In this section, we present the empirical performance of our algorithm on a synthetic data set. Experiment Settings. Each input vector x_i is generated independently from the standard Gaussian distribution with mean 0 and covariance matrix I. We then choose the mixture labels for each sample with equal probability, i.e., we set p1 = p2 = 0.5. Also, in each trial, we generate β1* and β2* randomly but keep ⟨β1*, β2*⟩ = 1.73. |
| Dataset Splits | No | The paper discusses 'samples' and 'synthetic data set' but does not explicitly mention 'train', 'validation', or 'test' splits in terms of percentages or counts for reproducing the experiment. It refers to partitioning samples into disjoint sets for resampling, but this is not defined as standard train/validation/test splits. |
| Hardware Specification | No | The paper mentions that 'the 10^-35 error is the precision of Matlab', which refers to software, but no specific hardware details (GPU models, CPU types, memory specifications) used for running the experiments are provided. |
| Software Dependencies | No | The paper mentions 'Matlab' in the context of numerical precision, but it does not specify a version number for Matlab or any other software dependencies crucial for reproducibility. |
| Experiment Setup | Yes | Experiment Settings. Each input vector x_i is generated independently from the standard Gaussian distribution with mean 0 and covariance matrix I. We then choose the mixture labels for each sample with equal probability, i.e., we set p1 = p2 = 0.5. Also, in each trial, we generate β1* and β2* randomly but keep ⟨β1*, β2*⟩ = 1.73; this constant is arbitrarily chosen. In this case, β1* and β2* are non-orthogonal, and it is impossible to recover them from the SVD step due to ambiguity. We run Algorithm 2 with a fairly coarse grid: δ = 0.3. We also test Algorithm 3 using p1 = p2. (See the code sketch after the table.) |
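
To make the quoted experiment settings concrete, here is a minimal sketch of the synthetic data generation and the error metric, assuming the noiseless model y_i = ⟨x_i, β_{z_i}*⟩ described in the paper. The helper names (`make_planted_betas`, `make_mixed_data`, `err`) and the construction used to pin ⟨β1*, β2*⟩ = 1.73 are illustrative choices, not the authors' code.

```python
import numpy as np

def make_planted_betas(d, inner_product=1.73, rng=None):
    # Two random vectors with <b1, b2> pinned to `inner_product`, so the
    # pair is non-orthogonal as described in the paper. The authors' exact
    # construction is not specified; this is one simple way to enforce it.
    rng = np.random.default_rng(rng)
    b1 = rng.standard_normal(d)
    w = rng.standard_normal(d)
    w -= (w @ b1) / (b1 @ b1) * b1           # remove component along b1
    b2 = inner_product / (b1 @ b1) * b1 + w  # now <b1, b2> == inner_product
    return b1, b2

def make_mixed_data(n, b1, b2, rng=None):
    # x_i ~ N(0, I); labels chosen with equal probability (p1 = p2 = 0.5);
    # noiseless responses y_i = <x_i, beta_{z_i}>.
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((n, b1.shape[0]))
    z = rng.integers(0, 2, size=n)           # 0 -> beta_1, 1 -> beta_2
    y = np.where(z == 0, X @ b1, X @ b2)
    return X, y, z

def err(b1_t, b2_t, b1_star, b2_star):
    # err(t) = max(||b1^(t) - b1*||_2, ||b2^(t) - b2*||_2)
    return max(np.linalg.norm(b1_t - b1_star),
               np.linalg.norm(b2_t - b2_star))
```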
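The grid-search initialization (Algorithm 2) with resolution δ = 0.3 can be sketched as follows. This reflects one reading of the method: take the top-2 eigenvectors of the second-moment matrix M = (1/n) Σ y_i² x_i x_iᵀ and grid-search candidate pairs on a circle in that subspace. The function name `svd_initialize`, the grid-radius heuristic, and the pair-selection loss are assumptions, not details confirmed by the excerpts above.

```python
def svd_initialize(X, y, delta=0.3, scale=None):
    # Assumed reading of Algorithm 2: candidates live on a circle in the
    # top-2 eigenspace of M = (1/n) * sum_i y_i^2 x_i x_i^T, spaced at
    # angular resolution `delta`; return the pair minimizing the
    # mixed-regression loss. The radius heuristic is an assumption.
    n = X.shape[0]
    M = (X * (y ** 2)[:, None]).T @ X / n
    _, eigvecs = np.linalg.eigh(M)           # eigenvalues ascending
    V = eigvecs[:, -2:]                      # top-2 eigenspace, shape (d, 2)
    if scale is None:
        scale = np.sqrt(np.mean(y ** 2))     # heuristic radius (assumption)
    thetas = np.arange(0.0, 2 * np.pi, delta)
    grid = [scale * (np.cos(t) * V[:, 0] + np.sin(t) * V[:, 1])
            for t in thetas]
    best_pair, best_loss = None, np.inf
    for i, u in enumerate(grid):
        for v in grid[i + 1:]:
            # Each sample is scored by whichever candidate fits it better.
            loss = np.minimum((y - X @ u) ** 2, (y - X @ v) ** 2).mean()
            if loss < best_loss:
                best_pair, best_loss = (u, v), loss
    return best_pair
```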
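Finally, a minimal sketch of one noiseless EM / alternating-minimization iteration in the spirit of Algorithm 1: assign each sample to the parameter vector with the smaller residual, then re-fit each group by least squares. The resampled variant the paper analyzes (Algorithm 4) would draw a fresh batch of samples at every iteration; that bookkeeping is omitted here.

```python
def em_step(X, y, b1, b2):
    # E-like step: assign each sample to the component that better
    # explains it (smaller absolute residual).
    mask = np.abs(y - X @ b1) <= np.abs(y - X @ b2)
    # M-like step: ordinary least squares on each group. (Assumes both
    # groups are non-empty, which holds for any reasonable initialization
    # on this synthetic data.)
    b1_new, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    b2_new, *_ = np.linalg.lstsq(X[~mask], y[~mask], rcond=None)
    return b1_new, b2_new
```

A full trial chains the pieces together. Since component labels are arbitrary, the swapped pairing is also evaluated when tracking err(t); the dimensions below are illustrative, chosen so that n is within a small factor of d as in the paper's sample-complexity claim.

```python
b1s, b2s = make_planted_betas(d=50)
X, y, _ = make_mixed_data(n=500, b1=b1s, b2=b2s)
b1, b2 = svd_initialize(X, y, delta=0.3)
for t in range(25):
    b1, b2 = em_step(X, y, b1, b2)
    print(t, min(err(b1, b2, b1s, b2s), err(b2, b1, b1s, b2s)))
```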