On Learning Mixture of Linear Regressions in the Non-Realizable Setting
Authors: Soumyabrata Pal, Arya Mazumdar, Rajat Sen, Avishek Ghosh
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we validate our theoretical findings via experiments. ... We implement and compare the performance of our algorithms on three non-linear datasets ... We compute the min-loss for five different algorithms on the train and test data (averaged over 30 implementations) for each pair of users and report them in Tables 6 and 3. |
| Researcher Affiliation | Collaboration | Avishek Ghosh¹, Arya Mazumdar¹, Soumyabrata Pal², Rajat Sen³; ¹Halıcıoğlu Data Science Institute (HDSI), UC San Diego, USA; ²Google Research, India; ³Google Research, Palo Alto, USA. |
| Pseudocode | Yes | Algorithm 1 Gradient AM for Mixture of Regressions, Algorithm 2 Sub-sampled data driven learning of mixture of k regressions, Algorithm 3 Sub-sampled data driven learning of mixture of k regressions with random partitions. |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | For experiments on real datasets, we use the Movielens 1M dataset (https://grouplens.org/datasets/movielens/1m/) that consists of 1 million ratings from m = 6000 users on n = 4000 movies. and Non-linear datasets: We implement and compare the performance of our algorithms on three non-linear datasets generated by sklearn, namely make_friedman1 (A), make_friedman2 (B) and make_friedman3 (C). Note that all these datasets are non-realizable. |
| Dataset Splits | Yes | We split this dataset into train and test (80 : 20); in Table 6 (in the appendix), we report the user ids and number of samples in train and test data. and All three datasets A, B and C comprise 3200 samples in the train data and 800 samples in the test data. (A hedged sketch of this data generation and split appears after the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions software components like 'sklearn.linear_model.RANSACRegressor()' and 'sklearn.datasets.make_regression' and 'Python' but does not specify their version numbers. (Minimal illustrative calls to these components appear after the table.) |
| Experiment Setup | Yes | For dataset A, we implement Algorithm 1 with γ = 0.1 and random initialization (every element of θ_1^(0), θ_2^(0) is generated i.i.d. according to a Gaussian with mean 0 and standard deviation 10). and We implement Algorithm 3 with \|A\| = 150 and h = 1000, and in Steps 3 and 5 we use the Linear Regression model. (Hedged sketches of both setups appear after the table.) |
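
The Open Datasets and Dataset Splits rows can be made concrete with a short sketch. The snippet below regenerates the three sklearn Friedman datasets and applies the 80:20 split; the random seeds and the default noise/feature settings are illustrative assumptions, since the quoted text does not specify them.

```python
# Hedged sketch: regenerate the non-linear datasets A, B, C and the 80:20 split.
# Seeds and noise/feature settings are assumptions, not values quoted above.
from sklearn.datasets import make_friedman1, make_friedman2, make_friedman3
from sklearn.model_selection import train_test_split

N_SAMPLES = 4000  # 3200 train + 800 test, matching the reported split sizes

datasets = {
    "A": make_friedman1(n_samples=N_SAMPLES, random_state=0),
    "B": make_friedman2(n_samples=N_SAMPLES, random_state=0),
    "C": make_friedman3(n_samples=N_SAMPLES, random_state=0),
}

splits = {
    name: train_test_split(X, y, test_size=0.2, random_state=0)  # 3200 / 800
    for name, (X, y) in datasets.items()
}
```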
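
For the Pseudocode and Experiment Setup rows, the block below is a minimal sketch of a gradient alternating-minimization loop for a k = 2 mixture of linear regressions, using the quoted step size γ = 0.1 and random initialization with i.i.d. Gaussian(0, 10) entries. The min-loss assignment rule, iteration count, and function name are assumptions about details not quoted above; it is not a verified reimplementation of the paper's Algorithm 1.

```python
# Hedged sketch of a gradient-AM loop for a mixture of k linear regressions.
# gamma = 0.1 and the N(0, 10^2) initialization follow the quoted setup; the
# assignment rule and iteration count are assumptions.
import numpy as np

def gradient_am(X, y, k=2, gamma=0.1, n_iters=200, init_std=10.0, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = rng.normal(0.0, init_std, size=(k, d))  # random initialization
    for _ in range(n_iters):
        # Assignment: give each sample to its current min-loss component.
        residuals = (X @ theta.T - y[:, None]) ** 2   # shape (n, k)
        labels = residuals.argmin(axis=1)
        # Gradient step: one squared-loss gradient update per component.
        for j in range(k):
            idx = labels == j
            if idx.any():
                grad = X[idx].T @ (X[idx] @ theta[j] - y[idx]) / idx.sum()
                theta[j] -= gamma * grad
    return theta
```

On the Friedman data above, one would call gradient_am(X_train, y_train) and report, for each test point, the minimum squared error over the two returned regressors (the min-loss in the quoted text).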
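
The Software Dependencies row names sklearn components without version numbers. The calls below only illustrate those APIs with arbitrary toy data; they are not the paper's experimental pipeline.

```python
# Illustrative use of the named sklearn components; sizes and noise are
# arbitrary assumptions for demonstration only.
from sklearn.datasets import make_regression
from sklearn.linear_model import RANSACRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
robust = RANSACRegressor(random_state=0)  # base estimator defaults to LinearRegression
robust.fit(X, y)
print(robust.score(X, y))
```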
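
Finally, for the Algorithm 3 setting quoted in the Experiment Setup row (|A| = 150, h = 1000, Linear Regression in Steps 3 and 5), the sketch below shows one speculative reading: draw a sub-sample A of 150 training points, try h = 1000 random partitions of A into k parts, fit a LinearRegression on each part, and keep the set of fitted regressors with the smallest min-loss on A. The partitioning and selection rules are assumptions; the paper's actual Steps 3 and 5 may differ.

```python
# Speculative sketch of a sub-sampled, random-partition procedure with
# |A| = 150 and h = 1000; the structure is an assumption, not the paper's
# exact Algorithm 3.
import numpy as np
from sklearn.linear_model import LinearRegression

def subsampled_random_partitions(X, y, k=2, subsample=150, h=1000, seed=0):
    rng = np.random.default_rng(seed)
    sub = rng.choice(len(X), size=subsample, replace=False)
    Xa, ya = X[sub], y[sub]
    best_models, best_loss = None, np.inf
    for _ in range(h):
        labels = rng.integers(k, size=subsample)     # random partition of A
        models = []
        for j in range(k):
            idx = labels == j
            if idx.sum() < X.shape[1] + 1:           # skip degenerate parts
                break
            models.append(LinearRegression().fit(Xa[idx], ya[idx]))
        if len(models) < k:
            continue
        # min-loss on A: each point is charged to its best-fitting regressor.
        preds = np.stack([m.predict(Xa) for m in models], axis=1)
        loss = np.min((preds - ya[:, None]) ** 2, axis=1).mean()
        if loss < best_loss:
            best_models, best_loss = models, loss
    return best_models
```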