Model Collapse Demystified: The Case of Regression
Authors: Elvis Dohmatob, Yunzhen Feng, Julia Kempe
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical results are validated with experiments. |
| Researcher Affiliation | Collaboration | FAIR, Meta; Center for Data Science, New York University; Courant Institute of Mathematical Sciences, New York University |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | We only use one publicly available dataset, MNIST, and no idiosyncratic model. Thus, we provide neither dataset nor code, as the dataset is publicly available, and the experiments are easy to reproduce from their description. |
| Open Datasets | Yes | We conduct experiments using kernel ridge regression on the MNIST dataset [16] |
| Dataset Splits | No | The classification dataset contains 60,000 training and 10,000 test data points (handwritten digits), with labels from 0 to 9 inclusive. |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, memory) were mentioned for the experimental setup. The acknowledgments only vaguely refer to 'NYU IT High Performance Computing (HPC) resources, services, and staff expertise'. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, scikit-learn versions) were mentioned in the paper. |
| Experiment Setup | Yes | Specifically, the models were trained using stochastic gradient descent (SGD) with a batch size of 128 and a learning rate of 0.1. We employed a regression setting where labels were converted to one-hot vectors, and the model was trained using mean squared error for 200 epochs to convergence. When generating the synthetic data, Gaussian label noise with a standard deviation of 0.1 is added (a code sketch of this setup follows the table). |
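
Since the authors release no code, the following is only a minimal sketch of the synthetic-data retraining loop implied by the Experiment Setup row: SGD with batch size 128 and learning rate 0.1, one-hot MSE regression for 200 epochs, and Gaussian label noise with standard deviation 0.1 when regenerating labels. The plain linear model, the `load_mnist`/`train_regressor` helpers, and the number of generations are placeholder assumptions, not the paper's exact (kernel ridge) setup.

```python
# Hypothetical sketch of the model-collapse regression experiment on MNIST.
# Hyperparameters (batch size 128, lr 0.1, 200 epochs, noise std 0.1) come from
# the paper's description; the linear model and 3 generations are assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets


def load_mnist(train=True):
    ds = datasets.MNIST("./data", train=train, download=True)
    X = ds.data.reshape(len(ds), -1).float() / 255.0         # flatten 28x28 images to 784-dim vectors
    Y = nn.functional.one_hot(ds.targets, 10).float()        # one-hot labels as regression targets
    return X, Y


def train_regressor(X, Y, epochs=200, batch_size=128, lr=0.1):
    model = nn.Linear(X.shape[1], Y.shape[1])                 # placeholder linear regressor
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loader = DataLoader(TensorDataset(X, Y), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            nn.functional.mse_loss(model(xb), yb).backward()
            opt.step()
    return model


X_train, Y_train = load_mnist(train=True)
X_test, Y_test = load_mnist(train=False)

labels = Y_train
for generation in range(3):                                    # number of generations is an assumption
    model = train_regressor(X_train, labels)
    with torch.no_grad():
        test_mse = nn.functional.mse_loss(model(X_test), Y_test).item()
    print(f"generation {generation}: test MSE = {test_mse:.4f}")
    # Regenerate training labels from the current model, adding Gaussian noise
    # with standard deviation 0.1 as described for the synthetic-data step.
    with torch.no_grad():
        labels = model(X_train) + 0.1 * torch.randn_like(Y_train)
```

A swap of the linear layer for a kernel ridge regressor (e.g. scikit-learn's `KernelRidge`) would bring the sketch closer to the paper's stated method; the loop structure over generations stays the same.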