A Tale of Tails: Model Collapse as a Change of Scaling Laws
Authors: Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Julia Kempe
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theory is validated by large-scale experiments with a transformer on an arithmetic task and text generation using the large language model Llama2. |
| Researcher Affiliation | Collaboration | Meta FAIR; Center for Data Science, New York University; School of Mathematical Sciences, Peking University; Courant Institute, New York University. |
| Pseudocode | No | The paper describes various models and algorithms but does not provide any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not state that its own code is open-source or provide a link to a repository for the described methodology. |
| Open Datasets | Yes | We empirically verify these theoretical predictions (see Figure 4): (1) in large-scale experiments on an LLM, fine-tuning Llama2-7B (Touvron et al., 2023) on an approximately 2M sample dataset from Wikitext-103 |
| Dataset Splits | No | The paper mentions training on a dataset and evaluating on a test set, but does not specify explicit train/validation/test dataset splits or their percentages/counts. |
| Hardware Specification | No | This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise. This is too general and lacks specific hardware details like GPU or CPU models. |
| Software Dependencies | No | The paper mentions software like Llama2-7B, LoRA, and Adam optimizer, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Throughout the finetuning process, we maintain consistent settings using learning rate 5e-5 for LoRA, using Adam optimizer, dropout rate 0.1, trainable parameter fraction 0.062%. A hedged configuration sketch appears below the table. |
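
The reported experiment setup can be illustrated as a LoRA fine-tuning configuration. The following is a minimal sketch, assuming the Hugging Face `transformers`, `peft`, and `datasets` libraries (the paper pins no library versions); the learning rate (5e-5), Adam-family optimizer, and LoRA dropout (0.1) follow the quoted settings, while the checkpoint name, LoRA rank, target modules, batch size, and epoch count are assumptions chosen only so that the trainable-parameter fraction lands near the reported 0.062%.

```python
# Hedged sketch of the reported setup: LoRA fine-tuning of Llama2-7B on
# Wikitext-103 with learning rate 5e-5, an Adam-family optimizer, and
# LoRA dropout 0.1. Unreported details below are marked as assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-hf"   # assumed checkpoint identifier
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# LoRA adapter: rank/alpha and target modules are guesses; r=8 on the query
# and value projections of a 7B model yields roughly 0.06% trainable
# parameters, close to the reported 0.062%. Dropout matches the reported 0.1.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # prints the trainable-parameter fraction

# Wikitext-103 (approximately 2M training samples per the paper),
# tokenized for causal language modeling.
raw = load_dataset("wikitext", "wikitext-103-raw-v1")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="llama2-wikitext-lora",
    learning_rate=5e-5,             # reported LoRA learning rate
    optim="adamw_torch",            # Adam-family optimizer, as reported
    per_device_train_batch_size=4,  # not reported; assumption
    num_train_epochs=1,             # not reported; assumption
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

This is a reconstruction under the stated assumptions, not the authors' released pipeline; since the paper provides no code or dependency versions, the unreported choices above would need to be varied to reproduce its results faithfully.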