A Tale of Tails: Model Collapse as a Change of Scaling Laws

Authors: Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Julia Kempe

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theory is validated by large-scale experiments with a transformer on an arithmetic task and text generation using the large language model Llama2.
Researcher Affiliation | Collaboration | (1) Meta FAIR; (2) Center for Data Science, New York University; (3) School of Mathematical Sciences, Peking University; (4) Courant Institute, New York University.
Pseudocode | No | The paper describes various models and algorithms but does not provide any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that its own code is open source or provide a link to a repository for the described methodology.
Open Datasets | Yes | We empirically verify these theoretical predictions (see Figure 4): (1) in large-scale experiments on an LLM, fine-tuning Llama2-7B (Touvron et al., 2023) on an approximately 2M-sample dataset from Wikitext-103.
Dataset Splits | No | The paper mentions training on a dataset and evaluating on a test set, but does not specify explicit train/validation/test splits or their percentages/counts.
Hardware Specification | No | "This work was supported in part through the NYU IT High Performance Computing resources, services, and staff expertise." This acknowledgment is too general and lacks specific hardware details such as GPU or CPU models.
Software Dependencies | No | The paper mentions software such as Llama2-7B, LoRA, and the Adam optimizer, but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | Throughout the finetuning process, we maintain consistent settings using learning rate 5e-5 for LoRA, using Adam optimizer, dropout rate 0.1, trainable parameter fraction 0.062%. (See the sketch after this table.)
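The "Open Datasets" and "Experiment Setup" rows together report LoRA fine-tuning of Llama2-7B on Wikitext-103 with learning rate 5e-5, the Adam optimizer, and dropout 0.1. The sketch below shows how such a setup could look with Hugging Face transformers, peft, and datasets; the LoRA rank, alpha, target modules, batch size, and sequence length are illustrative assumptions (the paper only reports a 0.062% trainable-parameter fraction), not the authors' exact configuration.

```python
# Minimal sketch, assuming Hugging Face transformers + peft + datasets.
# Hyperparameters marked "assumed" are not reported in the quoted setup.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # gated checkpoint; requires access approval
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the base model with LoRA adapters (rank/alpha/target modules assumed).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,                      # "dropout rate 0.1" from the reported setup
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # should show a small trainable fraction

# The paper describes an ~2M-sample corpus drawn from Wikitext-103.
raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_ds = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="llama2-wikitext103-lora",
    learning_rate=5e-5,                    # reported LoRA learning rate
    optim="adamw_torch",                   # paper says "Adam"; Trainer ships AdamW variants
    per_device_train_batch_size=4,         # assumed
    num_train_epochs=1,                    # assumed
    logging_steps=100,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

This is only a plausible reconstruction of the kind of configuration the paper describes; since the paper releases neither code nor full dependency versions (see the table above), exact reproduction would require further details from the authors.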