Linguistic Collapse: Neural Collapse in (Large) Language Models
Authors: Robert Wu, Vardan Papyan
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper empirically investigates the impact of scaling the architectures and training of causal language models (CLMs) on their progression towards neural collapse (NC). |
| Researcher Affiliation | Academia | Robert Wu, University of Toronto, Vector Institute (rupert@cs.toronto.edu); Vardan Papyan, University of Toronto, Vector Institute (vardan.papyan@utoronto.ca) |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Our code is hosted on GitHub: https://github.com/rhubarbwu/linguistic-collapse. Code for (post-)training analysis is also hosted on GitHub. Main code: https://github.com/rhubarbwu/linguistic-collapse. Auxiliary package: https://github.com/rhubarbwu/neural-collapse |
| Open Datasets | Yes | TinyStories [2] is a synthetic dataset generated by GPT-3.5 and GPT-4 using around 1,500 English words a child might use (see the data-loading sketch after this table). |
| Dataset Splits | Yes | The 2,141,709 stories are split into 2,119,719 train and 21,990 validation stories. |
| Hardware Specification | Yes | Each model was trained on a single NVIDIA A100 (40GB) GPU for up to 8 hours per epoch. |
| Software Dependencies | Yes | `transformers` version 4.28.1 |
| Experiment Setup | Yes | We use 30 CLM architectures based on GPT-Neo [80], configured similarly to [2]. They vary in width (embedding dimension) d ∈ {64, 128, 256, 512, 768, 1024} and depth (number of self-attention layers) L ∈ {1, 2, 4, 8, 12}. Our models were trained by teacher-forcing using CE loss. For each architecture, we trained multiple models for 1, 3, and 10 epochs, ablating over weight decay factors β = 0.0005 [51] and β = 0.1 [81] (see the configuration sketch after this table). |
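
The dataset rows above can be illustrated with a short loading script. This is a minimal sketch rather than the authors' code: it assumes the Hugging Face `datasets` package and the public `roneneldan/TinyStories` hub identifier, and simply checks the train/validation split sizes quoted in the table.

```python
# Minimal sketch: load TinyStories and inspect its splits.
# Assumes the `datasets` package and the `roneneldan/TinyStories` hub dataset;
# the paper does not specify how the authors fetched the data.
from datasets import load_dataset

tiny_stories = load_dataset("roneneldan/TinyStories")

train = tiny_stories["train"]
valid = tiny_stories["validation"]

# Reported split sizes: 2,119,719 train and 21,990 validation stories.
print(f"train: {len(train):,} stories")
print(f"validation: {len(valid):,} stories")
```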
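
The 30-architecture grid from the Experiment Setup row can also be written down explicitly. The sketch below is an illustration, not the authors' released training code: the width and depth values come from the paper, while the head count, context length, and all-global attention pattern are assumptions, and the `TrainingArguments` at the end only shows where the epoch counts and weight-decay factors from the ablation would enter.

```python
# Sketch of the width/depth grid of GPT-Neo-style CLMs described in the paper.
# Widths d and depths L are from the paper; num_heads, context length, and the
# all-global attention pattern are illustrative assumptions.
from transformers import GPTNeoConfig, GPTNeoForCausalLM, TrainingArguments

WIDTHS = [64, 128, 256, 512, 768, 1024]  # embedding dimension d
DEPTHS = [1, 2, 4, 8, 12]                # number of self-attention layers L

configs = {
    (d, L): GPTNeoConfig(
        hidden_size=d,
        num_layers=L,
        num_heads=16,                       # assumed; must divide hidden_size
        attention_types=[[["global"], L]],  # one attention type per layer
        max_position_embeddings=512,        # assumed context length
    )
    for d in WIDTHS
    for L in DEPTHS
}
assert len(configs) == 30  # 6 widths x 5 depths

# Instantiate one model from the grid; `GPTNeoForCausalLM` computes the
# teacher-forced cross-entropy loss when labels are provided.
model = GPTNeoForCausalLM(configs[(256, 4)])
print(f"d=256, L=4 parameters: {sum(p.numel() for p in model.parameters()):,}")

# Where the paper's ablation settings would enter a Trainer run
# (epoch counts 1/3/10 and weight decay 0.0005 or 0.1).
args = TrainingArguments(
    output_dir="out/d256-L4",
    num_train_epochs=3,
    weight_decay=0.0005,
)
```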