Feature Collapse
Authors: Thomas Laurent, James von Brecht, Xavier Bresson
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We start by showing experimentally that feature collapse goes hand in hand with generalization. We then prove that, in the large sample limit, distinct tokens that play identical roles in the task receive identical local features in the first layer of the network. This analysis shows that a neural network trained on this task provably learns interpretable and meaningful representations in its first layer. Finally, we conduct experiments that show feature collapse and generalization go hand in hand. (The first sketch after the table illustrates one way to check this collapse empirically.) |
| Researcher Affiliation | Academia | Thomas Laurent (Loyola Marymount University, tlaurent@lmu.edu); James H. von Brecht; Xavier Bresson (National University of Singapore, xaviercs@nus.edu.sg) |
| Pseudocode | No | The paper describes methods using prose and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes for our experiments are available at https://github.com/xbresson/feature_collapse. |
| Open Datasets | No | The paper uses a synthetic data model generated by the authors, describing the process for constructing training sets (e.g., 'We then construct a training set by generating nspl = 5 data points from each latent variable.'). It does not provide access information for a publicly available or open dataset. (The second sketch after the table gives a hypothetical rendering of this per-latent-variable sampling.) |
| Dataset Splits | No | The paper mentions 'training set' and 'test points' but does not specify any validation dataset or provide details on how data was split for training, validation, and testing. |
| Hardware Specification | No | The paper describes the training of neural networks but does not specify any hardware details like GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper states that 'The codes for our experiments are available at https://github.com/xbresson/feature_collapse,' which implies that dependencies are specified in the repository, but the text itself does not list specific software components with version numbers. |
| Experiment Setup | Yes | For the parameters of the architecture, loss, and training procedure, we use an embedding dimension of d = 100, a weight decay of λ = 0.001, a mini-batch size of 100, and a constant learning rate of 0.1, respectively, for all experiments. (The third sketch after the table restates these values as a training configuration.) |
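
The Research Type row claims that distinct tokens playing identical roles receive identical local features in the first layer. One way to probe this claim on a trained model is to compare first-layer embeddings directly. This is a minimal sketch, not the authors' code: the embedding table, vocabulary size, and the token pair below are hypothetical; only the embedding dimension d = 100 comes from the paper.

```python
import torch
import torch.nn.functional as F

# Hypothetical first-layer embedding table of a trained network.
# d = 100 matches the embedding dimension reported in the paper;
# the vocabulary size is a placeholder.
embedding = torch.nn.Embedding(num_embeddings=1000, embedding_dim=100)

def collapse_score(token_a: int, token_b: int) -> float:
    """Cosine similarity between the local features of two tokens.

    A score near 1.0 means the two embeddings have (nearly) collapsed
    onto the same direction, as predicted for tokens that play
    identical roles in the task.
    """
    with torch.no_grad():
        va = embedding(torch.tensor(token_a))
        vb = embedding(torch.tensor(token_b))
        return F.cosine_similarity(va, vb, dim=0).item()

# Tokens 3 and 7 are assumed, hypothetically, to play the same role;
# after training on such a task their score should approach 1.0.
print(collapse_score(3, 7))
```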
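
The Open Datasets row quotes the construction of the synthetic training set: nspl = 5 data points generated from each latent variable. Below is a hypothetical rendering of that loop; the number of latent variables and the sampling function are placeholders, since the paper's actual data model is not reproduced in this report.

```python
import random

# Hypothetical rendering of the quoted procedure: each latent variable
# is realized as n_spl = 5 concrete data points.  Only n_spl = 5 comes
# from the paper; everything else here is a placeholder.
n_spl = 5
latent_variables = range(20)  # hypothetical number of latent variables

def sample_data_point(z, rng):
    # Placeholder: draw one observation conditioned on latent variable z.
    return (z, rng.gauss(mu=float(z), sigma=1.0))

rng = random.Random(0)
train_set = [sample_data_point(z, rng)
             for z in latent_variables
             for _ in range(n_spl)]
print(len(train_set))  # 20 latent variables x 5 samples each = 100
```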
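
The Experiment Setup row's hyperparameters map directly onto a standard optimizer configuration. This is a sketch assuming plain SGD in PyTorch; the quoted text does not name the optimizer, and the model below is a placeholder for the real architecture, which lives at https://github.com/xbresson/feature_collapse.

```python
import torch

# Placeholder architecture; the actual model is in the authors' repo.
# Only d = 100 (embedding dimension) is taken from the paper; the
# vocabulary size, sequence length, and output size are hypothetical.
seq_len, vocab, n_classes = 8, 1000, 10
model = torch.nn.Sequential(
    torch.nn.Embedding(num_embeddings=vocab, embedding_dim=100),  # d = 100
    torch.nn.Flatten(start_dim=1),
    torch.nn.Linear(in_features=100 * seq_len, out_features=n_classes),
)

# Quoted hyperparameters: weight decay lambda = 0.001 and a constant
# learning rate of 0.1.  SGD itself is an assumption, not stated above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.001)

# The quoted mini-batch size of 100 would enter through the data loader:
# loader = torch.utils.data.DataLoader(dataset, batch_size=100, shuffle=True)
```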