Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Abrupt Learning in Transformers: A Case Study on Matrix Completion
Authors: Pulkit Gopalani, Ekdeep S Lubana, Wei Hu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train a BERT model $\mathrm{TF}_\theta$ to predict missing entries in a low-rank masked matrix $X$. In our training setup, the model converges to a final MSE of approximately 4e-3; that is, it can solve matrix completion well (as in Fig. 3, this MSE is lower than nuclear norm minimization). |
| Researcher Affiliation | Academia | Pulkit Gopalani (University of Michigan); Ekdeep Singh Lubana (Harvard University); Wei Hu (University of Michigan) |
| Pseudocode | No | The paper describes methods in prose but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/pulkitgopalani/tf-matcomp. |
| Open Datasets | No | Data for matrix completion is generated as $X = UV^\top$; $U, V \in \mathbb{R}^{n \times r}$, $U_{ij}, V_{ij} \overset{\text{iid}}{\sim} \mathrm{Unif}[-1, 1]$ $\forall (i, j) \in [n] \times [r]$, so that $X$ has rank at most $r$. To mask entries at random, we sample binary matrices $M \in \{0, 1\}^{n \times n}$ such that $M_{ij} = 0$ with probability $p_{\text{mask}}$, and 1 otherwise; that is, $\Omega = \{(i, j) \mid M_{ij} = 1\}$. |
| Dataset Splits | No | Since data is not partitioned into fixed train and test sets, we only analyze the training loss in all cases. |
| Hardware Specification | Yes | For 7×7 matrices (training and testing), we used a single {V100 / A100 / L40S} GPU. A single {A40 / A100 / L40S} GPU was used for matrices of order 10, 12, 15. |
| Software Dependencies | No | The paper mentions software such as BERT, CVXPY, and Hugging Face's transformers library, but does not provide specific version numbers for these dependencies (e.g., 'CVXPY 1.x' or 'transformers 4.x'). |
| Experiment Setup | Yes | We use a 4-layer, 8-head BERT model [40] for 7×7 (rank-2) matrices, with absolute positional embeddings, no token type embeddings, and no dropout. We fix $p_{\text{mask}} = 0.3$ for training, and 256 matrices are sampled as training data at each step (in an online training setup). We use the Adam optimizer with constant step size 1e-4 for 50000 steps, without weight decay or warmup. |
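
The data generation quoted in the Open Datasets and Experiment Setup rows can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code from the linked repository: the function name `sample_masked_batch`, its `seed` parameter, and the batched array layout are assumptions introduced here. The defaults (256 matrices of size 7×7, rank 2, masking probability 0.3) are taken from the table above.

```python
import numpy as np


def sample_masked_batch(batch_size=256, n=7, r=2, p_mask=0.3, seed=None):
    """Sample a batch of low-rank matrices X = U V^T with random binary masks.

    Hypothetical helper for illustration only; names and layout are assumptions.
    """
    rng = np.random.default_rng(seed)

    # U, V have i.i.d. Unif[-1, 1] entries, so each X has rank at most r.
    U = rng.uniform(-1.0, 1.0, size=(batch_size, n, r))
    V = rng.uniform(-1.0, 1.0, size=(batch_size, n, r))
    X = U @ V.transpose(0, 2, 1)  # shape: (batch_size, n, n)

    # M_ij = 0 (masked) with probability p_mask, and 1 (observed) otherwise.
    M = (rng.random((batch_size, n, n)) >= p_mask).astype(np.int64)
    return X, M


X, M = sample_masked_batch(seed=0)
observed = M == 1  # boolean version of the index set Omega, one mask per matrix
```

Because a fresh batch is drawn at every step in the online setup described above, there is no fixed train/test split to reproduce; calling the function again with a different seed yields new data.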