Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Understanding Inductive Bias in Transformers: A View From Infinity
Authors: Itay Lavie, Guy Gur-Ari, Zohar Ringel
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally the learnability bounds found based on the dimension of the relevant irreducible representations are tight. We analyze WikiText-2 and show evidence for an approximate permutation symmetry in its principal components, suggesting that the toolbox presented can be of use in natural language datasets. |
| Researcher Affiliation | Collaboration | (1) Racah Institute of Physics, Hebrew University of Jerusalem, Jerusalem 91904, Israel; (2) Augment Computing. |
| Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing its code or links to a code repository. |
| Open Datasets | Yes | We use a mixture of hidden Markov models (HMMs) (Baum & Petrie, 1966) as a dataset. Finally, we argue WikiText dataset, does indeed possess a degree of permutation symmetry. We analyze WikiText-2 and show evidence for an approximate permutation symmetry in its principal components, suggesting that the toolbox presented can be of use in natural language datasets. |
| Dataset Splits | No | The paper mentions training on a mixture of HMMs and testing on different distributions (train and test distributions for MSE loss), but it does not specify a separate validation dataset split or a cross-validation methodology. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory, or cluster specifications). |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | The NN is trained on 8,000 samples drawn from the mixture $p, q \sim U(0, 0.4)$ with SGD with a mini-batch size of 50 and a learning rate of $10^{-3}$ for 10,000 epochs. The weights are initialized according to LeCun initialization, meaning the weights in each layer are i.i.d. with $w \sim \mathcal{N}(0, 1/\text{fan-in})$, and the biases are initialized to zero. |
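
The experiment setup quoted in the table can be illustrated with a minimal NumPy sketch. It shows LeCun initialization (weights i.i.d. with variance 1/fan-in, biases zero) and the mini-batch bookkeeping for 8,000 samples with batches of 50. The paper releases no code, so the function names, the layer sizes, and the seed below are hypothetical and are not the authors' implementation; a learning rate of 1e-3 is assumed from the quoted setup.

```python
import numpy as np


def lecun_init(layer_sizes, seed=0):
    """Return a list of (weights, biases) pairs, one per layer."""
    rng = np.random.default_rng(seed)
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # Weights are i.i.d. N(0, 1/fan_in): standard deviation 1/sqrt(fan_in).
        weights = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))
        # Biases start at zero, as stated in the quoted setup.
        biases = np.zeros(fan_out)
        params.append((weights, biases))
    return params


def minibatches(num_samples, batch_size, rng):
    """Yield index arrays that cover one shuffled epoch."""
    order = rng.permutation(num_samples)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]


# Values quoted in the table above.
NUM_SAMPLES = 8_000
BATCH_SIZE = 50
LEARNING_RATE = 1e-3  # assumed reading of the quoted learning rate

# Hypothetical layer sizes, chosen only for illustration.
params = lecun_init([64, 256, 1])

# Count the SGD updates performed in a single epoch.
batches_per_epoch = sum(
    1 for _ in minibatches(NUM_SAMPLES, BATCH_SIZE, np.random.default_rng(0))
)
```

With these settings each epoch consists of 8,000 / 50 = 160 parameter updates, so the quoted 10,000 epochs correspond to 1.6 million SGD steps.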