Towards Understanding Inductive Bias in Transformers: A View From Infinity

Authors: Itay Lavie, Guy Gur-Ari, Zohar Ringel

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show experimentally that the learnability bounds derived from the dimension of the relevant irreducible representations are tight. We analyze WikiText-2 and show evidence for an approximate permutation symmetry in its principal components, suggesting that the toolbox presented can be of use in natural language datasets.
Researcher Affiliation | Collaboration | (1) Racah Institute of Physics, Hebrew University of Jerusalem, Jerusalem 91904, Israel; (2) Augment Computing.
Pseudocode | No | The paper does not include pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing its code or links to a code repository.
Open Datasets | Yes | We use a mixture of hidden Markov models (HMMs) (Baum & Petrie, 1966) as a dataset. Finally, we argue that the WikiText dataset does indeed possess a degree of permutation symmetry. We analyze WikiText-2 and show evidence for an approximate permutation symmetry in its principal components, suggesting that the toolbox presented can be of use in natural language datasets.
Dataset Splits | No | The paper mentions training on a mixture of HMMs and testing on different distributions (train and test distributions for the MSE loss), but it does not specify a separate validation split or a cross-validation methodology.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., CPU/GPU models, memory, or cluster specifications).
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | The NN is trained on 8,000 samples drawn from the mixture p, q ~ U(0, 0.4) with SGD, a mini-batch size of 50, and a learning rate of 10^-3 for 10,000 epochs. The weights are initialized according to LeCun initialization, meaning the weights in each layer are i.i.d. with w ~ N(0, 1/fan_in), and the biases are initialized to zero. (A hedged sketch of this training recipe follows the table.)
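
As a rough illustration of the quoted experiment setup, here is a minimal PyTorch sketch. It is not the authors' code: only the hyperparameters quoted above (8,000 samples, p, q ~ U(0, 0.4), SGD, mini-batch size 50, learning rate 10^-3, 10,000 epochs, LeCun initialization with zero biases) come from the paper excerpt. The two-state HMM emission scheme, the sequence length, the regression target, and the MLP architecture are assumptions introduced only to make the example self-contained.

```python
# Minimal sketch (not the authors' code) of the quoted training recipe.
import torch
import torch.nn as nn

def sample_hmm_sequence(p, q, length=16):
    """ASSUMPTION: a 2-state HMM with switching probabilities p (state 0 -> 1)
    and q (state 1 -> 0); the emission is simply the hidden state."""
    states = torch.empty(length, dtype=torch.long)
    s = torch.randint(0, 2, (1,)).item()
    for t in range(length):
        states[t] = s
        flip = p if s == 0 else q
        if torch.rand(1).item() < flip:
            s = 1 - s
    return states.float()

# Dataset: 8,000 sequences, each from an HMM whose parameters are drawn as
# p, q ~ U(0, 0.4), i.e. a mixture of HMMs (sequence length is an assumption).
n_samples, seq_len = 8000, 16
X = torch.stack([
    sample_hmm_sequence(*(0.4 * torch.rand(2)).tolist(), length=seq_len)
    for _ in range(n_samples)
])
y = X.mean(dim=1, keepdim=True)  # placeholder regression target (assumption)

# ASSUMPTION: a small MLP stands in for the network; the quoted setup does not
# specify the architecture.
model = nn.Sequential(nn.Linear(seq_len, 128), nn.ReLU(), nn.Linear(128, 1))

# LeCun initialization, as quoted: w ~ N(0, 1/fan_in), biases set to zero.
for m in model.modules():
    if isinstance(m, nn.Linear):
        fan_in = m.weight.shape[1]
        nn.init.normal_(m.weight, mean=0.0, std=fan_in ** -0.5)
        nn.init.zeros_(m.bias)

opt = torch.optim.SGD(model.parameters(), lr=1e-3)  # learning rate 10^-3
loss_fn = nn.MSELoss()

for epoch in range(10_000):                         # 10,000 epochs, as quoted
    perm = torch.randperm(n_samples)
    for i in range(0, n_samples, 50):               # mini-batch size 50
        idx = perm[i:i + 50]
        opt.zero_grad()
        loss = loss_fn(model(X[idx]), y[idx])
        loss.backward()
        opt.step()
```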