Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

Authors: Jack William Miller, Charles O'Neill, Thang D Bui

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we conduct an empirical exploration of grokking, uncovering new aspects of the phenomenon not explained by current theory. We begin by describing grokking and summarise its existing explanations. Afterwards, we present our empirical observations, most notably the existence of grokking outside of neural networks. Finally, we suggest a mechanism for grokking that is broadly consistent with our observations.
Researcher Affiliation | Academia | Jack Miller EMAIL ANU College of Engineering, Computing and Cybernetics; Charles O'Neill EMAIL ANU College of Engineering, Computing and Cybernetics; Thang Bui EMAIL ANU College of Engineering, Computing and Cybernetics
Pseudocode | Yes | The algorithm used to run the experiment is detailed in Algorithm 1 (Appendix I.1).
Open Source Code | Yes | All experiments can be found at this GitHub page. They have descriptive names and should reproduce the figures seen in this paper. For Figure 6, the relevant experiment is in the feat/info-theory-description branch.
Open Datasets | No | Many datasets were used for the experimentation completed in this paper. They were either found in Merrill et al. (2023) or Power et al. (2022), or were developed independently.
Dataset Splits | No | The paper does not explicitly state the training/test/validation splits (e.g., percentages or exact counts) used for the experiments. It refers to 'training points' and a 'validation dataset' but does not quantify the split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper mentions optimisers such as 'Adam' and 'SGD', but it does not specify any software libraries or frameworks with version numbers (e.g., Python, PyTorch, TensorFlow) that would be needed to replicate the experiments.
Experiment Setup | Yes | For the model, we used a simple neural network analogous to that of Merrill et al. (2023). This neural network consisted of 1 hidden layer of size 1000 and was optimised using SGD with cross-entropy loss. The weight decay was set to 10^-2 and the learning rate to 10^-1. Loss plots for all experiments are shown in Appendix N.
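The Experiment Setup row pins down the reported hyperparameters: a 1-hidden-layer network of width 1000, SGD with cross-entropy loss, weight decay 10^-2, and learning rate 10^-1. The following is a minimal NumPy sketch of one such training step; it is not the authors' code, and the input dimension, class count, batch size, and random data are placeholder assumptions.

```python
import numpy as np

# Sketch of the stated setup: 1 hidden layer of width 1000, SGD with
# cross-entropy loss, weight decay 1e-2, learning rate 1e-1.
# in_dim, n_classes, and the batch below are hypothetical placeholders.
rng = np.random.default_rng(0)
in_dim, hidden, n_classes = 64, 1000, 10
lr, weight_decay = 1e-1, 1e-2

W1 = rng.normal(0, np.sqrt(2 / in_dim), (in_dim, hidden))
W2 = rng.normal(0, np.sqrt(2 / hidden), (hidden, n_classes))

x = rng.normal(size=(32, in_dim))
y = rng.integers(0, n_classes, size=32)

# Forward pass: ReLU hidden layer, softmax output, mean cross-entropy.
h = np.maximum(x @ W1, 0)
logits = h @ W2
logits -= logits.max(axis=1, keepdims=True)  # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(32), y]).mean()

# Backward pass: softmax/cross-entropy gradient, chained through the ReLU.
d_logits = probs.copy()
d_logits[np.arange(32), y] -= 1
d_logits /= 32
dW2 = h.T @ d_logits
dh = d_logits @ W2.T
dh[h <= 0] = 0
dW1 = x.T @ dh

# SGD update with L2 weight decay folded into the gradient.
W1 -= lr * (dW1 + weight_decay * W1)
W2 -= lr * (dW2 + weight_decay * W2)
```

The weight-decay term matters here: heavy regularisation relative to the learning rate is one of the knobs the grokking literature associates with delayed generalisation.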