Neural Networks and the Chomsky Hierarchy
Authors: Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we conduct an extensive empirical study (20 910 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. |
| Researcher Affiliation | Collaboration | *Equal contribution. Correspondence to {gdelt, anianr}@deepmind.com. ¹DeepMind. ²Stanford University. Work performed while the author was at DeepMind. |
| Pseudocode | Yes | Algorithm A.1: Training pipeline for our sequence prediction tasks. The comments (in blue) show an example output for the Reverse String (DCF) task (see the pipeline sketch below the table). |
| Open Source Code | Yes | We provide an open-source implementation of our models, tasks, and training and evaluation suite at https://github.com/deepmind/neural_networks_chomsky_hierarchy. |
| Open Datasets | Yes | We provide an open-source implementation of our models, tasks, and training and evaluation suite at https://github.com/deepmind/neural_networks_chomsky_hierarchy. Instead of using fixed-size datasets, we define training and test distributions from which we continually sample sequences. |
| Dataset Splits | No | The paper specifies training and testing distributions for sequence lengths (e.g., 'training range N, with N = 40' and 'For testing, we sample the sequence length ℓ from U(N + 1, M), with M = 500'), but it does not explicitly define a separate validation split or describe a validation methodology (the length-sampling sketch below the table illustrates this setup). |
| Hardware Specification | Yes | We ran each task-architecture-hyperparameter triplet on a single TPU on our internal cluster. |
| Software Dependencies | No | The paper mentions using JAX (Bradbury et al., 2018) and the DeepMind JAX ecosystem (Babuschkin et al., 2020; Hessel et al., 2020; Hennigan et al., 2020) but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015) with default hyperparameters for 1 000 000 steps... We run all experiments with 10 different random seeds (used for network parameter initialization) and three learning rates (1 × 10⁻⁴, 3 × 10⁻⁴, and 5 × 10⁻⁴), and we report the result obtained by the hyperparameters with the maximum score (see the sweep sketch below the table). |
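
For orientation, here is a minimal sketch of the kind of training-pipeline step described in Algorithm A.1, using the Reverse String (DCF) task as the running example. The helper name `sample_reverse_string_batch` and the binary vocabulary are illustrative assumptions, not the paper's exact interface.

```python
import jax
import jax.numpy as jnp

def sample_reverse_string_batch(key, batch_size, length, vocab_size=2):
    """Sample inputs and targets for a Reverse String-style task: the target is
    the input token string reversed (illustrative helper, not the paper's API)."""
    inputs = jax.random.randint(key, (batch_size, length), 0, vocab_size)
    targets = jnp.flip(inputs, axis=-1)  # reverse along the sequence dimension
    return inputs, targets

# Example: one batch of 4 binary strings of length 8 (inside the training range).
key = jax.random.PRNGKey(0)
inputs, targets = sample_reverse_string_batch(key, batch_size=4, length=8)
```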
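The continual-sampling setup quoted in the Open Datasets and Dataset Splits rows can be sketched as follows, with the training range N = 40 and maximum test length M = 500 taken from the paper; the function names are hypothetical.

```python
import jax

N, M = 40, 500  # training range and maximum test length reported in the paper

def sample_train_length(key):
    # Training lengths are drawn uniformly from {1, ..., N}.
    return jax.random.randint(key, (), 1, N + 1)

def sample_test_length(key):
    # Test lengths are drawn uniformly from {N + 1, ..., M}, i.e. strictly
    # longer than any sequence seen during training (length generalization).
    return jax.random.randint(key, (), N + 1, M + 1)
```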
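Finally, a minimal sketch of the reported experiment setup and model-selection protocol, assuming Optax for the Adam optimizer; `train_and_evaluate` is a hypothetical stand-in for the actual training loop, which the paper does not expose under this name.

```python
import itertools
import optax

LEARNING_RATES = (1e-4, 3e-4, 5e-4)  # the three learning rates from the paper
NUM_SEEDS = 10                       # random seeds for parameter initialization
NUM_STEPS = 1_000_000                # Adam training steps

def make_optimizer(learning_rate):
    # Adam with otherwise-default hyperparameters (Kingma & Ba, 2015).
    return optax.adam(learning_rate)

def best_score(train_and_evaluate):
    """Return the maximum score over the seed x learning-rate grid, mirroring
    the paper's reporting protocol; `train_and_evaluate` is a caller-supplied
    training function (hypothetical interface)."""
    return max(
        train_and_evaluate(seed=seed, optimizer=make_optimizer(lr), num_steps=NUM_STEPS)
        for seed, lr in itertools.product(range(NUM_SEEDS), LEARNING_RATES)
    )
```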