Learning Universal Predictors

Authors: Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Gregoire Deletang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments with neural architectures (e.g. LSTMs, Transformers) and algorithmic data generators of varying complexity and universality. Our results suggest that UTM data is a valuable resource for meta-learning, and that it can be used to train neural networks capable of learning universal prediction strategies.
Researcher Affiliation | Industry | Google DeepMind, London, UK.
Pseudocode | Yes | Algorithm 1: returns the number of repetitions in a sequence for a given delay between symbols (a runnable usage sketch follows the table).
    def repeating_count(output, delay):
        count = 0  # number of equal elements
        for i in range(delay + 1, len(output)):
            if output[i] == output[i - delay]:
                count += 1
        return count
Open Source Code | Yes | 4) We open-sourced all our generators at https://github.com/google-deepmind/neural_networks_solomonoff_induction.
Open Datasets | No | The paper describes generating data from Universal Turing Machines (UTMs), Variable-order Markov Sources (VOMS), and Chomsky Hierarchy (CH) tasks. While these generators are described, there are no links, DOIs, or citations to publicly available datasets for direct download in the way standard datasets are typically provided; the data is generated by the authors' own systems.
Dataset Splits | No | The paper mentions 'batch size 128, sequence length 256' and evaluation 'on 6k sequences of length 256, which we refer as in-distribution... and of length 1024, referred as out-of-distribution'. It does not explicitly state train/validation/test splits as percentages or counts, nor how validation was handled during training beyond monitoring the loss.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models, or cloud computing instance types used for the experiments. It only mentions 'memory-based meta-learning' and 'neural architectures (e.g. LSTMs, Transformers)', which are modeling concepts rather than hardware.
Software Dependencies | No | The paper mentions using the 'ADAM optimizer (Kingma & Ba, 2014)', 'LSTMs (Hochreiter & Schmidhuber, 1997), and Transformers (Vaswani et al., 2017)', but does not specify versions for any libraries, frameworks (such as PyTorch or TensorFlow), or Python itself.
Experiment Setup | Yes | We train for 500K iterations with batch size 128, sequence length 256, and learning rate 10^-4. (A configuration sketch follows the table.)
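
For reference, here is a minimal usage sketch of the repeating_count pseudocode quoted in the Pseudocode row. The function body is reproduced verbatim from the quote above; the example sequence and the expected count are our own illustration, not taken from the paper.

    # Verbatim copy of the pseudocode quoted in the Pseudocode row.
    def repeating_count(output, delay):
        count = 0  # number of equal elements
        for i in range(delay + 1, len(output)):
            if output[i] == output[i - delay]:
                count += 1
        return count

    # Illustrative input: a period-2 sequence checked with delay=2.
    # Comparisons happen at i = 3, 4, 5 and all of them match, so the count is 3.
    assert repeating_count([1, 2, 1, 2, 1, 2], delay=2) == 3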
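
The Experiment Setup, Software Dependencies, and Dataset Splits rows together report the headline training hyperparameters (500K iterations, batch size 128, sequence length 256, learning rate 10^-4, Adam optimizer) and the evaluation protocol (6k sequences of length 256 in-distribution, length 1024 out-of-distribution). The sketch below merely collects these reported values into a configuration object; the class and field names are our own illustration and do not come from the authors' code.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TrainConfig:
        # Values as reported in the paper; names are illustrative, not the authors'.
        num_iterations: int = 500_000         # "500K iterations"
        batch_size: int = 128
        sequence_length: int = 256
        learning_rate: float = 1e-4           # used with the Adam optimizer (Kingma & Ba, 2014)
        eval_num_sequences: int = 6_000       # in-distribution: 6k sequences of length 256
        eval_ood_sequence_length: int = 1024  # out-of-distribution evaluation length

    config = TrainConfig()
    print(config)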