Learning Universal Predictors
Authors: Jordi Grau-Moya, Tim Genewein, Marcus Hutter, Laurent Orseau, Gregoire Deletang, Elliot Catt, Anian Ruoss, Li Kevin Wenliang, Christopher Mattern, Matthew Aitchison, Joel Veness
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments with neural architectures (e.g., LSTMs, Transformers) and algorithmic data generators of varying complexity and universality. Our results suggest that UTM data is a valuable resource for meta-learning, and that it can be used to train neural networks capable of learning universal prediction strategies. |
| Researcher Affiliation | Industry | Google DeepMind, London, UK. |
| Pseudocode | Yes | Algorithm 1 — Returns the number of repetitions in sequence for a given delay between symbols:<br>`def repeating_count(output, delay):`<br>`  count = 0  # number of equal elements`<br>`  for i in range(delay + 1, len(output)):`<br>`    if output[i] == output[i-delay]:`<br>`      count += 1`<br>`  return count`<br>A runnable sketch of this function follows the table. |
| Open Source Code | Yes | 4) We open-sourced all our generators at https://github.com/google-deepmind/neural_networks_solomonoff_induction. |
| Open Datasets | No | The paper describes generating data from Universal Turing Machines (UTMs), Variable-order Markov Sources (VOMS), and Chomsky Hierarchy (CH) tasks. These generators are described, but there are no links, DOIs, or citations to publicly available *datasets* for direct download in the way standard benchmarks are distributed; all data is produced by the authors' own generators. |
| Dataset Splits | No | The paper mentions 'batch size 128, sequence length 256' and evaluation 'on 6k sequences of length 256, which we refer as in-distribution... and of length 1024, referred as out-of-distribution'. It does not explicitly state train/validation/test splits in terms of percentages or counts, or how validation was handled specifically during training beyond monitoring loss. |
| Hardware Specification | No | The paper does not provide specific hardware specifications such as GPU models (e.g., NVIDIA A100), CPU models, or cloud computing instance types used for running experiments. It only mentions 'memory-based meta-learning' and 'neural architectures (e.g. LSTMs, Transformers)', which describe methods and model families rather than hardware. |
| Software Dependencies | No | The paper mentions using 'ADAM optimizer (Kingma & Ba, 2014)' and 'LSTMs (Hochreiter & Schmidhuber, 1997), and Transformers (Vaswani et al., 2017)' but does not specify software versions for any libraries, frameworks (like PyTorch or TensorFlow), or Python versions. |
| Experiment Setup | Yes | We train for 500K iterations with batch size 128, sequence length 256, and learning rate 10^-4. |
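The Algorithm 1 quoted above is essentially Python already; below is a minimal runnable sketch of that repetition counter. The function body follows the paper's pseudocode verbatim (including the loop starting at `delay + 1`); the usage example and its values are illustrative, not taken from the paper.

```python
def repeating_count(output, delay):
    """Count positions i where output[i] equals output[i - delay].

    Mirrors Algorithm 1: the number of repetitions in a sequence for a
    given delay between symbols.
    """
    count = 0  # number of equal elements
    for i in range(delay + 1, len(output)):
        if output[i] == output[i - delay]:
            count += 1
    return count


if __name__ == "__main__":
    # Illustrative example (not from the paper): a binary sequence with period 2.
    sequence = [0, 1, 0, 1, 0, 1]
    # Matches at i = 3, 4, 5 give a count of 3.
    print(repeating_count(sequence, delay=2))
```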
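For concreteness, the training hyperparameters reported in the table can be gathered into a single configuration. The sketch below is hypothetical (the paper does not name its framework or configuration format); the values are the ones quoted above: 500K iterations, batch size 128, sequence length 256, learning rate 10^-4, the ADAM optimizer mentioned under Software Dependencies, and evaluation on 6k sequences of lengths 256 and 1024.

```python
# Hypothetical configuration sketch; key names are illustrative, values come
# from the paper as quoted in this table.
TRAIN_CONFIG = {
    "num_iterations": 500_000,   # "We train for 500K iterations"
    "batch_size": 128,
    "sequence_length": 256,
    "learning_rate": 1e-4,
    "optimizer": "adam",         # ADAM (Kingma & Ba, 2014)
}

EVAL_CONFIG = {
    "num_sequences": 6_000,
    # Length 256 is referred to as in-distribution, length 1024 as out-of-distribution.
    "sequence_lengths": (256, 1024),
}
```

Any training loop built from these values would still depend on the paper's open-sourced data generators (UTMs, VOMS, Chomsky-hierarchy tasks) linked above.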