Measuring abstract reasoning in neural networks

Authors: David Barrett, Felix Hill, Adam Santoro, Ari Morcos, Timothy Lillicrap

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation regimes in which the training and test data differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better.
Researcher Affiliation | Industry | David G.T. Barrett*, Felix Hill*, Adam Santoro*, Ari S. Morcos, Timothy Lillicrap (DeepMind, London, United Kingdom). Correspondence to: <{barrettdavid; felixhill; adamsantoro}@google.com>.
Pseudocode | No | The paper describes the architecture of models like CNN-MLP, ResNet, LSTM, and WReN in text and uses diagrams (e.g., Figure 3 for WReN), but it does not include any structured pseudocode or algorithm blocks. (A hedged WReN-style sketch is given after this table.)
Open Source Code | Yes | We call our dataset the Procedurally Generated Matrices (PGM) dataset. ... https://github.com/deepmind/abstract-reasoning-matrices
Open Datasets | Yes | We call our dataset the Procedurally Generated Matrices (PGM) dataset. ... https://github.com/deepmind/abstract-reasoning-matrices (see the loading sketch after this table)
Dataset Splits | Yes | For each model, hyper-parameters were chosen using a grid sweep to select the model with smallest loss estimated on a held-out validation set. We used the validation loss for early-stopping and we report performance values on a held-out test set.
Hardware Specification | No | The paper does not mention any specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models (e.g., Intel Core i7), or other detailed computer specifications used for running the experiments.
Software Dependencies | No | The paper mentions algorithms and models like the 'ADAM optimiser' (Kingma & Ba, 2014), the 'ResNet-50 architecture' (He et al., 2016), and the 'standard LSTM module' (Hochreiter & Schmidhuber, 1997), but it does not provide specific software names with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x) needed to replicate the experiment.
Experiment Setup | Yes | For each model, hyper-parameters were chosen using a grid sweep to select the model with smallest loss estimated on a held-out validation set. We used the validation loss for early-stopping and we report performance values on a held-out test set. For hyper-parameter settings and further details on all models see appendix A. (An illustrative sweep-and-early-stopping sketch follows this table.)
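
Since the paper provides no pseudocode, the following is a minimal sketch of a WReN-style scorer as the text and Figure 3 describe it: a shared CNN embeds each panel, a Relation Network aggregates all pairs drawn from the 8 context embeddings plus one candidate embedding, and the resulting 8 candidate scores are treated as logits. Layer sizes, the assumed 80x80 panel resolution, and the omission of panel position tags are illustrative assumptions, not the authors' implementation.

    # Hedged sketch of a WReN-style scorer (not the authors' code).
    import torch
    import torch.nn as nn

    class PanelEncoder(nn.Module):
        """CNN mapping one grayscale panel (assumed resized to 80x80) to an embedding."""
        def __init__(self, dim=256):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            )
            self.fc = nn.LazyLinear(dim)  # infers the flattened conv size at first call

        def forward(self, x):                       # x: (B, 1, H, W)
            return self.fc(self.conv(x).flatten(1))  # (B, dim)

    class WReNSketch(nn.Module):
        """Scores each of 8 candidate panels against 8 context panels with a Relation Network."""
        def __init__(self, dim=256):
            super().__init__()
            self.encode = PanelEncoder(dim)
            self.g = nn.Sequential(nn.Linear(2 * dim, 512), nn.ReLU(),
                                   nn.Linear(512, 512), nn.ReLU())
            self.f = nn.Sequential(nn.Linear(512, 256), nn.ReLU(),
                                   nn.Linear(256, 1))

        def forward(self, context, candidates):
            # context: (B, 8, 1, H, W); candidates: (B, 8, 1, H, W)
            B = context.size(0)
            ctx = self.encode(context.flatten(0, 1)).view(B, 8, -1)      # (B, 8, D)
            cand = self.encode(candidates.flatten(0, 1)).view(B, 8, -1)  # (B, 8, D)
            scores = []
            for k in range(8):
                panels = torch.cat([ctx, cand[:, k:k + 1]], dim=1)       # (B, 9, D)
                # all ordered pairs of the 9 panel embeddings
                left = panels.unsqueeze(2).expand(-1, -1, 9, -1)
                right = panels.unsqueeze(1).expand(-1, 9, -1, -1)
                pairs = torch.cat([left, right], dim=-1).flatten(1, 2)   # (B, 81, 2D)
                pooled = self.g(pairs).sum(dim=1)                        # (B, 512)
                scores.append(self.f(pooled))                            # (B, 1)
            return torch.cat(scores, dim=1)                              # (B, 8) logits over candidates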
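
The dataset entries above point to the released PGM data. Below is a minimal loading sketch; the .npz container, the 'image' and 'target' field names, and the 16 x 160 x 160 panel layout are assumptions to be checked against the dataset README at the GitHub link, not facts stated in the paper.

    # Minimal PGM loading sketch; field names and shapes are assumptions.
    import numpy as np

    def load_pgm_example(path):
        data = np.load(path)
        panels = data["image"].reshape(16, 160, 160)  # assumed: 8 context + 8 candidate panels
        context, candidates = panels[:8], panels[8:]
        target = int(data["target"])                   # assumed: index of the correct candidate
        return context, candidates, target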
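
The model-selection protocol quoted under Dataset Splits and Experiment Setup can be illustrated as follows. The grid values and the train_and_validate / evaluate_on_test helpers are hypothetical placeholders, not the authors' settings (those are in appendix A of the paper).

    # Illustrative grid sweep with validation-based selection; values are placeholders.
    import itertools

    grid = {"learning_rate": [1e-4, 3e-4], "batch_size": [32, 64]}

    def sweep(train_and_validate, evaluate_on_test):
        best = None
        for values in itertools.product(*grid.values()):
            config = dict(zip(grid.keys(), values))
            # train_and_validate is assumed to early-stop on validation loss
            # and return (best_validation_loss, trained_model)
            val_loss, model = train_and_validate(config)
            if best is None or val_loss < best[0]:
                best = (val_loss, model, config)
        val_loss, model, config = best
        return config, evaluate_on_test(model)  # report test performance for the selected model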