Measuring abstract reasoning in neural networks
Authors: David Barrett, Felix Hill, Adam Santoro, Ari Morcos, Timothy Lillicrap
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation regimes in which the training and test data differ in clearly defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better. |
| Researcher Affiliation | Industry | David G.T. Barrett*, Felix Hill*, Adam Santoro*, Ari S. Morcos, Timothy Lillicrap (all DeepMind, London, United Kingdom). Correspondence to: <{barrettdavid; felixhill; adamsantoro}@google.com>. |
| Pseudocode | No | The paper describes the architecture of models like CNN-MLP, ResNet, LSTM, and WReN in text and uses diagrams (e.g., Figure 3 for WReN), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We call our dataset the Procedurally Generated Matrices (PGM) dataset¹. ... ¹ https://github.com/deepmind/abstract-reasoning-matrices |
| Open Datasets | Yes | We call our dataset the Procedurally Generated Matrices (PGM) dataset¹. ... ¹ https://github.com/deepmind/abstract-reasoning-matrices |
| Dataset Splits | Yes | For each model, hyper-parameters were chosen using a grid sweep to select the model with smallest loss estimated on a held-out validation set. We used the validation loss for early-stopping and we report performance values on a held-out test set. |
| Hardware Specification | No | The paper does not mention any specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models (e.g., Intel Core i7), or other detailed computer specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms and models like the 'ADAM optimiser' (Kingma & Ba, 2014), the 'ResNet-50 architecture' (He et al., 2016), and a 'standard LSTM module' (Hochreiter & Schmidhuber, 1997), but it does not provide specific software names with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x) that are needed to replicate the experiment. |
| Experiment Setup | Yes | For each model, hyper-parameters were chosen using a grid sweep to select the model with smallest loss estimated on a held-out validation set. We used the validation loss for early-stopping and we report performance values on a held-out test set. For hyper-parameter settings and further details on all models see appendix A. |
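
The Pseudocode row above notes that the WReN architecture is only described in prose and diagrams (Figure 3). As a rough aid to anyone reimplementing it, the sketch below shows the general Relation-Network-style scoring idea the paper builds on: embed each panel, form all ordered pairs of embeddings, pass each pair through a shared MLP g, sum, and score the panel set with a second MLP f; each of the eight candidate answers is scored against the eight context panels. All layer sizes, names, and the use of PyTorch here are illustrative assumptions, not the paper's exact architecture or hyper-parameters.

```python
import torch
import torch.nn as nn

class RelationScorer(nn.Module):
    """Minimal Relation-Network-style scorer (illustrative sizes, not the paper's)."""
    def __init__(self, embed_dim=256, hidden=512):
        super().__init__()
        # g: processes every ordered pair of panel embeddings
        self.g = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # f: maps the summed pair representations to a single score
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, panels):
        # panels: (batch, n_panels, embed_dim) -- 8 context panels + 1 candidate
        b, n, d = panels.shape
        left = panels.unsqueeze(2).expand(b, n, n, d)
        right = panels.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([left, right], dim=-1).reshape(b, n * n, 2 * d)
        summed = self.g(pairs).sum(dim=1)      # sum over all ordered pairs
        return self.f(summed).squeeze(-1)      # one score per (context, candidate) set

def score_candidates(scorer, context_emb, candidate_embs):
    # context_emb: (batch, 8, d); candidate_embs: (batch, 8, d), one embedding per answer choice
    scores = []
    for k in range(candidate_embs.shape[1]):
        panels = torch.cat([context_emb, candidate_embs[:, k:k + 1]], dim=1)
        scores.append(scorer(panels))
    return torch.stack(scores, dim=1)          # (batch, 8) logits over the answer choices
```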
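The Open Source Code and Open Datasets rows point to the PGM repository. A minimal loading sketch follows; the .npz field names ('image', 'target') and the 16-panel, 160x160 layout are assumptions taken from the repository's description of the data format, not statements from the paper itself.

```python
import numpy as np

def load_pgm_example(path):
    """Load one PGM question from an .npz file (field names assumed from the repo README)."""
    data = np.load(path)
    panels = data["image"].reshape(16, 160, 160)   # assumed: 8 context panels + 8 candidate answers
    context, candidates = panels[:8], panels[8:]
    target = int(data["target"])                    # assumed: index of the correct candidate (0-7)
    return context, candidates, target

# Hypothetical usage; the path and file naming are placeholders:
# context, candidates, target = load_pgm_example("PGM_neutral_train_0.npz")
```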
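The Dataset Splits and Experiment Setup rows describe model selection via a hyper-parameter grid sweep, early stopping on a held-out validation set, and reporting on a held-out test set. The sketch below illustrates that protocol generically; the grid values, patience, and helper callables (build_model, train_one_epoch, evaluate) are placeholders, not the settings from the paper's appendix A.

```python
import itertools

def grid_sweep(build_model, train_one_epoch, evaluate, train_set, val_set,
               grid, max_epochs=100, patience=5):
    """Pick the configuration with the smallest validation loss, using early stopping."""
    best = {"val_loss": float("inf"), "config": None, "model": None}
    for values in itertools.product(*grid.values()):
        config = dict(zip(grid.keys(), values))
        model = build_model(**config)
        best_val, epochs_since_improvement = float("inf"), 0
        for epoch in range(max_epochs):
            train_one_epoch(model, train_set, config)
            val_loss = evaluate(model, val_set)
            if val_loss < best_val:
                best_val, epochs_since_improvement = val_loss, 0
            else:
                epochs_since_improvement += 1
                if epochs_since_improvement >= patience:
                    break  # early stopping on validation loss
        if best_val < best["val_loss"]:
            best = {"val_loss": best_val, "config": config, "model": model}
    return best

# Hypothetical usage; final numbers would then be reported on the held-out test set:
# best = grid_sweep(build_model, train_one_epoch, evaluate, train_set, val_set,
#                   grid={"lr": [1e-4, 3e-4], "batch_size": [32, 64]})
# test_accuracy = evaluate(best["model"], test_set)
```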