Measuring abstract reasoning in neural networks
Authors: David Barrett, Felix Hill, Adam Santoro, Ari Morcos, Timothy Lillicrap
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation regimes in which the training and test data differ in clearly defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with a structure designed to encourage reasoning, that does significantly better. |
| Researcher Affiliation | Industry | David G.T. Barrett*, Felix Hill*, Adam Santoro*, Ari S. Morcos, Timothy Lillicrap (all DeepMind, London, United Kingdom). Correspondence to: <{barrettdavid; felixhill; adamsantoro}@google.com>. |
| Pseudocode | No | The paper describes the architecture of models like CNN-MLP, ResNet, LSTM, and WReN in text and uses diagrams (e.g., Figure 3 for WReN), but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We call our dataset the Procedurally Generated Matrices (PGM) dataset¹. ... ¹ https://github.com/deepmind/abstract-reasoning-matrices |
| Open Datasets | Yes | We call our dataset the Procedurally Generated Matrices (PGM) dataset¹. ... ¹ https://github.com/deepmind/abstract-reasoning-matrices |
| Dataset Splits | Yes | For each model, hyper-parameters were chosen using a grid sweep to select the model with smallest loss estimated on a held-out validation set. We used the validation loss for early-stopping and we report performance values on a held-out test set. |
| Hardware Specification | No | The paper does not mention any specific hardware details such as GPU models (e.g., NVIDIA A100), CPU models (e.g., Intel Core i7), or other detailed computer specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions algorithms and models like the 'ADAM optimiser' (Kingma & Ba, 2014), the 'ResNet-50 architecture' (He et al., 2016), and a 'standard LSTM module' (Hochreiter & Schmidhuber, 1997), but it does not provide specific software names with version numbers (e.g., TensorFlow 2.x, PyTorch 1.x) that are needed to replicate the experiment. |
| Experiment Setup | Yes | For each model, hyper-parameters were chosen using a grid sweep to select the model with smallest loss estimated on a held-out validation set. We used the validation loss for early-stopping and we report performance values on a held-out test set. For hyper-parameter settings and further details on all models see appendix A. |
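
The Pseudocode row above notes that the WReN architecture is only described in prose and diagrams (Figure 3). As a rough aid to anyone reimplementing it, the sketch below shows the general Relation-Network-style scoring idea the paper builds on: embed each panel, form all ordered pairs of embeddings, pass each pair through a shared MLP g, sum, and score the panel set with a second MLP f; each of the eight candidate answers is scored against the eight context panels. All layer sizes, names, and the use of PyTorch here are illustrative assumptions, not the paper's exact architecture or hyper-parameters.

```python
import torch
import torch.nn as nn

class RelationScorer(nn.Module):
    """Minimal Relation-Network-style scorer (illustrative sizes, not the paper's)."""
    def __init__(self, embed_dim=256, hidden=512):
        super().__init__()
        # g: processes every ordered pair of panel embeddings
        self.g = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # f: maps the summed pair representations to a single score
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, panels):
        # panels: (batch, n_panels, embed_dim) -- 8 context panels + 1 candidate
        b, n, d = panels.shape
        left = panels.unsqueeze(2).expand(b, n, n, d)
        right = panels.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([left, right], dim=-1).reshape(b, n * n, 2 * d)
        summed = self.g(pairs).sum(dim=1)      # sum over all ordered pairs
        return self.f(summed).squeeze(-1)      # one score per (context, candidate) set

def score_candidates(scorer, context_emb, candidate_embs):
    # context_emb: (batch, 8, d); candidate_embs: (batch, 8, d), one embedding per answer choice
    scores = []
    for k in range(candidate_embs.shape[1]):
        panels = torch.cat([context_emb, candidate_embs[:, k:k + 1]], dim=1)
        scores.append(scorer(panels))
    return torch.stack(scores, dim=1)          # (batch, 8) logits over the answer choices
```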
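The Open Source Code and Open Datasets rows point to the PGM repository. A minimal loading sketch follows; the .npz field names ('image', 'target') and the 16-panel, 160x160 layout are assumptions taken from the repository's description of the data format, not statements from the paper itself.

```python
import numpy as np

def load_pgm_example(path):
    """Load one PGM question from an .npz file (field names assumed from the repo README)."""
    data = np.load(path)
    panels = data["image"].reshape(16, 160, 160)   # assumed: 8 context panels + 8 candidate answers
    context, candidates = panels[:8], panels[8:]
    target = int(data["target"])                    # assumed: index of the correct candidate (0-7)
    return context, candidates, target

# Hypothetical usage; the path and file naming are placeholders:
# context, candidates, target = load_pgm_example("PGM_neutral_train_0.npz")
```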
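The Dataset Splits and Experiment Setup rows describe model selection via a hyper-parameter grid sweep, early stopping on a held-out validation set, and reporting on a held-out test set. The sketch below illustrates that protocol generically; the grid values, patience, and helper callables (build_model, train_one_epoch, evaluate) are placeholders, not the settings from the paper's appendix A.

```python
import itertools

def grid_sweep(build_model, train_one_epoch, evaluate, train_set, val_set,
               grid, max_epochs=100, patience=5):
    """Pick the configuration with the smallest validation loss, using early stopping."""
    best = {"val_loss": float("inf"), "config": None, "model": None}
    for values in itertools.product(*grid.values()):
        config = dict(zip(grid.keys(), values))
        model = build_model(**config)
        best_val, epochs_since_improvement = float("inf"), 0
        for epoch in range(max_epochs):
            train_one_epoch(model, train_set, config)
            val_loss = evaluate(model, val_set)
            if val_loss < best_val:
                best_val, epochs_since_improvement = val_loss, 0
            else:
                epochs_since_improvement += 1
                if epochs_since_improvement >= patience:
                    break  # early stopping on validation loss
        if best_val < best["val_loss"]:
            best = {"val_loss": best_val, "config": config, "model": model}
    return best

# Hypothetical usage; final numbers would then be reported on the held-out test set:
# best = grid_sweep(build_model, train_one_epoch, evaluate, train_set, val_set,
#                   grid={"lr": [1e-4, 3e-4], "batch_size": [32, 64]})
# test_accuracy = evaluate(best["model"], test_set)
```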