reproducibilityindex.ai

Analysing Mathematical Reasoning Abilities of Neural Models

Authors: David Saxton, Edward Grefenstette, Felix Hill, Pushmeet Kohli

ICLR 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	we conduct a comprehensive analysis of models from two broad classes of the most powerful sequence-to-sequence architectures and find notable differences in their ability to resolve mathematical problems and generalize their knowledge.
Researcher Affiliation	Industry	David Saxton, DeepMind, saxton@google.com; Edward Grefenstette, DeepMind, egrefen@fb.com; Felix Hill, DeepMind, felixhill@google.com; Pushmeet Kohli, DeepMind, pushmeet@google.com
Pseudocode	No	The paper describes the models examined and their architectures but does not include any pseudocode or algorithm blocks.
Open Source Code	Yes	We release¹ a sequence-to-sequence dataset consisting of many different types of mathematics questions (see Figure 1) for measuring mathematical reasoning, with the provision of both generation code and pre-generated questions. ¹ Dataset will be available at https://github.com/deepmind/mathematics_dataset
Open Datasets	Yes	Dataset and generalization tests: We release¹ a sequence-to-sequence dataset consisting of many different types of mathematics questions (see Figure 1) for measuring mathematical reasoning, with the provision of both generation code and pre-generated questions. ¹ Dataset will be available at https://github.com/deepmind/mathematics_dataset
Dataset Splits	No	The paper states 'Per module, we generate 2 × 10^6 train questions, and 10^5 test (interpolation) questions.' and mentions 'validation performance' but does not specify the size or methodology of a validation split.
Hardware Specification	Yes	We use a batch size of 1024 split across 8 NVIDIA P100 GPUs for 500k batches
Software Dependencies	No	The paper mentions tools like Python/SymPy and the Adam optimizer but does not provide specific version numbers for software dependencies.
Experiment Setup	Yes	We minimize the sum of log probabilities of the correct character via the Adam optimizer (Kingma & Ba, 2014) with learning rate of 6 × 10^-4, β1 = 0.9, β2 = 0.995, ϵ = 10^-9. We use a batch size of 1024 split across 8 NVIDIA P100 GPUs for 500k batches, with absolute gradient value clipping of 0.1.
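
The hyperparameters quoted in the Experiment Setup row can be collected into a small sketch. This is not the authors' training code: it is a textbook Adam update (Kingma & Ba, 2014) in plain Python, and it assumes that the gradient is clipped elementwise before the moment estimates are updated and that the batch is split evenly across GPUs. The paper excerpt confirms neither detail. The names `clip_by_value` and `adam_step` are hypothetical, chosen for illustration.

```python
import math

# Hyperparameters as quoted in the Experiment Setup row.
LEARNING_RATE = 6e-4
BETA1 = 0.9
BETA2 = 0.995
EPSILON = 1e-9
CLIP_VALUE = 0.1        # absolute (elementwise) gradient value clipping
BATCH_SIZE = 1024
NUM_GPUS = 8
NUM_BATCHES = 500_000


def clip_by_value(grads, limit=CLIP_VALUE):
    """Clamp each gradient component to the range [-limit, limit]."""
    return [max(-limit, min(limit, g)) for g in grads]


def adam_step(params, grads, m, v, t):
    """One textbook Adam update with bias correction.

    Assumption: clipping is applied before the moment updates.
    t is the 1-based step count. Returns the new (params, m, v).
    """
    grads = clip_by_value(grads)
    m = [BETA1 * mi + (1 - BETA1) * g for mi, g in zip(m, grads)]
    v = [BETA2 * vi + (1 - BETA2) * g * g for vi, g in zip(v, grads)]
    m_hat = [mi / (1 - BETA1 ** t) for mi in m]
    v_hat = [vi / (1 - BETA2 ** t) for vi in v]
    params = [
        p - LEARNING_RATE * mh / (math.sqrt(vh) + EPSILON)
        for p, mh, vh in zip(params, m_hat, v_hat)
    ]
    return params, m, v


# Derived quantities, assuming an even split of each batch across GPUs.
per_gpu_batch = BATCH_SIZE // NUM_GPUS       # examples per GPU per batch
total_examples = BATCH_SIZE * NUM_BATCHES    # examples seen over training
```

Under these assumptions each GPU processes 128 examples per batch, and training covers 512 million examples in total. With bias correction, the first update moves each parameter by roughly the learning rate in the direction opposite its clipped gradient.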