Programming With a Differentiable Forth Interpreter

Authors: Matko Bošnjak, Tim Rocktäschel, Jason Naradowsky, Sebastian Riedel

ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show empirically that our interpreter is able to effectively leverage different levels of prior program structure and learn complex transduction tasks such as sequence sorting or addition with substantially less data and better generalisation over problem sizes. In addition, we introduce neural program optimisations based on symbolic computation and parallel branching that lead to significant speed improvements.
Researcher Affiliation | Academia | Matko Bošnjak, Tim Rocktäschel, Jason Naradowsky & Sebastian Riedel, Department of Computer Science, University College London, London, UK, {m.bosnjak, t.rocktaschel, j.narad, s.riedel}@cs.ucl.ac.uk
Pseudocode | No | The paper includes examples of Forth code (Listings 1 and 2) and mathematical descriptions of differentiable Forth words (Table 4), but does not provide structured pseudocode or algorithm blocks for its overall methodology.
Open Source Code | No | The paper does not provide any concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described.
Open Datasets | No | We test ∂4 on the sorting and addition tasks presented in Reed & de Freitas (2015) with varying levels of program structure. This quote cites the source of the *tasks*, but the paper does not explicitly provide access information for the *datasets* used in its experiments; no specific dataset names (e.g., MNIST, CIFAR) are mentioned with links or formal citations.
Dataset Splits | No | The paper states 'Hyperparameters were tuned via random search on a development variant of each task', implying a validation set, but does not provide specific details on dataset splits (e.g., percentages or sample counts) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer, but does not provide specific version numbers for any software dependencies, libraries, or programming languages used.
Experiment Setup | Yes | The parameters of each sketch are trained using Adam (Kingma & Ba, 2014), with gradient clipping and gradient noise (Neelakantan et al., 2015b). Hyperparameters were tuned via random search on a development variant of each task, for 1000 epochs, repeating each experiment 5 times. During testing we employ memory element discretisation, replacing differentiable stacks and pointers with their discrete counterparts, and effectively allowing the trained model to generalize to any sequence length if the correct sketch behavior has been learned. To illustrate the generalization ability of this architecture, we compare against a Seq2Seq (Sutskever et al., 2014) baseline. All Seq2Seq models are single-layer, with a hidden size of 50, trained similarly for 1000 epochs using Adam.
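
For readers attempting a re-implementation, the training setup quoted above (Adam with gradient clipping and gradient noise) can be approximated as follows. This is a minimal sketch in PyTorch, assuming a generic model, loss function, and data iterator; the paper does not name its framework, and the learning rate, clipping threshold, and gradient-noise schedule constants below are illustrative placeholders rather than the authors' reported values.

    import torch

    def train(model, loss_fn, data, epochs=1000, lr=1e-3,
              clip_norm=1.0, noise_eta=0.01, noise_gamma=0.55):
        # Adam optimiser, as stated in the paper (Kingma & Ba, 2014).
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        step = 0
        for epoch in range(epochs):
            for x, y in data:
                optimizer.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()
                # Gradient clipping by global norm (threshold is a placeholder).
                torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
                # Annealed Gaussian gradient noise (Neelakantan et al., 2015b):
                # sigma_t^2 = eta / (1 + t)^gamma, added to each gradient.
                sigma = (noise_eta / (1 + step) ** noise_gamma) ** 0.5
                for p in model.parameters():
                    if p.grad is not None:
                        p.grad.add_(torch.randn_like(p.grad) * sigma)
                optimizer.step()
                step += 1
        return model

The random hyperparameter search over a development variant of each task, and the 5 repetitions per experiment, would wrap an outer loop around this function; the test-time discretisation of stacks and pointers is specific to the paper's architecture and is not shown here.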