Mathematical Reasoning in Latent Space

Authors: Dennis Lee, Christian Szegedy, Markus Rabe, Sarah Loos, Kshitij Bansal

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design and conduct a simple experiment to study whether neural networks can perform several steps of approximate reasoning in a fixed dimensional latent space. Our experiments show that graph neural networks can make non-trivial predictions about the rewrite-success of statements, even when they propagate predicted latent representations for several steps. (A sketch of this multi-step latent evaluation follows the table.)
Researcher Affiliation | Industry | Dennis Lee, Christian Szegedy, Markus N. Rabe, Kshitij Bansal, and Sarah M. Loos; Google Research, Mountain View, CA, USA; {ldennis,szegedy,mrabe,kbk,smloos}@google.com
Pseudocode | No | The paper describes methods textually and with a diagram (Figure 1 and the schema on page 4), but does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We generate the training data from the theorem database of the HOList environment (Bansal et al., 2019b), which contains 19591 theorems and definitions, ordered such that later theorems use only earlier statements in their human proof.
Dataset Splits | Yes | The theorems are split into 11655 training, 3668 validation, and 3620 testing theorems. (An index-based split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks) that would be needed to replicate the experiment.
Experiment Setup | Yes | Our network has two independent towers, which are each 16-hop graph neural networks with internal node representations in R^128. The output of each of the two towers is fed into a layer that expands the dimension of the node representation to R^1024 with a fully connected layer with shared weights for each node. [...] We add Gaussian noise with β = 10^-3 to the embedding of the goal theorem. [...] and are processed by a three-layer perceptron with 1024 hidden units and rectified linear activations between the layers. Our model is trained with g = 16 choices of T in each batch. (An architecture sketch follows the table.)
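
Below is a minimal sketch of how an index-based split over the chronologically ordered HOList theorem database could reproduce the counts quoted in the "Dataset Splits" row. The assumption that the split is a contiguous slice of the ordered database is ours, not a statement from the paper.

```python
# Sketch of an index-based split over the ordered HOList theorem database.
# `theorems` is assumed to be a list of statements in database order; the
# boundaries reproduce the counts quoted above (11655 / 3668 / 3620), but
# the paper does not say the split is a contiguous slice, so this is only
# illustrative.
def split_theorems(theorems, n_train=11655, n_valid=3668, n_test=3620):
    assert len(theorems) >= n_train + n_valid + n_test
    train = theorems[:n_train]
    valid = theorems[n_train:n_train + n_valid]
    test = theorems[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test
```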
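
The setup quoted in the "Experiment Setup" row can be pictured with the following minimal PyTorch sketch. Only the quoted hyperparameters (two independent 16-hop towers, 128-dim node states, a shared per-node expansion to 1024 dims, Gaussian noise on the goal embedding, and a three-layer MLP head with 1024 hidden units) come from the paper; the message-passing rule, the max pooling, the exact noise placement, and all class and function names are assumptions, and the batching over g = 16 choices of T is omitted.

```python
# Minimal sketch, not the paper's implementation: two independent 16-hop
# graph-network towers with 128-dim node states, a shared fully connected
# expansion to 1024 dims per node, and a 3-layer MLP head with ReLU.
import torch
import torch.nn as nn


class GNNTower(nn.Module):
    def __init__(self, node_dim=128, hops=16, expand_dim=1024):
        super().__init__()
        self.hops = hops
        self.msg = nn.Linear(node_dim, node_dim)       # neighbour message (assumed rule)
        self.upd = nn.Linear(node_dim, node_dim)       # self update (assumed rule)
        self.expand = nn.Linear(node_dim, expand_dim)  # shared per-node expansion

    def forward(self, node_feats, adj):
        h = node_feats                                 # [num_nodes, 128]
        for _ in range(self.hops):
            h = torch.relu(self.upd(h) + adj @ self.msg(h))
        h = self.expand(h)                             # [num_nodes, 1024]
        return h.max(dim=0).values                     # pool nodes -> [1024] (assumed pooling)


class RewriteSuccessModel(nn.Module):
    def __init__(self, noise_scale=1e-3):
        super().__init__()
        self.goal_tower = GNNTower()                   # embeds the goal theorem
        self.param_tower = GNNTower()                  # embeds the rewrite parameter
        self.noise_scale = noise_scale
        self.head = nn.Sequential(                     # 3-layer MLP, 1024 hidden units
            nn.Linear(2048, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, 1),
        )

    def forward(self, goal_feats, goal_adj, param_feats, param_adj):
        g = self.goal_tower(goal_feats, goal_adj)
        g = g + self.noise_scale * torch.randn_like(g)  # noise on the goal embedding
        p = self.param_tower(param_feats, param_adj)
        return self.head(torch.cat([g, p], dim=-1))     # rewrite-success logit
```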
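
Finally, the multi-step experiment summarized in the "Research Type" row amounts to a simple loop: the latent embedding of the goal is propagated through a prediction network once per rewrite step, and rewrite success is classified from the propagated vector alone, without decoding back to a formula. The three callables below are hypothetical placeholders for the trained networks, not APIs from the paper's (unreleased) code.

```python
# Sketch of evaluating approximate reasoning purely in latent space.
# `embed`, `predict_next_embedding`, and `rewrite_success_logit` are
# placeholder callables standing in for the trained networks.
def propagate_and_classify(goal_graph, rewrite_params, steps,
                           embed, predict_next_embedding, rewrite_success_logit):
    z = embed(goal_graph)                  # latent embedding of the goal statement
    for params in rewrite_params[:steps]:  # one predicted latent rewrite per step
        z = predict_next_embedding(z, embed(params))
    return rewrite_success_logit(z)        # classify from the latent state only
```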