Mathematical Reasoning in Latent Space
Authors: Dennis Lee, Christian Szegedy, Markus Rabe, Sarah Loos, Kshitij Bansal
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design and conduct a simple experiment to study whether neural networks can perform several steps of approximate reasoning in a fixed dimensional latent space. Our experiments show that graph neural networks can make non-trivial predictions about the rewrite-success of statements, even when they propagate predicted latent representations for several steps. (See the latent-propagation sketch after the table.) |
| Researcher Affiliation | Industry | Dennis Lee, Christian Szegedy, Markus N. Rabe, Kshitij Bansal, and Sarah M. Loos Google Research Mountain View, CA, USA {ldennis,szegedy,mrabe,kbk,smloos}@google.com |
| Pseudocode | No | The paper describes methods textually and with a diagram (Figure 1 and the schema on page 4) but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology described in this paper is publicly available. |
| Open Datasets | Yes | We generate the training data from the theorem database of the HOList environment (Bansal et al., 2019b), which contains 19591 theorems and definitions ordered such that later theorems use only earlier statements in their human proof. |
| Dataset Splits | Yes | The theorems are split into 11655 training, 3668 validation, and 3620 testing theorems. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | Our network has two independent towers, which are each 16-hop graph neural networks with internal node representations in R^128. The output of each of the two towers is fed into a layer that expands the dimension of the node representation to R^1024 with a fully connected layer with shared weights for each node. [...] We add Gaussian noise with β = 10^-3 to the embedding of the goal theorem. [...] and are processed by a three-layer perceptron with 1024 hidden units and rectified linear activation between the layers. Our model is trained with g = 16 choices of T in each batch. (See the architecture sketch after the table.) |
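
The paper's source code is not public (see the "Open Source Code" row), so the following is only a minimal PyTorch sketch of the setup quoted in the "Experiment Setup" row: two independent towers producing 128-dimensional node representations, a shared per-node expansion to 1024 dimensions, Gaussian noise on the goal embedding, and a three-layer perceptron with 1024 hidden units and ReLU activations. The single-layer tower stand-in (the paper uses 16-hop graph neural networks), the max-pooling over nodes, the concatenation of the two tower outputs, and all module names are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class TowerStub(nn.Module):
    """Stand-in for one of the two independent towers.

    The paper uses a 16-hop graph neural network per tower; here a single
    linear layer maps per-node input features to the 128-dimensional node
    representations quoted in the paper.
    """

    def __init__(self, in_dim: int = 64, node_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(in_dim, node_dim)

    def forward(self, node_feats: torch.Tensor) -> torch.Tensor:
        # node_feats: (num_nodes, in_dim) -> (num_nodes, node_dim)
        return torch.relu(self.proj(node_feats))


class RewriteSuccessModel(nn.Module):
    def __init__(self, node_dim: int = 128, expand_dim: int = 1024,
                 noise_scale: float = 1e-3):
        super().__init__()
        self.goal_tower = TowerStub(node_dim=node_dim)
        self.param_tower = TowerStub(node_dim=node_dim)
        # Shared fully connected layer expanding each node to R^1024.
        self.expand = nn.Linear(node_dim, expand_dim)
        # Three-layer perceptron with 1024 hidden units and ReLU between layers.
        self.mlp = nn.Sequential(
            nn.Linear(2 * expand_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, 1),
        )
        # Interpreting the paper's beta = 10^-3 as the noise scale is an assumption.
        self.noise_scale = noise_scale

    def embed(self, tower: nn.Module, node_feats: torch.Tensor) -> torch.Tensor:
        nodes = tower(node_feats)               # (num_nodes, 128)
        nodes = torch.relu(self.expand(nodes))  # (num_nodes, 1024)
        # Pooling over nodes into one graph embedding is an assumption.
        return nodes.max(dim=0).values          # (1024,)

    def forward(self, goal_nodes: torch.Tensor,
                param_nodes: torch.Tensor) -> torch.Tensor:
        goal = self.embed(self.goal_tower, goal_nodes)
        if self.training:
            # Gaussian noise added to the embedding of the goal theorem.
            goal = goal + self.noise_scale * torch.randn_like(goal)
        param = self.embed(self.param_tower, param_nodes)
        # Predicted log-odds that rewriting the goal with the parameter succeeds.
        return self.mlp(torch.cat([goal, param], dim=-1))
```

As a usage example, `RewriteSuccessModel()(torch.randn(30, 64), torch.randn(20, 64))` scores one goal/parameter pair given randomly initialized node features.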
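The "Research Type" row describes the paper's central experiment: instead of re-embedding each rewritten statement, a predicted latent representation is fed back into the model for several steps before rewrite success is scored. The sketch below illustrates that evaluation loop under stated assumptions; `latent_rewrite` and `success_logit` are hypothetical stand-ins for the paper's prediction heads, not its actual architecture.

```python
import torch
import torch.nn as nn

# Hypothetical heads: one maps a goal embedding to the predicted embedding of
# the rewritten goal, the other scores rewrite success from a goal/parameter pair.
latent_rewrite = nn.Linear(1024, 1024)
success_logit = nn.Linear(2 * 1024, 1)


def propagate_and_predict(goal_embedding: torch.Tensor,
                          param_embedding: torch.Tensor,
                          steps: int) -> torch.Tensor:
    """Apply the predicted-rewrite map `steps` times purely in latent space,
    then score rewrite success from the propagated embedding."""
    latent = goal_embedding
    for _ in range(steps):
        latent = torch.relu(latent_rewrite(latent))
    return success_logit(torch.cat([latent, param_embedding], dim=-1))


# Example: score rewrite success after four latent-space propagation steps.
logit = propagate_and_predict(torch.randn(1024), torch.randn(1024), steps=4)
```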