Deep Learning For Symbolic Mathematics
Authors: Guillaume Lample, François Charton
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a syntax for representing mathematical problems, and methods for generating large datasets that can be used to train sequence-to-sequence models. We achieve results that outperform commercial Computer Algebra Systems such as Matlab or Mathematica. |
| Researcher Affiliation | Industry | Guillaume Lample Facebook AI Research glample@fb.com François Charton Facebook AI Research fcharton@fb.com |
| Pseudocode | Yes | In Section C of the appendix, we present an algorithm to generate random trees and expressions, where the four expression trees above are all generated with the same probability. (An illustrative sketch of random expression generation follows the table.) |
| Open Source Code | No | The paper does not explicitly state that source code for their methodology is provided or publicly available. |
| Open Datasets | No | To train our networks, we need datasets of problems and solutions. Ideally, we want to generate representative samples of the problem space, i.e. randomly generate functions to be integrated and differential equations to be solved. |
| Dataset Splits | No | While the paper mentions 'Training set size' in Table 1 and discusses 'held-out test samples', it does not provide explicit details on training, validation, and test splits (e.g., percentages or counts) or refer to standard predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running its experiments. It only mentions the software used for comparison. |
| Software Dependencies | Yes | All experiments were run with Mathematica 12.0.0.0, Maple 2019 and Matlab R2019a. |
| Experiment Setup | Yes | We use a transformer model (Vaswani et al., 2017) with 8 attention heads, 6 layers, and a dimensionality of 512. In our experiments, using larger models did not improve the performance. We train our models with the Adam optimizer (Kingma & Ba, 2014), with a learning rate of 10⁻⁴. We remove expressions with more than 512 tokens, and train our model with 256 equations per batch. (A configuration sketch based on these values follows the table.) |
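The paper's data pipeline hinges on sampling random expression trees and serializing them in prefix notation for a sequence-to-sequence model; its Appendix C gives an algorithm that generates trees of a given size with equal probability. The snippet below is a minimal illustrative sketch, not the authors' algorithm: the operator and leaf sets are placeholders, and the naive recursive split does not reproduce the paper's uniform sampling over trees.

```python
import random

# Placeholder vocabularies for illustration; the paper uses a richer operator set.
BINARY_OPS = ["+", "-", "*", "/"]
UNARY_OPS = ["sin", "cos", "exp", "ln"]
LEAVES = ["x", "1", "2", "3"]

def random_expression(n_ops: int) -> str:
    """Build a random expression with n_ops operators, in prefix (Polish) notation.

    Simplified sketch: unlike the paper's Appendix C algorithm, this naive
    recursion does NOT sample trees of a given size uniformly.
    """
    if n_ops == 0:
        return random.choice(LEAVES)
    if random.random() < 0.3:  # occasionally insert a unary node
        return f"{random.choice(UNARY_OPS)} {random_expression(n_ops - 1)}"
    left_ops = random.randint(0, n_ops - 1)  # split remaining operators between children
    left = random_expression(left_ops)
    right = random_expression(n_ops - 1 - left_ops)
    return f"{random.choice(BINARY_OPS)} {left} {right}"

if __name__ == "__main__":
    random.seed(0)
    print(random_expression(4))  # e.g. "* sin x + 2 1"
```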
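Similarly, the quoted experiment setup maps onto standard sequence-to-sequence tooling. Below is a minimal PyTorch sketch of the reported configuration (8 attention heads, 6 layers, model dimension 512, Adam at a learning rate of 10⁻⁴, 256 equations per batch, expressions above 512 tokens removed). The vocabulary size and feed-forward width are assumptions not given in the excerpt, and this is not the authors' code, which is not reported as released.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 1000   # assumption: symbol vocabulary size, not stated in the excerpt
MAX_LEN = 512       # expressions with more than 512 tokens are removed
BATCH_SIZE = 256    # 256 equations per batch

# 6-layer encoder/decoder transformer with 8 heads and d_model = 512.
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,  # assumption: PyTorch default, not reported in the excerpt
)
embedding = nn.Embedding(VOCAB_SIZE, 512)
projection = nn.Linear(512, VOCAB_SIZE)

# Adam optimizer with the reported learning rate of 1e-4.
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(embedding.parameters()) + list(projection.parameters()),
    lr=1e-4,
)

def keep(tokens: list) -> bool:
    """Batch-building filter: drop over-long expressions."""
    return len(tokens) <= MAX_LEN
```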