Deep Learning For Symbolic Mathematics

Authors: Guillaume Lample, François Charton

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a syntax for representing mathematical problems, and methods for generating large datasets that can be used to train sequence-to-sequence models. We achieve results that outperform commercial Computer Algebra Systems such as Matlab or Mathematica.
Researcher Affiliation | Industry | Guillaume Lample, Facebook AI Research, glample@fb.com; François Charton, Facebook AI Research, fcharton@fb.com
Pseudocode | Yes | In Section C of the appendix, we present an algorithm to generate random trees and expressions, where the four expression trees above are all generated with the same probability. (An illustrative sketch of such a generator follows after this table.)
Open Source Code | No | The paper does not explicitly state that source code for its methodology is provided or publicly available.
Open Datasets | No | To train our networks, we need datasets of problems and solutions. Ideally, we want to generate representative samples of the problem space, i.e. randomly generate functions to be integrated and differential equations to be solved.
Dataset Splits | No | While the paper mentions 'Training set size' in Table 1 and discusses 'held-out test samples', it does not provide explicit details on training, validation, and test splits (e.g., percentages or counts) or refer to standard predefined splits.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running its experiments; it only mentions the software used for comparison.
Software Dependencies | Yes | All experiments were run with Mathematica 12.0.0.0, Maple 2019 and Matlab R2019a.
Experiment Setup | Yes | We use a transformer model (Vaswani et al., 2017) with 8 attention heads, 6 layers, and a dimensionality of 512. In our experiments, using larger models did not improve the performance. We train our models with the Adam optimizer (Kingma & Ba, 2014), with a learning rate of 10^-4. We remove expressions with more than 512 tokens, and train our model with 256 equations per batch. (A configuration sketch follows below.)
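The Pseudocode and Open Datasets rows above refer to the paper's pipeline of randomly generating expression trees and encoding them in prefix (Polish) notation for a sequence-to-sequence model. The Python sketch below is a minimal illustration of that idea only: the operator sets, leaf values and the 0.3 unary probability are invented for the example, and the sampler does not reproduce the uniform-over-trees procedure of the paper's Appendix C.

import random

# Minimal sketch of random expression-tree generation and prefix-notation
# encoding. Operator sets, leaf values and the unary probability are
# illustrative assumptions, not values taken from the paper.

BINARY_OPS = ["add", "sub", "mul", "div", "pow"]
UNARY_OPS = ["sin", "cos", "exp", "ln"]
LEAVES = ["x", "1", "2", "3"]

def random_tree(n_ops, rng):
    """Return a random expression tree with n_ops internal operator nodes."""
    if n_ops == 0:
        return rng.choice(LEAVES)
    if rng.random() < 0.3:  # unary node
        return (rng.choice(UNARY_OPS), random_tree(n_ops - 1, rng))
    left_ops = rng.randint(0, n_ops - 1)  # operators assigned to the left subtree
    return (
        rng.choice(BINARY_OPS),
        random_tree(left_ops, rng),
        random_tree(n_ops - 1 - left_ops, rng),
    )

def to_prefix(tree):
    """Serialize an expression tree as a prefix (Polish notation) token list."""
    if isinstance(tree, str):  # leaf
        return [tree]
    op, *children = tree
    return [op] + [tok for child in children for tok in to_prefix(child)]

rng = random.Random(0)
print(to_prefix(random_tree(4, rng)))  # e.g. ['mul', 'sin', 'x', 'add', 'x', '2']

Token sequences of this kind are what a sequence-to-sequence model would consume as inputs and produce as outputs.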
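The Experiment Setup row quotes the model hyper-parameters (8 heads, 6 layers, dimensionality 512, Adam with learning rate 10^-4, a 512-token length cap, 256 equations per batch). The PyTorch sketch below only wires those quoted numbers into a standard transformer configuration; the vocabulary size and the feed-forward width (PyTorch's default) are placeholder assumptions, and this is not the authors' implementation.

import torch
import torch.nn as nn

# Hyper-parameters quoted in the reproducibility table above.
D_MODEL = 512
N_HEADS = 8
N_LAYERS = 6
LEARNING_RATE = 1e-4
MAX_TOKENS = 512   # expressions longer than this are discarded
BATCH_SIZE = 256   # equations per batch

VOCAB_SIZE = 1000  # assumption: size of the prefix-notation token vocabulary

embedding = nn.Embedding(VOCAB_SIZE, D_MODEL)
model = nn.Transformer(
    d_model=D_MODEL,
    nhead=N_HEADS,
    num_encoder_layers=N_LAYERS,
    num_decoder_layers=N_LAYERS,
)
optimizer = torch.optim.Adam(
    list(embedding.parameters()) + list(model.parameters()),
    lr=LEARNING_RATE,
)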