Learning Selection Strategies in Buchberger’s Algorithm

Authors: Dylan Peifer, Michael Stillman, Daniel Halpern-Leistner

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We introduce a new approach to Buchberger's algorithm that uses reinforcement learning agents to perform S-pair selection, a key step in the algorithm. We then study how the difficulty of the problem depends on the choices of domain and distribution of polynomials, about which little is known. Finally, we train a policy model using proximal policy optimization (PPO) to learn S-pair selection strategies for random systems of binomial equations. In certain domains, the trained model outperforms state-of-the-art selection heuristics in total number of polynomial additions performed, which provides a proof-of-concept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation.
Researcher Affiliation | Academia | Dylan Peifer, Michael Stillman, Daniel Halpern-Leistner, Department of Mathematics, Cornell University, Ithaca, NY, USA. Correspondence to: Daniel Halpern-Leistner <daniel.hl@cornell.edu>.
Pseudocode | Yes | Algorithm 1 Multivariate Division Algorithm ... Algorithm 2 Buchberger's Algorithm (a hedged sketch of Buchberger's selection loop follows the table).
Open Source Code | No | The paper does not provide a link to its own source code, nor does it explicitly state that its code is being released.
Open Datasets | No | There are no fixed train or test sets. Instead, the training and testing data are generated online by a function that builds an ideal at random from the distribution.
Dataset Splits | No | The paper states: "There are no fixed train or test sets. Instead, the training and testing data are generated online by a function that builds an ideal at random from the distribution." It does not describe distinct training, validation, and test splits with percentages or sample counts; instead, data come from an online generation process (an illustrative generator sketch follows the table).
Hardware Specification | Yes | Training times were 16 to 48 hours each on a c5n.xlarge instance through Amazon Web Services.
Software Dependencies | No | The paper mentions using "proximal policy optimization (PPO) (Schulman et al., 2017) using a custom implementation inspired by (Achiam, 2018)" but does not specify version numbers for any software libraries or dependencies used in their implementation.
Experiment Setup | Yes | In each epoch we first sample 100 episodes following the current policy. Next, GAE with λ = 0.97 and γ = 0.99 is used to compute advantages, which are normalized over the epoch. Finally, we perform at most 80 gradient updates on the clipped surrogate PPO objective with ϵ = 0.2 using Adam optimization with learning rate 0.0001. Early stopping is performed when the sampled KL-divergence from the last policy exceeds 0.01. (A hedged sketch of this update loop follows the table.)
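
The pseudocode row above refers to the paper's Algorithm 2 (Buchberger's algorithm). Below is a minimal sketch, not the authors' implementation, of that loop with the S-pair selection step exposed as a pluggable strategy; it assumes SymPy for polynomial arithmetic, and the "first-pair" selector stands in for the heuristics or trained agent studied in the paper.

    # Minimal sketch of Buchberger's algorithm with pluggable S-pair selection.
    # Assumes SymPy; not the authors' implementation.
    from sympy import symbols, expand, lcm, LM, LT, reduced

    x, y, z = symbols('x y z')
    ORDER = 'grevlex'  # graded reverse lexicographic term order

    def s_polynomial(f, g):
        """S-polynomial of f and g: cancel their leading terms against the lcm."""
        m = lcm(LM(f, order=ORDER), LM(g, order=ORDER))
        return expand(m / LT(f, order=ORDER) * f - m / LT(g, order=ORDER) * g)

    def buchberger(F, select):
        """Compute a Groebner basis of F; `select` chooses the next S-pair."""
        G = list(F)
        pairs = {(i, j) for i in range(len(G)) for j in range(i)}
        while pairs:
            i, j = select(G, pairs)                 # heuristic or learned policy
            pairs.remove((i, j))
            _, r = reduced(s_polynomial(G[i], G[j]), G, order=ORDER)  # reduce mod G
            if r != 0:                              # nonzero remainder: new generator
                G.append(r)
                pairs |= {(len(G) - 1, k) for k in range(len(G) - 1)}
        return G

    # "First" selection as a stand-in for classic heuristics or a trained agent.
    first_pair = lambda G, pairs: min(pairs)
    print(buchberger([x**2 - y, x*y - z], first_pair))

The selection call is the decision point the paper targets: in the reinforcement-learning formulation, each choice of (i, j) is an action, and performance is measured in total polynomial additions, as in the abstract above.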
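
The dataset rows note that training and test instances are generated online from a distribution over random binomial ideals. The generator below is purely illustrative; the number of variables, the per-variable exponent bound, and the number of generators are placeholders, not the paper's distribution parameters.

    # Illustrative online generator of random binomial systems (placeholder
    # parameters; not the distribution used in the paper).
    import random
    from sympy import symbols, prod

    VARS = symbols('x y z')

    def random_monomial(max_exponent=5):
        """A monomial with each exponent drawn uniformly from 0..max_exponent."""
        return prod(v**random.randint(0, max_exponent) for v in VARS)

    def random_binomial(max_exponent=5):
        """A difference of two distinct random monomials."""
        m1, m2 = random_monomial(max_exponent), random_monomial(max_exponent)
        while m2 == m1:                  # avoid the zero polynomial
            m2 = random_monomial(max_exponent)
        return m1 - m2

    def random_binomial_system(n_gens=5, max_exponent=5):
        """Generators of one random binomial ideal, seeding a single episode."""
        return [random_binomial(max_exponent) for _ in range(n_gens)]

    print(random_binomial_system())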
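
The experiment-setup row fully specifies the PPO hyperparameters. The sketch below mirrors that description under the assumption of a PyTorch policy exposing a log_prob(obs, actions) method; it is not the authors' custom implementation.

    # Sketch of the described PPO update: GAE(lambda=0.97, gamma=0.99), clipped
    # surrogate with epsilon=0.2, Adam at lr=1e-4, at most 80 gradient steps per
    # epoch, early stopping when the sampled KL exceeds 0.01. The policy object
    # and its log_prob interface are assumptions, not the paper's code.
    import torch

    GAMMA, LAM = 0.99, 0.97          # discount and GAE parameters
    CLIP_EPS, LR = 0.2, 1e-4         # surrogate clip and Adam learning rate
    MAX_STEPS, TARGET_KL = 80, 0.01  # gradient-step cap and KL early-stop threshold

    def gae_advantages(rewards, values):
        """GAE for one episode; `values` holds one more entry than `rewards`."""
        adv, last = torch.zeros_like(rewards), 0.0
        for t in reversed(range(len(rewards))):
            delta = rewards[t] + GAMMA * values[t + 1] - values[t]
            last = delta + GAMMA * LAM * last
            adv[t] = last
        return adv

    def ppo_update(policy, optimizer, obs, actions, old_logp, adv):
        """Up to MAX_STEPS clipped-surrogate updates with KL-based early stopping.

        old_logp holds the (detached) log-probabilities recorded at sampling time.
        """
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)        # normalize over the epoch
        for _ in range(MAX_STEPS):
            logp = policy.log_prob(obs, actions)             # assumed policy interface
            if (old_logp - logp).mean().item() > TARGET_KL:  # sampled KL estimate
                break                                        # early stopping
            ratio = torch.exp(logp - old_logp)
            clipped = torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * adv
            loss = -torch.min(ratio * adv, clipped).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

One epoch would then sample 100 episodes with the current policy, compute and concatenate their GAE advantages, and call ppo_update with an optimizer such as torch.optim.Adam(policy.parameters(), lr=LR).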