Learning Selection Strategies in Buchberger’s Algorithm
Authors: Dylan Peifer, Michael Stillman, Daniel Halpern-Leistner
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a new approach to Buchberger's algorithm that uses reinforcement learning agents to perform S-pair selection, a key step in the algorithm. We then study how the difficulty of the problem depends on the choices of domain and distribution of polynomials, about which little is known. Finally, we train a policy model using proximal policy optimization (PPO) to learn S-pair selection strategies for random systems of binomial equations. In certain domains, the trained model outperforms state-of-the-art selection heuristics in total number of polynomial additions performed, which provides a proof-of-concept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation. |
| Researcher Affiliation | Academia | Dylan Peifer, Michael Stillman, Daniel Halpern-Leistner; Department of Mathematics, Cornell University, Ithaca, NY, USA. Correspondence to: Daniel Halpern-Leistner <daniel.hl@cornell.edu>. |
| Pseudocode | Yes | Algorithm 1 Multivariate Division Algorithm ... Algorithm 2 Buchberger's Algorithm (a hedged sketch of Buchberger's algorithm follows the table). |
| Open Source Code | No | The paper does not provide a link to its own source code, nor does it explicitly state that its code is being released. |
| Open Datasets | No | There are no fixed train or test sets. Instead, the training and testing data are generated online by a function that builds an ideal at random from the distribution (see the data-generation sketch after the table). |
| Dataset Splits | No | The paper states: "There are no fixed train or test sets. Instead, the training and testing data are generated online by a function that builds an ideal at random from the distribution." It does not describe distinct training, validation, and test splits with percentages or sample counts; data are instead generated online. |
| Hardware Specification | Yes | Training times were 16 to 48 hours each on a c5n.xlarge instance through Amazon Web Services. |
| Software Dependencies | No | The paper mentions using "proximal policy optimization (PPO) (Schulman et al., 2017) using a custom implementation inspired by (Achiam, 2018)" but does not specify version numbers for any software libraries or dependencies used in their implementation. |
| Experiment Setup | Yes | In each epoch we first sample 100 episodes following the current policy. Next, GAE with λ = 0.97 and γ = 0.99 is used to compute advantages, which are normalized over the epoch. Finally, we perform at most 80 gradient updates on the clipped surrogate PPO objective with ϵ = 0.2 using Adam optimization with learning rate 0.0001. Early stopping is performed when the sampled KL-divergence from the last policy exceeds 0.01. (A hedged sketch of this update loop follows the table.) |
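
For reference, here is a minimal sketch of Buchberger's algorithm with a pluggable S-pair selection step, the choice the paper's agent learns. This is not the authors' implementation; it uses SymPy for polynomial reduction, and the `select_first` strategy, variable names, and example system are illustrative assumptions.

```python
from sympy import symbols, lcm, LT, reduced, expand

x, y, z = symbols("x y z")
GENS = (x, y, z)
ORDER = "grevlex"  # monomial order used for leading terms and reduction

def s_polynomial(f, g):
    """S-polynomial of f and g: the combination that cancels their leading terms."""
    lt_f = LT(f, *GENS, order=ORDER)
    lt_g = LT(g, *GENS, order=ORDER)
    m = lcm(lt_f, lt_g)
    return expand(m / lt_f * f - m / lt_g * g)

def select_first(pairs):
    """Trivial selection strategy: always take the first pair.
    The paper replaces this choice with a learned policy."""
    return pairs[0]

def buchberger(F, select=select_first):
    """Compute a (non-reduced) Groebner basis of the ideal generated by F."""
    G = list(F)
    pairs = [(i, j) for i in range(len(G)) for j in range(i + 1, len(G))]
    while pairs:
        i, j = select(pairs)                       # S-pair selection step
        pairs.remove((i, j))
        s = s_polynomial(G[i], G[j])
        _, r = reduced(s, G, *GENS, order=ORDER)   # remainder of s modulo G
        if r != 0:                                 # nonzero remainder enlarges the basis
            G.append(r)
            pairs.extend((k, len(G) - 1) for k in range(len(G) - 1))
    return G

# Small illustrative example (not from the paper)
print(buchberger([x**2 - y, x*y - z]))
```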
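The online data generation quoted above can be pictured with a small sketch like the following; the specific distribution parameters (three variables, five generators, degree bound of five) are hypothetical placeholders, not the paper's actual distributions of random binomial ideals.

```python
import random
from sympy import symbols, prod

GENS = symbols("x y z")  # hypothetical: 3 variables

def random_monomial(max_degree=5):
    """Sample a random monomial; uniform exponents are an assumption."""
    return prod(g**random.randint(0, max_degree) for g in GENS)

def random_binomial_ideal(n_generators=5, max_degree=5):
    """Build a fresh random binomial system; each episode starts from a new one."""
    return [random_monomial(max_degree) - random_monomial(max_degree)
            for _ in range(n_generators)]

# Each call yields a new instance -- there is no fixed train or test set.
print(random_binomial_ideal())
```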
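The training setup in the last row can be summarized by a hedged PyTorch-style sketch of the PPO update with the stated hyperparameters (GAE with λ = 0.97 and γ = 0.99, clip ϵ = 0.2, Adam at learning rate 0.0001, at most 80 updates per epoch, KL early stopping at 0.01). The `policy.log_prob` interface and the rollout collection are assumptions; this is not the authors' custom implementation.

```python
import torch

GAMMA, LAM = 0.99, 0.97            # discount and GAE parameters from the paper
CLIP_EPS, LR = 0.2, 1e-4           # PPO clip range and Adam learning rate
MAX_UPDATES, TARGET_KL = 80, 0.01  # per-epoch update cap and KL early-stop threshold

def gae_advantages(rewards, values, gamma=GAMMA, lam=LAM):
    """Generalized advantage estimation for one episode (values has length T + 1)."""
    adv, running = torch.zeros(len(rewards)), 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def ppo_update(policy, optimizer, obs, actions, old_logp, advantages):
    """At most MAX_UPDATES clipped-surrogate steps, stopping early on large KL."""
    # Advantages are normalized over the epoch, as described in the paper.
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)
    old_logp = old_logp.detach()
    for _ in range(MAX_UPDATES):
        logp = policy.log_prob(obs, actions)                 # assumed policy interface
        ratio = torch.exp(logp - old_logp)
        clipped = torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * advantages
        loss = -torch.min(ratio * advantages, clipped).mean()
        if (old_logp - logp).mean().item() > TARGET_KL:      # sampled KL estimate
            break                                            # early stopping
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# Typical wiring (policy assumed to be a torch.nn.Module):
# optimizer = torch.optim.Adam(policy.parameters(), lr=LR)
```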