Learning to compute Gröbner bases
Authors: Hiroshi Kera, Yuki Ishihara, Yuta Kambe, Tristan Vaccon, Kazuhiro Yokoyama
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiments show that our dataset generation method is a few orders of magnitude faster than a naive approach, overcoming a crucial challenge in learning to compute Gröbner bases, and Gröbner computation is learnable in a particular class. |
| Researcher Affiliation | Collaboration | Hiroshi Kera Chiba University, Zuse Institute Berlin kera@chiba-u.jp; Yuki Ishihara Nihon University ishihara.yuki@nihon-u.ac.jp; Yuta Kambe Mitsubishi Electric kambe.yuta@bx.mitsubishielectric.co.jp; Tristan Vaccon Limoges University tristan.vaccon@unilim.fr; Kazuhiro Yokoyama Rikkyo University kazuhiro@rikkyo.ac.jp |
| Pseudocode | Yes | The combination of the discussion in the previous sections gives an efficient dataset generation algorithm (see Alg. 1 for a pseudocode). |
| Open Source Code | Yes | The code is available at https://github.com/HiroshiKera/transformer-groebner. |
| Open Datasets | Yes | We constructed 16 datasets D_n(k) for n ∈ {2, 3, 4, 5} and k ∈ {F_7, F_31, Q, R} and measured the runtime of the forward generation and our backward generation (a sketch of the backward-generation idea follows the table). |
| Dataset Splits | No | The training set has one million samples, and the test set has one thousand samples. |
| Hardware Specification | Yes | All the experiments were conducted with 48-core CPUs, 768 GB RAM, and NVIDIA RTX 6000 Ada GPUs. |
| Software Dependencies | No | We used SageMath [82] with the libSingular backend (a minimal usage sketch follows the table). |
| Experiment Setup | Yes | We used a Transformer model [85] with a standard architecture: 6 encoder/decoder layers, 8 attention heads, token embeddings of dimension 512, and feed-forward networks with an inner dimension of 2048. The absolute positional embedding is learned from scratch. The dropout rate was set to 0.1. We used the AdamW optimizer [65] with (β1, β2) = (0.9, 0.999) and no weight decay. The learning rate was initially set to 10^-4 and then linearly decayed over training steps. All training samples are visited in a single epoch, and the total number of epochs was set to 8. The batch size was set to 16. (A hedged transcription of this configuration follows the table.) |
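
The dataset row above contrasts forward generation (compute a Gröbner basis from a random input) with the paper's faster backward generation. The following is a minimal sketch of the backward idea under assumptions not stated in the table: that the Gröbner basis G is sampled in shape position with respect to the lexicographic order, and that the input system F is obtained as F = U·G for a random unimodular polynomial matrix U, so that F and G generate the same ideal. The field, sizes, degrees, and the structure of U are illustrative; Alg. 1 in the paper is the authoritative procedure.

```python
# Sketch of backward generation (cf. Alg. 1): sample a Gröbner basis G first,
# then derive a non-Gröbner input F generating the same ideal. Run under Sage.
# Assumptions: G in shape position, F = U * G with U unimodular (det U = 1).
from sage.all import GF, PolynomialRing, matrix

k, n, d = GF(31), 3, 2                     # field, #variables, degree bound (illustrative)
R = PolynomialRing(k, n, 'x', order='lex')
x = R.gens()
S = PolynomialRing(k, 'y')                 # univariate ring for the shape parts
y = S.gen()

# Shape position: G = {x_0 - g_0(x_{n-1}), ..., x_{n-2} - g_{n-2}(x_{n-1}), h(x_{n-1})}.
# The leading terms x_0, ..., x_{n-2}, x_{n-1}^(d+1) are pairwise coprime, so G is
# automatically a reduced lex Gröbner basis by Buchberger's first criterion.
h = y**(d + 1) + S.random_element(degree=d)
G = [x[i] - S.random_element(degree=d)(x[-1]) for i in range(n - 1)] + [h(x[-1])]

# Random unimodular U = L * V (unit lower- times unit upper-triangular), so that
# <F> = <U * G> = <G> while F itself is generally not a Gröbner basis.
m = len(G)
rand = lambda: R.random_element(degree=1)
L = matrix(R, [[R.one() if i == j else (rand() if i > j else R.zero())
                for j in range(m)] for i in range(m)])
V = matrix(R, [[R.one() if i == j else (rand() if i < j else R.zero())
                for j in range(m)] for i in range(m)])
U = L * V
F = [sum(U[i, j] * G[j] for j in range(m)) for i in range(m)]

# Sanity check: F and G have the same reduced Gröbner basis.
assert set(R.ideal(F).groebner_basis()) == set(R.ideal(G).groebner_basis())
```

The point of the construction is that G is a Gröbner basis by design, so no Gröbner computation is needed to label the training pair (F, G); only the forward baseline has to run a Gröbner algorithm per sample.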
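The software row names SageMath with the libSingular backend. A minimal usage sketch of that stack follows; the ring, order, and generators are illustrative choices, not taken from the paper.

```python
# Computing a Gröbner basis in SageMath through the libSingular backend.
from sage.all import GF, PolynomialRing

R = PolynomialRing(GF(7), 3, 'x', order='degrevlex')
x0, x1, x2 = R.gens()
I = R.ideal([x0*x1 - x2, x1*x2 - x0, x0*x2 - x1])
G = I.groebner_basis(algorithm='libsingular:std')  # Singular's std() via libSingular
print(G)
```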
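The setup row specifies the architecture and optimizer closely enough to transcribe into a PyTorch sketch. The vocabulary size, maximum sequence length, and the exact embedding modules are assumptions here (the repository linked above is authoritative); the remaining hyperparameters follow the table.

```python
import torch
import torch.nn as nn

# Assumed placeholders, not stated in the table: vocabulary and sequence sizes.
vocab_size, max_len = 1000, 512
# 1e6 samples / batch size 16 * 8 epochs = 500_000 steps (derived from the table).
total_steps = 500_000

model = nn.Transformer(
    d_model=512, nhead=8,                     # 8 attention heads
    num_encoder_layers=6, num_decoder_layers=6,
    dim_feedforward=2048, dropout=0.1,
    batch_first=True,
)
tok_emb = nn.Embedding(vocab_size, 512)       # token embeddings of dimension 512
pos_emb = nn.Embedding(max_len, 512)          # absolute positions, learned from scratch

params = (list(model.parameters())
          + list(tok_emb.parameters()) + list(pos_emb.parameters()))
optimizer = torch.optim.AdamW(params, lr=1e-4, betas=(0.9, 0.999), weight_decay=0.0)
# Learning rate decays linearly from 1e-4 to 0 over the training steps.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / total_steps))
```

The step count is the one quantity inferred rather than quoted: one million training samples at batch size 16 give 62,500 steps per epoch, and 8 epochs give 500,000 steps in total.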