Learning to compute Gröbner bases

Authors: Hiroshi Kera, Yuki Ishihara, Yuta Kambe, Tristan Vaccon, Kazuhiro Yokoyama

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments show that our dataset generation method is a few orders of magnitude faster than a naive approach, overcoming a crucial challenge in learning to compute Gröbner bases, and Gröbner computation is learnable in a particular class.
Researcher Affiliation | Collaboration | Hiroshi Kera (Chiba University, Zuse Institute Berlin) kera@chiba-u.jp; Yuki Ishihara (Nihon University) ishihara.yuki@nihon-u.ac.jp; Yuta Kambe (Mitsubishi Electric) kambe.yuta@bx.mitsubishielectric.co.jp; Tristan Vaccon (Limoges University) tristan.vaccon@unilim.fr; Kazuhiro Yokoyama (Rikkyo University) kazuhiro@rikkyo.ac.jp
Pseudocode | Yes | The combination of the discussion in the previous sections gives an efficient dataset generation algorithm (see Alg. 1 for the pseudocode). A hedged sketch of the backward-generation idea appears after the table.
Open Source Code | Yes | The code is available at https://github.com/HiroshiKera/transformer-groebner.
Open Datasets | Yes | We constructed 16 datasets D_n(k) for n ∈ {2, 3, 4, 5} and k ∈ {F7, F31, Q, R} and measured the runtime of the forward generation and our backward generation.
Dataset Splits | No | The training set has one million samples, and the test set has one thousand samples.
Hardware Specification | Yes | All the experiments were conducted with 48-core CPUs, 768GB RAM, and NVIDIA RTX A6000 Ada GPUs.
Software Dependencies | No | We used SageMath [82] with the libSingular backend. A minimal usage example appears after the table.
Experiment Setup | Yes | We used a Transformer model [85] with a standard architecture: 6 encoder/decoder layers, 8 attention heads, a token embedding dimension of 512, and feed-forward networks with 2048 inner dimensions. The absolute positional embedding is learned from scratch. The dropout rate was set to 0.1. We used the AdamW optimizer [65] with (β1, β2) = (0.9, 0.999) and no weight decay. The learning rate was initially set to 10^-4 and then linearly decayed over the training steps. All training samples are visited in a single epoch, and the total number of epochs was set to 8. The batch size was set to 16. A configuration sketch appears after the table.
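
The dataset rows above refer to the paper's backward generation: rather than computing a Gröbner basis from a random input (forward generation), one samples a set G that is a Gröbner basis by construction and then mixes it into a non-Gröbner generating set F of the same ideal, yielding a training pair (F, G) without a forward Gröbner computation. The sketch below illustrates this idea in Python with SymPy; the shape-position sampling, the unimodular L·U mixing matrix, and all helper names are illustrative assumptions, not a reproduction of the paper's Algorithm 1.

```python
# Hedged sketch of backward dataset generation: sample a lex Groebner basis G
# in "shape position" (so it is a Groebner basis by construction), then mix it
# with a unimodular polynomial matrix A = L*U so that F = A*G generates the
# same ideal. Parameter choices and helpers are illustrative.
import random
import sympy as sp

random.seed(0)
p = 7                                  # base field GF(7)
x1, x2, x3 = sp.symbols("x1 x2 x3")
xs = (x1, x2, x3)

def rand_univ(var, deg, monic=False):
    """Random univariate polynomial in `var` over GF(p) of degree <= deg."""
    coeffs = [random.randrange(p) for _ in range(deg + 1)]
    if monic:
        coeffs[-1] = 1
    return sum(c * var**i for i, c in enumerate(coeffs))

# 1) Sample G = [x1 - h1(x3), x2 - h2(x3), g(x3)] with g monic of degree d.
#    The lex leading terms x1, x2, x3^d are pairwise coprime, so G is a
#    reduced lex Groebner basis without any computation.
d = 3
G = [x1 - rand_univ(x3, d - 1),
     x2 - rand_univ(x3, d - 1),
     rand_univ(x3, d, monic=True)]

# 2) Mix with A = L*U, unit lower/upper triangular with random polynomial
#    entries; A is invertible over the polynomial ring, hence <F> = <G>.
def rand_entry():
    return rand_univ(x3, 1) + random.randrange(p) * x1

L = sp.Matrix([[1, 0, 0],
               [rand_entry(), 1, 0],
               [rand_entry(), rand_entry(), 1]])
U = sp.Matrix([[1, rand_entry(), rand_entry()],
               [0, 1, rand_entry()],
               [0, 0, 1]])
F = [sp.expand(f) for f in (L * U) * sp.Matrix(G)]

# Sanity check only (skipped during real data generation): F and G have the
# same reduced lex Groebner basis over GF(7).
gb_F = sp.groebner(F, *xs, order="lex", modulus=p)
gb_G = sp.groebner(G, *xs, order="lex", modulus=p)
assert set(gb_F.exprs) == set(gb_G.exprs)
print("input F :", F)
print("target G:", list(gb_G.exprs))
```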
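
The Software Dependencies row mentions SageMath with the libSingular backend. The snippet below is a minimal example of the kind of Gröbner basis call involved, assuming a SageMath installation; for multivariate polynomial rings over prime fields, groebner_basis() is dispatched to Singular by default. The ideal is a placeholder, not one of the paper's benchmark instances.

```python
# Minimal SageMath example (run inside Sage); the ideal is a placeholder.
from sage.all import PolynomialRing, GF

R = PolynomialRing(GF(31), ["x", "y", "z"], order="degrevlex")
x, y, z = R.gens()
I = R.ideal([x*y - z**2, y*z - x**2, x*z - y**2])
G = I.groebner_basis()   # handled by the Singular (libSingular) backend by default
print(G)
```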
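
The Experiment Setup row lists concrete hyperparameters. The PyTorch sketch below mirrors them (6 encoder/decoder layers, 8 heads, model dimension 512, feed-forward dimension 2048, dropout 0.1, learned absolute positional embeddings, AdamW with betas (0.9, 0.999) and no weight decay, learning rate 10^-4 with linear decay, batch size 16, 8 epochs over one million samples). The vocabulary size, maximum sequence length, and polynomial tokenization are assumptions not stated in this excerpt, and the class is a sketch rather than the authors' implementation.

```python
# Hedged sketch of the stated Transformer configuration; VOCAB, MAX_LEN, and
# PAD are assumed placeholders (the tokenization is not described here).
import torch
import torch.nn as nn

VOCAB, MAX_LEN, PAD = 1000, 512, 0

class Seq2SeqTransformer(nn.Module):
    def __init__(self, d_model=512, nhead=8, layers=6, ffn=2048, dropout=0.1):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d_model, padding_idx=PAD)
        self.pos = nn.Embedding(MAX_LEN, d_model)   # learned absolute positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=layers, num_decoder_layers=layers,
            dim_feedforward=ffn, dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_model, VOCAB)

    def embed(self, ids):
        positions = torch.arange(ids.size(1), device=ids.device)
        return self.tok(ids) + self.pos(positions)[None, :, :]

    def forward(self, src_ids, tgt_ids):
        # Causal mask so the decoder only attends to already generated tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(src_ids.device)
        h = self.transformer(self.embed(src_ids), self.embed(tgt_ids),
                             tgt_mask=tgt_mask)
        return self.out(h)

model = Seq2SeqTransformer()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4,
                              betas=(0.9, 0.999), weight_decay=0.0)
# Linear decay of the learning rate over all training steps:
# 1,000,000 samples / batch size 16, for 8 epochs.
total_steps = (1_000_000 // 16) * 8
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.0, total_iters=total_steps)
```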