Learning to compute Gröbner bases

Authors: Hiroshi Kera, Yuki Ishihara, Yuta Kambe, Tristan Vaccon, Kazuhiro Yokoyama

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experiments show that our dataset generation method is a few orders of magnitude faster than a naive approach, overcoming a crucial challenge in learning to compute Gröbner bases, and Gröbner computation is learnable in a particular class.
Researcher Affiliation | Collaboration | Hiroshi Kera (Chiba University, Zuse Institute Berlin) kera@chiba-u.jp; Yuki Ishihara (Nihon University) ishihara.yuki@nihon-u.ac.jp; Yuta Kambe (Mitsubishi Electric) kambe.yuta@bx.mitsubishielectric.co.jp; Tristan Vaccon (Limoges University) tristan.vaccon@unilim.fr; Kazuhiro Yokoyama (Rikkyo University) kazuhiro@rikkyo.ac.jp
Pseudocode | Yes | The combination of the discussion in the previous sections gives an efficient dataset generation algorithm (see Alg. 1 for the pseudocode). A hedged sketch of the backward-generation idea appears after the table.
Open Source Code | Yes | The code is available at https://github.com/HiroshiKera/transformer-groebner.
Open Datasets | Yes | We constructed 16 datasets D_n(k) for n ∈ {2, 3, 4, 5} and k ∈ {F7, F31, Q, R} and measured the runtime of the forward generation and our backward generation.
Dataset Splits | No | The training set has one million samples, and the test set has one thousand samples.
Hardware Specification | Yes | All the experiments were conducted with 48-core CPUs, 768GB RAM, and NVIDIA RTX A6000 Ada GPUs.
Software Dependencies | No | We used SageMath [82] with the libSingular backend. A minimal usage example appears after the table.
Experiment Setup | Yes | We used a Transformer model [85] with a standard architecture: 6 encoder/decoder layers, 8 attention heads, a token embedding dimension of 512, and feed-forward networks with 2048 inner dimensions. The absolute positional embedding is learned from scratch. The dropout rate was set to 0.1. We used the AdamW optimizer [65] with (β1, β2) = (0.9, 0.999) and no weight decay. The learning rate was initially set to 10^-4 and then linearly decayed over the training steps. All training samples are visited in a single epoch, and the total number of epochs was set to 8. The batch size was set to 16. A configuration sketch appears after the table.
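
The dataset rows above refer to the paper's backward generation: rather than computing a Gröbner basis from a random input (forward generation), one samples a set G that is a Gröbner basis by construction and then mixes it into a non-Gröbner generating set F of the same ideal, yielding a training pair (F, G) without a forward Gröbner computation. The sketch below illustrates this idea in Python with SymPy; the shape-position sampling, the unimodular L·U mixing matrix, and all helper names are illustrative assumptions, not a reproduction of the paper's Algorithm 1.

```python
# Hedged sketch of backward dataset generation: sample a lex Groebner basis G
# in "shape position" (so it is a Groebner basis by construction), then mix it
# with a unimodular polynomial matrix A = L*U so that F = A*G generates the
# same ideal. Parameter choices and helpers are illustrative.
import random
import sympy as sp

random.seed(0)
p = 7                                  # base field GF(7)
x1, x2, x3 = sp.symbols("x1 x2 x3")
xs = (x1, x2, x3)

def rand_univ(var, deg, monic=False):
    """Random univariate polynomial in `var` over GF(p) of degree <= deg."""
    coeffs = [random.randrange(p) for _ in range(deg + 1)]
    if monic:
        coeffs[-1] = 1
    return sum(c * var**i for i, c in enumerate(coeffs))

# 1) Sample G = [x1 - h1(x3), x2 - h2(x3), g(x3)] with g monic of degree d.
#    The lex leading terms x1, x2, x3^d are pairwise coprime, so G is a
#    reduced lex Groebner basis without any computation.
d = 3
G = [x1 - rand_univ(x3, d - 1),
     x2 - rand_univ(x3, d - 1),
     rand_univ(x3, d, monic=True)]

# 2) Mix with A = L*U, unit lower/upper triangular with random polynomial
#    entries; A is invertible over the polynomial ring, hence <F> = <G>.
def rand_entry():
    return rand_univ(x3, 1) + random.randrange(p) * x1

L = sp.Matrix([[1, 0, 0],
               [rand_entry(), 1, 0],
               [rand_entry(), rand_entry(), 1]])
U = sp.Matrix([[1, rand_entry(), rand_entry()],
               [0, 1, rand_entry()],
               [0, 0, 1]])
F = [sp.expand(f) for f in (L * U) * sp.Matrix(G)]

# Sanity check only (skipped during real data generation): F and G have the
# same reduced lex Groebner basis over GF(7).
gb_F = sp.groebner(F, *xs, order="lex", modulus=p)
gb_G = sp.groebner(G, *xs, order="lex", modulus=p)
assert set(gb_F.exprs) == set(gb_G.exprs)
print("input F :", F)
print("target G:", list(gb_G.exprs))
```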
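
The Software Dependencies row mentions SageMath with the libSingular backend. The snippet below is a minimal example of the kind of Gröbner basis call involved, assuming a SageMath installation; for multivariate polynomial rings over prime fields, groebner_basis() is dispatched to Singular by default. The ideal is a placeholder, not one of the paper's benchmark instances.

```python
# Minimal SageMath example (run inside Sage); the ideal is a placeholder.
from sage.all import PolynomialRing, GF

R = PolynomialRing(GF(31), ["x", "y", "z"], order="degrevlex")
x, y, z = R.gens()
I = R.ideal([x*y - z**2, y*z - x**2, x*z - y**2])
G = I.groebner_basis()   # handled by the Singular (libSingular) backend by default
print(G)
```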
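
The Experiment Setup row lists concrete hyperparameters. The PyTorch sketch below mirrors them (6 encoder/decoder layers, 8 heads, model dimension 512, feed-forward dimension 2048, dropout 0.1, learned absolute positional embeddings, AdamW with betas (0.9, 0.999) and no weight decay, learning rate 10^-4 with linear decay, batch size 16, 8 epochs over one million samples). The vocabulary size, maximum sequence length, and polynomial tokenization are assumptions not stated in this excerpt, and the class is a sketch rather than the authors' implementation.

```python
# Hedged sketch of the stated Transformer configuration; VOCAB, MAX_LEN, and
# PAD are assumed placeholders (the tokenization is not described here).
import torch
import torch.nn as nn

VOCAB, MAX_LEN, PAD = 1000, 512, 0

class Seq2SeqTransformer(nn.Module):
    def __init__(self, d_model=512, nhead=8, layers=6, ffn=2048, dropout=0.1):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, d_model, padding_idx=PAD)
        self.pos = nn.Embedding(MAX_LEN, d_model)   # learned absolute positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=layers, num_decoder_layers=layers,
            dim_feedforward=ffn, dropout=dropout, batch_first=True)
        self.out = nn.Linear(d_model, VOCAB)

    def embed(self, ids):
        positions = torch.arange(ids.size(1), device=ids.device)
        return self.tok(ids) + self.pos(positions)[None, :, :]

    def forward(self, src_ids, tgt_ids):
        # Causal mask so the decoder only attends to already generated tokens.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(src_ids.device)
        h = self.transformer(self.embed(src_ids), self.embed(tgt_ids),
                             tgt_mask=tgt_mask)
        return self.out(h)

model = Seq2SeqTransformer()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4,
                              betas=(0.9, 0.999), weight_decay=0.0)
# Linear decay of the learning rate over all training steps:
# 1,000,000 samples / batch size 16, for 8 epochs.
total_steps = (1_000_000 // 16) * 8
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.0, total_iters=total_steps)
```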