Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs
Authors: Zhehao Li, Kangbo Lyu, Yixuan Li, Tao Du, Ligang Liu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations on three PDE-derived datasets and one synthetic dataset demonstrate that our method outperforms standard preconditioners (Diagonal, IC, and traditional SPAI) and previous learning-based preconditioners on GPUs. We reduce solution time on GPUs by 40%-53% (68%-113% faster), along with better condition numbers and superior generalization performance. |
| Researcher Affiliation | Academia | 1 University of Science and Technology of China; 2 Stanford University; 3 Tsinghua University; 4 Shanghai Qi Zhi Institute |
| Pseudocode | Yes | Algorithm 1 Preconditioned Conjugate Gradient |
| Open Source Code | Yes | Yes, the code will be made publicly available upon publication, along with detailed instructions to reproduce all main experimental results. |
| Open Datasets | Yes | For Heat and Poisson problems, we use 9,147 meshes with node counts ranging from 400 to 32,000 from the TetWild dataset [28]. |
| Dataset Splits | Yes | All experiments employ a 4:1 train-test split, with all reported results evaluated on the test set. |
| Hardware Specification | Yes | All models are trained for 500 epochs using a batch size of 4 on a single NVIDIA A100 GPU, optimized with AdamW [29] and an exponentially decaying learning rate scheduler (decay rate = 0.99). All evaluations are performed on an AMD Ryzen 5 5600 CPU and an NVIDIA GeForce RTX 3060 GPU. |
| Software Dependencies | Yes | All the preconditioned CG are implemented in C++ and CUDA with OpenBLAS [30], cuBLAS, cuSPARSE [31], and cusplibrary [32] for their high-performance linear algebra kernels and preconditioner implementations. The source code is compiled using GCC 14.2 and CUDA 12.8. Table 7: Implementation of baseline preconditioners CPU Eigen [36] GPU Custom cuSPARSE cusplibrary [32] |
| Experiment Setup | Yes | For all experiments in this work, we fixed the number of message passing steps L to 4, the number of hidden layers in all the MLPs to 1, the number of neurons d in the hidden layer to 24, and ε = 10⁻⁴. The GNN has about 24k trainable parameters in total. All models are trained for 500 epochs using a batch size of 4 on a single NVIDIA A100 GPU, optimized with AdamW [29] and an exponentially decaying learning rate scheduler (decay rate = 0.99). |
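
For readers unfamiliar with the solver named in the Pseudocode row, the following is a minimal sketch of a textbook preconditioned conjugate gradient loop. It is not the paper's Algorithm 1 or its C++/CUDA implementation. The names `pcg`, `apply_precond`, `matvec`, and `dot` are hypothetical, and the diagonal (Jacobi) preconditioner in the usage lines merely stands in for a learned sparse approximate inverse, which would likewise be applied as a matrix-vector product.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (a real solver would use a sparse format)."""
    return [dot(row, x) for row in A]


def pcg(
    A: Matrix,
    b: Vector,
    apply_precond: Callable[[Vector], Vector],
    tol: float = 1e-8,
    max_iter: int = 1000,
) -> Tuple[Vector, int]:
    """Textbook PCG for a symmetric positive definite A.

    apply_precond(r) should return M @ r, where M approximates inv(A).
    Returns the approximate solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero initial guess
    z = apply_precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0  # avoid dividing by zero when b = 0
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_precond(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage with an illustrative 3x3 SPD system and a Jacobi preconditioner
# (assumption: stands in for any approximate inverse applied as M @ r).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
jacobi = lambda r: [ri / A[i][i] for i, ri in enumerate(r)]
x, iters = pcg(A, b, jacobi)
```

Because a sparse approximate inverse is applied with a single sparse matrix-vector product rather than a triangular solve, this application step maps well onto GPUs, which is the setting the table's entries describe.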