Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning Sparse Approximate Inverse Preconditioners for Conjugate Gradient Solvers on GPUs

Authors: Zhehao Li, Kangbo Lyu, Yixuan Li, Tao Du, Ligang Liu

NeurIPS 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Evaluations on three PDE-derived datasets and one synthetic dataset demonstrate that our method outperforms standard preconditioners (Diagonal, IC, and traditional SPAI) and previous learning-based preconditioners on GPUs. We reduce solution time on GPUs by 40%-53% (68%-113% faster), along with better condition numbers and superior generalization performance. |
| Researcher Affiliation | Academia | (1) University of Science and Technology of China, (2) Stanford University, (3) Tsinghua University, (4) Shanghai Qi Zhi Institute |
| Pseudocode | Yes | Algorithm 1 Preconditioned Conjugate Gradient |
| Open Source Code | Yes | Yes, the code will be made publicly available upon publication, along with detailed instructions to reproduce all main experimental results. |
| Open Datasets | Yes | For Heat and Poisson problems, we use 9,147 meshes with node counts ranging from 400 to 32,000 from the TetWild dataset [28]. |
| Dataset Splits | Yes | All experiments employ a 4:1 train-test split, with all reported results evaluated on the test set. |
| Hardware Specification | Yes | All models are trained for 500 epochs using a batch size of 4 on a single NVIDIA A100 GPU, optimized with AdamW [29] and an exponentially decaying learning rate scheduler (decay rate = 0.99). All evaluations are performed on an AMD Ryzen 5 5600 CPU and an NVIDIA GeForce RTX 3060 GPU. |
| Software Dependencies | Yes | All the preconditioned CG are implemented in C++ and CUDA with OpenBLAS [30], cuBLAS, cuSPARSE [31], and cusplibrary [32] for their high-performance linear algebra kernels and preconditioner implementations. The source code is compiled using GCC 14.2 and CUDA 12.8. Table 7: Implementation of baseline preconditioners. CPU: Eigen [36]; GPU: Custom, cuSPARSE, cusplibrary [32]. |
| Experiment Setup | Yes | For all experiments in this work, we fixed the number of message passing steps L to 4, the number of hidden layers in all the MLPs to 1, the number of neurons d in the hidden layer to 24, and ε = 10^-4. The GNN has about 24k trainable parameters in total. All models are trained for 500 epochs using a batch size of 4 on a single NVIDIA A100 GPU, optimized with AdamW [29] and an exponentially decaying learning rate scheduler (decay rate = 0.99). |
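The Pseudocode entry cites only the title of Algorithm 1, so the following is a minimal textbook sketch of preconditioned conjugate gradient, not the authors' implementation (which the page says is written in C++ and CUDA). It assumes a symmetric positive definite matrix and an explicit approximate inverse M ≈ A⁻¹ that is applied as a product, which is how SPAI-style preconditioners are used. The helper names (`matvec`, `dot`, `pcg`, `apply_jacobi`), the dense list-of-lists storage, and the diagonal stand-in for M are all illustrative assumptions.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product; a real solver would use a sparse format."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def pcg(
    A: Matrix,
    b: Vector,
    apply_M: Callable[[Vector], Vector],
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Tuple[Vector, int]:
    """Textbook PCG for SPD A; apply_M(r) returns M r with M approximating A^{-1}."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the zero initial guess
    z = apply_M(r)                   # preconditioned residual
    p = list(z)                      # first search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0  # avoid dividing by zero when b = 0
    for k in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_M(r)
        rz_new = dot(r, z)
        beta = rz_new / rz           # keeps search directions A-conjugate
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small SPD test system (made-up numbers, for illustration only).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]


def apply_jacobi(r: Vector) -> Vector:
    """Diagonal (Jacobi) approximate inverse, used here as a stand-in for a learned SPAI."""
    return [ri / A[i][i] for i, ri in enumerate(r)]


x, iters = pcg(A, b, apply_jacobi)
print("iterations:", iters)
print("solution:", x)
```

Swapping `apply_jacobi` for a sparse matrix-vector product with a learned M would leave the loop unchanged; only the cost and quality of the `apply_M` step differ.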
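The Dataset Splits and Experiment Setup entries can be made concrete with a short sketch. It assumes the 4:1 split is a random partition at the sample level with the test set taking the floor of one fifth, and that the learning rate decays once per epoch; neither detail is stated on this page. The initial learning rate is also not given, so only the decay factor relative to the starting value is computed. The function names are hypothetical.

```python
import random
from typing import List, Tuple


def train_test_split_4_to_1(n_samples: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and hold out one fifth as the test set (assumed rounding: floor)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = n_samples // 5
    return indices[n_test:], indices[:n_test]


def lr_at_epoch(initial_lr: float, epoch: int, decay_rate: float = 0.99) -> float:
    """Exponential decay: the rate is multiplied by decay_rate once per epoch (assumption)."""
    return initial_lr * decay_rate ** epoch


# 9,147 meshes are quoted for the Heat and Poisson problems.
train_idx, test_idx = train_test_split_4_to_1(9147)
print(len(train_idx), len(test_idx))

# Fraction of the initial learning rate remaining after 500 epochs at decay rate 0.99.
final_factor = lr_at_epoch(1.0, 500)
print(f"{final_factor:.4f}")
```

Under these assumptions the learning rate ends at well under 1% of its starting value, which is worth keeping in mind when attempting to reproduce the 500-epoch training run with a different schedule granularity.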
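The Research Type entry reports the same result two ways: a 40%-53% reduction in solution time and a 68%-113% speedup. The conversion between the two is sketched below under the usual convention that a fractional time reduction r corresponds to a speedup of 1/(1 - r) - 1; the function name is hypothetical.

```python
def percent_faster(time_reduction: float) -> float:
    """Convert a fractional reduction in run time into a percentage speedup."""
    return (1.0 / (1.0 - time_reduction) - 1.0) * 100.0


for reduction in (0.40, 0.53):
    print(f"{reduction:.0%} less time -> {percent_faster(reduction):.1f}% faster")
```

A 53% reduction maps to roughly 113%, matching the quoted upper bound. A reduction of exactly 40% maps to roughly 67%, so the quoted 68% presumably reflects an unrounded reduction slightly above 40%.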