Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers
Authors: Hang Zhou, Yuezhou Ma, Haixu Wu, Haowen Wang, Mingsheng Long
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on our own generated dataset and two large-scale benchmarks with various PDE components, where Unisolver achieves consistent state-of-the-art with sharp relative gains. |
| Researcher Affiliation | Academia | School of Software, BNRist, Tsinghua University, China. Hang Zhou <EMAIL>. Correspondence to: Haixu Wu <EMAIL>, Mingsheng Long <EMAIL>. |
| Pseudocode | No | The paper includes Equation (2) which formalizes the n-th layer of Unisolver, but it is not explicitly labeled as "Pseudocode" or "Algorithm". There are no other clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/thuml/Unisolver. |
| Open Datasets | Yes | We conduct extensive experiments on our own generated dataset and two large-scale benchmarks with various PDE components... The Heter NS is an extension of the NS dataset from FNO (2021a)... The 1D time-dependent PDEs, introduced by PDEformer (2024)... The 2D mixed PDEs, collected by DPOT (2024)... The dataset can be accessed at the following anonymous link: https://drive.google.com/drive/folders/1te5IyQHTznu_Kw7v3zDHg0i_KCHysPKw?usp=share_link |
| Dataset Splits | Yes | For each combination, we generate 1000 samples, yielding a total of 15,000 training samples. The remaining 200 instances are used for testing its performance. In in-distribution tests, the initial conditions vary across samples. Zero-shot generalization settings present much greater challenges... We assess the model's zero-shot performance on 200 samples. |
| Hardware Specification | Yes | Table 2. Summary of benchmarks. #GPU hours are calculated by averaging the training time of all models on one A100 GPU... Our models were trained on servers with 32 NVIDIA A100 GPUs, each with 40GB memory. |
| Software Dependencies | No | The paper mentions using the ADAM optimizer (Kingma & Ba, 2015) and a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016), and the LLaMA-3 8B model. However, it does not provide specific version numbers for these or for the main deep learning framework (e.g., PyTorch, TensorFlow) or programming language used. |
| Experiment Setup | Yes | All methods in the Heter NS benchmark are trained for 300 epochs using relative L2 loss and the ADAM optimizer (Kingma & Ba, 2015) with an initial learning rate of 0.0005 and a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016). The batch size is set to 60. For the 1D time-dependent PDEs and 2D mixed PDEs, we follow the training strategies from the original papers of PDEformer (2024) and DPOT (2024) to ensure a fair comparison. Relative L2 is used as the evaluation metric. See Appendix H for full implementation details and hyper-parameter configurations of each model. |
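The experiment-setup row above reports that the models are trained and evaluated with the relative L2 metric. As a reference, here is a minimal NumPy sketch of that metric under its standard definition (the paper's own implementation may differ in details such as batching or normalization):

```python
import numpy as np

def relative_l2(pred, target, eps=1e-8):
    """Mean per-sample relative L2 error: ||pred - target||_2 / ||target||_2.

    pred, target: arrays of shape (batch, ...); trailing dims are flattened.
    eps guards against division by zero for all-zero targets (an assumption,
    not taken from the paper).
    """
    pred = np.asarray(pred, dtype=float).reshape(len(pred), -1)
    target = np.asarray(target, dtype=float).reshape(len(target), -1)
    num = np.linalg.norm(pred - target, axis=1)
    den = np.linalg.norm(target, axis=1) + eps
    return float(np.mean(num / den))
```

A perfect prediction gives 0, and predicting all zeros against a nonzero target gives approximately 1, which is why relative L2 is a common scale-free error measure for neural PDE solvers.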