Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PINNsAgent: Automated PDE Surrogation with Large Language Models
Authors: Qingpo Wuwu, Chonghan Gao, Tianyu Chen, Yihang Huang, Yuekai Zhang, Jianing Wang, Jianxin Li, Haoyi Zhou, Shanghang Zhang
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PINNsAgent on 14 benchmark PDEs, demonstrating its effectiveness in automating the surrogation process and significantly improving the accuracy of PINNs-based solutions. Project website: https://qingpowuwu.github.io/PINNsAgent/. [...] The comparative end-to-end performance of PINNsAgent and the baseline approaches on 14 different PDEs is presented in Table 2. [...] We conducted an ablation study by removing these components individually and comparing the performance with the complete framework. |
| Researcher Affiliation | Academia | 1State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University 2School of Computer Science, Beihang University 3School of Artificial Intelligence, Beijing Normal University. Correspondence to: Shanghang Zhang <EMAIL>. |
| Pseudocode | Yes | Appendix C (Pseudocodes), Algorithm 1: Physics-Guided Knowledge Replay (PGKR) for a New PDE |
| Open Source Code | No | The paper states "Project website: https://qingpowuwu.github.io/PINNsAgent/". This is a project website, not an explicit code repository link or statement of code release for the methodology described in the paper. |
| Open Datasets | Yes | We leverage the PINNacle benchmark dataset (Hao et al., 2024), a comprehensive collection of 20 representative PDEs spanning 1D, 2D, and 3D domains. |
| Dataset Splits | No | The paper mentions using the PINNacle benchmark dataset and conducting ten repeated experiments for each PDE, but it does not specify how the data for a single PDE problem (e.g., solution data) is split into training, validation, or test sets. Because PINNs are trained directly on the governing equations, boundary, and initial conditions, traditional data splits for the PDE solution itself are not explicitly provided. |
| Hardware Specification | Yes | Note that we have only a GPU with 24GB of memory. |
| Software Dependencies | No | The paper mentions implementing PINNsAgent with the "GPT-4 model" and using "YAML configuration files", but it does not specify version numbers for any libraries or programming languages used in the experimental setup. |
| Experiment Setup | Yes | We extend the hyperparameter search space defined by (Wang et al., 2022; Wang & Zhong, 2024), with additional hyperparameters carefully curated from previous hyperparameter optimization (HPO) works (Klein & Hutter, 2019) to provide a more comprehensive exploration of the architectural landscape of PINNs. The configuration space, shown in Table 1, encompasses 4 architectural choices: network type, activation functions, width, and depth, along with 5 hyperparameters: optimizer, initializer, learning rate, loss weight coefficients, and domain/boundary/initial points. [...] For each PDE, we conducted ten repeated experiments and took the average of the lowest MSE to mitigate randomness, with a temperature of 0.7. We compare PINNsAgent with two baseline methods: (1) Random Search, a basic hyperparameter tuning method that selects configurations randomly, and (2) Bayesian Search, which uses Bayesian optimization to select configurations. |
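The Random Search baseline described in the Experiment Setup row can be sketched as follows. This is a minimal illustration only: the category names mirror the configuration space the paper describes (4 architectural choices plus 5 hyperparameters), but the concrete value ranges and the `evaluate` callback are hypothetical placeholders, not the paper's actual Table 1 or training loop.

```python
import random

# Hypothetical stand-in for the paper's Table 1 configuration space:
# 4 architectural choices and 5 hyperparameters. Values are illustrative.
SEARCH_SPACE = {
    # architectural choices
    "network_type": ["FNN", "ResNet"],
    "activation": ["tanh", "sin", "gelu"],
    "width": [32, 64, 128, 256],
    "depth": [3, 4, 5, 6],
    # hyperparameters
    "optimizer": ["Adam", "L-BFGS"],
    "initializer": ["Glorot", "He"],
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "loss_weight": [1.0, 10.0, 100.0],
    "num_points": [1000, 5000, 10000],
}


def random_search(n_trials, evaluate, seed=0):
    """Random Search baseline: sample n_trials configurations uniformly
    at random and return the one with the lowest score (e.g., MSE).

    `evaluate` is a user-supplied callback mapping a configuration dict
    to a scalar loss; in the paper's setting it would train a PINN with
    that configuration and report its MSE.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

The Bayesian Search baseline differs only in how the next configuration is chosen: instead of uniform sampling, it fits a surrogate model to the (configuration, score) pairs seen so far and picks the configuration maximizing an acquisition function.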