Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PINNsAgent: Automated PDE Surrogation with Large Language Models
Authors: Qingpo Wuwu, Chonghan Gao, Tianyu Chen, Yihang Huang, Yuekai Zhang, Jianing Wang, Jianxin Li, Haoyi Zhou, Shanghang Zhang
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PINNsAgent on 14 benchmark PDEs, demonstrating its effectiveness in automating the surrogation process and significantly improving the accuracy of PINNs-based solutions. Project website: https://qingpowuwu.github.io/PINNsAgent/. [...] The comparative end-to-end performance of PINNsAgent and the baseline approaches on 14 different PDEs is presented in Table 2. [...] We conducted an ablation study by removing these components individually and comparing the performance with the complete framework. |
| Researcher Affiliation | Academia | 1State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University 2School of Computer Science, Beihang University 3School of Artificial Intelligence, Beijing Normal University. Correspondence to: Shanghang Zhang <EMAIL>. |
| Pseudocode | Yes | Appendix C (Pseudocodes), Algorithm 1: Physics-Guided Knowledge Replay (PGKR) for a New PDE |
| Open Source Code | No | The paper states "Project website: https://qingpowuwu.github.io/PINNsAgent/". This is a project website, not an explicit code repository link or statement of code release for the methodology described in the paper. |
| Open Datasets | Yes | We leverage the PINNacle benchmark dataset (Hao et al., 2024), a comprehensive collection of 20 representative PDEs spanning 1D, 2D, and 3D domains. |
| Dataset Splits | No | The paper mentions using the PINNacle benchmark dataset and conducting ten repeated experiments for each PDE, but it does not specify how the data for a single PDE problem (e.g., solution data) is split into training, validation, or test sets. Because PINNs are trained directly on the governing equations, boundary, and initial conditions, traditional data splits for the PDE solution itself are not explicitly provided. |
| Hardware Specification | Yes | Note that we have only a GPU with 24GB of memory. |
| Software Dependencies | No | The paper mentions implementing PINNsAgent with the "GPT-4 model" and using "YAML configuration files", but it does not specify version numbers for any libraries or programming languages used in the experimental setup. |
| Experiment Setup | Yes | We extend the hyperparameter search space defined by (Wang et al., 2022; Wang & Zhong, 2024), with additional hyperparameters carefully curated from previous hyperparameter optimization (HPO) works (Klein & Hutter, 2019) to provide a more comprehensive exploration of the architectural landscape of PINNs. The configuration space, shown in Table 1, encompasses 4 architectural choices: network type, activation functions, width, and depth, along with 5 hyperparameters: optimizer, initializer, learning rate, loss weight coefficients, and domain/boundary/initial points. [...] For each PDE, we conducted ten repeated experiments and took the average of the lowest MSE to mitigate randomness, with a temperature of 0.7. We compare PINNsAgent with two baseline methods: (1) Random Search, a basic hyperparameter tuning method that selects configurations randomly, and (2) Bayesian Search, which uses Bayesian optimization to select configurations. |
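The Random Search baseline described in the Experiment Setup row can be sketched as follows. This is a minimal illustration only: the category names mirror the configuration space the paper describes (4 architectural choices plus 5 hyperparameters), but the concrete value ranges and the `evaluate` callback are hypothetical placeholders, not the paper's actual Table 1 or training loop.

```python
import random

# Hypothetical stand-in for the paper's Table 1 configuration space:
# 4 architectural choices and 5 hyperparameters. Values are illustrative.
SEARCH_SPACE = {
    # architectural choices
    "network_type": ["FNN", "ResNet"],
    "activation": ["tanh", "sin", "gelu"],
    "width": [32, 64, 128, 256],
    "depth": [3, 4, 5, 6],
    # hyperparameters
    "optimizer": ["Adam", "L-BFGS"],
    "initializer": ["Glorot", "He"],
    "learning_rate": [1e-4, 5e-4, 1e-3],
    "loss_weight": [1.0, 10.0, 100.0],
    "num_points": [1000, 5000, 10000],
}


def random_search(n_trials, evaluate, seed=0):
    """Random Search baseline: sample n_trials configurations uniformly
    at random and return the one with the lowest score (e.g., MSE).

    `evaluate` is a user-supplied callback mapping a configuration dict
    to a scalar loss; in the paper's setting it would train a PINN with
    that configuration and report its MSE.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}
        score = evaluate(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

The Bayesian Search baseline differs only in how the next configuration is chosen: instead of uniform sampling, it fits a surrogate model to the (configuration, score) pairs seen so far and picks the configuration maximizing an acquisition function.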