Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
PhAST: Physics-Aware, Scalable, and Task-Specific GNNs for Accelerated Catalyst Design
Authors: Alexandre Duval, Victor Schmidt, Santiago Miret, Yoshua Bengio, Alex Hernández-García, David Rolnick
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose task-specific innovations applicable to most architectures, enhancing both computational efficiency and accuracy. This includes improvements in (1) the graph creation step, (2) atom representations, (3) the energy prediction head, and (4) the force prediction head. We describe these contributions, referred to as PhAST, and evaluate them thoroughly on multiple architectures. Overall, PhAST improves energy MAE by 4 to 42% while dividing compute time by 3 to 8 depending on the targeted task/model. |
| Researcher Affiliation | Collaboration | Alexandre Duval* EMAIL Mila, Inria, CentraleSupélec; Victor Schmidt* EMAIL Mila, Université de Montréal; Santiago Miret EMAIL Intel Labs; Yoshua Bengio EMAIL Mila, Université de Montréal, CIFAR Fellow; Alex Hernández-García EMAIL Mila, Université de Montréal; David Rolnick EMAIL Mila, McGill University. |
| Pseudocode | No | The paper describes methods like graph creation, atom embeddings, and energy/force heads in detail within sections 3.1, 3.2, 3.3, and 3.4. However, it does not present any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | Yes | Python package: https://phast.readthedocs.io. |
| Open Datasets | Yes | To enable the use of ML for catalyst discovery, the Open Catalyst Project released OC20 (Chanussot et al., 2021), a large data set... |
| Dataset Splits | Yes | It comes with a pre-defined train/val/test split, 450,000 training samples and hidden test labels. Experiments are evaluated on the validation set, which has four splits of 25K samples: In Domain (ID), Out of Domain adsorbates (OOD-ads), Out of Domain catalysts (OOD-cat), and Out of Domain adsorbates and catalysts (OOD-both). |
| Hardware Specification | Yes | We performed our training on fourth generation Intel Xeon Scalable processors (named Sapphire Rapids SPR) and investigated the scalability and compute of PhAST across multiple CPU nodes. |
| Software Dependencies | No | To implement PhAST CPU training, we leverage the Open MatSci ML Toolkit by Miret et al. (2022), which provides a unified platform for training deep learning models on the Open Catalyst dataset across different hardware platforms. Additionally, Miret et al. (2022) utilize the Deep Graph Library (DGL) as the platform for GNN development, which provides an additional proof point given that all prior experiments were performed using PyTorch Geometric. |
| Experiment Setup | Yes | We use the hyperparameters, training settings and model architectures provided in the original papers. The only change is the smaller number of epochs used for DimeNet++ (10 instead of 20) and SchNet (20 instead of 30), as these additional epochs only lead to a small performance gain for a large amount of additional compute time. For PhAST models, we fine-tuned their hyperparameters to reach optimal performance. |
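The sample counts quoted in the Dataset Splits row can be summarized numerically; this is a minimal sketch using the sizes stated above, where the dictionary keys are illustrative labels rather than official OC20 identifiers:

```python
# OC20 sample counts as quoted in the Dataset Splits row.
# Keys are illustrative labels, not official OC20 split names.
splits = {
    "train": 450_000,
    "val_id": 25_000,        # In Domain
    "val_ood_ads": 25_000,   # Out of Domain adsorbates
    "val_ood_cat": 25_000,   # Out of Domain catalysts
    "val_ood_both": 25_000,  # Out of Domain adsorbates and catalysts
}

# Total validation samples across the four 25K splits.
total_val = sum(n for name, n in splits.items() if name.startswith("val"))
print(total_val)  # 100000
```

Test-set labels are hidden, so evaluation in the paper is reported on these four validation splits.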