Dual Cone Gradient Descent for Training Physics-Informed Neural Networks

Authors: Youngsik Hwang, Dong-Young Lim

NeurIPS 2024

Reproducibility assessment (each row gives the variable, the assessed result, and the supporting LLM response):
Research Type: Experimental. "On a variety of benchmark equations, we demonstrate that DCGD outperforms other optimization algorithms in terms of various evaluation metrics. In particular, DCGD achieves superior predictive accuracy and enhances the stability of training for failure modes of PINNs and complex PDEs, compared to existing optimally tuned models. Moreover, DCGD can be further improved by combining it with popular strategies for PINNs, including learning rate annealing and the Neural Tangent Kernel (NTK)."
Researcher Affiliation: Academia. "Youngsik Hwang, Artificial Intelligence Graduate School, UNIST (hys3835@unist.ac.kr); Dong-Young Lim, Department of Industrial Engineering and Artificial Intelligence Graduate School, UNIST (dlim@unist.ac.kr)"
Pseudocode: Yes. "A general framework for DCGD is presented in Algo 1. ... The visualization of these three algorithms can be found in Figure 4 and their pseudocodes are provided in Appendix E." (A hedged sketch of a dual cone projection step appears after this table.)
Open Source Code: Yes. "Codes are available at https://github.com/youngsikhwang/Dual-Cone-Gradient-Descent."
Open Datasets: No. "Unless the equation has an analytic solution, we use the numerical reference solution for u(x), which is solved by the finite element method [1]." The paper defines the PDE problems and how data points are sampled or generated, but it does not provide a link to a fixed, publicly available dataset.
Dataset Splits: No. The paper describes random sampling of points for training but does not specify a distinct validation split or a methodology for constructing one.
Hardware Specification: Yes. "We conduct all experiments with PYTHON 3.10.9 and PYTORCH 1.13.1, CUDA 11.6.2, NVIDIA Driver 510.10 on an Ubuntu 22.04.1 LTS server equipped with an AMD Ryzen Threadripper PRO 5975WX, an NVIDIA A100 80GB, and an NVIDIA RTX A6000."
Software Dependencies: Yes. "We conduct all experiments with PYTHON 3.10.9 and PYTORCH 1.13.1, CUDA 11.6.2, NVIDIA Driver 510.10 on an Ubuntu 22.04.1 LTS server." (A short environment check sketch appears after this table.)
Experiment Setup: Yes. "We employ a 3-layer fully connected neural network with 50 neurons per layer and use the hyperbolic tangent activation function for all experiments in Section 5.1. At each iteration, 128 points are randomly sampled on the boundaries and 10 times more points in the domain as the collocation points. ... We train PINNs for 50,000 epochs with Glorot normal initialization [48] using DCGD algorithms, ADAM [45], LRA [30], NTK [31], PCGrad [37], MultiAdam [34], and DPM [35]. We search for the initial learning rate among λ = {10^-3, 10^-4, 10^-5} and use an exponential decay scheduler with a decay rate of 0.9 and a decay step of 1,000. For ADAM, we use the default parameters: β1 = 0.9, β2 = 0.999, ϵ = 10^-8 as in [45]. For LRA, we set α = 0.1, which is the best hyperparameter reported in [30]. For MultiAdam, we use β1 = β2 = 0.99 as recommended in [34]. For DPM, we test δ = {10^-1, 10^-2, 10^-3}, ϵ = {10^-1, 10^-2, 10^-3}, w = {1, 1.01, 1.001}." (A hedged sketch of this training setup appears after this table.)
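The Pseudocode row above points to the paper's Algorithm 1 and the three DCGD variants in Appendix E. The snippet below is not a reproduction of those algorithms; it is a minimal PyTorch sketch, under simplifying assumptions, of the underlying dual cone idea: the residual-loss gradient g_r and the boundary-loss gradient g_b are combined so that the final update direction has a nonnegative inner product with both. The function name dual_cone_step and the per-parameter flattening are illustrative choices, not taken from the paper's code.

```python
import torch

def dual_cone_step(params, loss_r, loss_b, optimizer):
    """Hedged sketch of a dual-cone update: combine the residual gradient g_r
    and the boundary gradient g_b so that the final update direction has a
    nonnegative inner product with both of them."""
    params = list(params)
    # Per-loss gradients, flattened into single vectors.
    g_r = torch.cat([g.reshape(-1) for g in
                     torch.autograd.grad(loss_r, params, retain_graph=True)])
    g_b = torch.cat([g.reshape(-1) for g in
                     torch.autograd.grad(loss_b, params, retain_graph=True)])
    g = g_r + g_b
    # Since <g, g_r> + <g, g_b> = ||g||^2 >= 0, at most one of the two inner
    # products can be negative, so a single half-space projection suffices.
    if torch.dot(g, g_r) < 0:
        g = g - torch.dot(g, g_r) / torch.dot(g_r, g_r) * g_r
    elif torch.dot(g, g_b) < 0:
        g = g - torch.dot(g, g_b) / torch.dot(g_b, g_b) * g_b
    # Write the combined gradient back into .grad and let the base optimizer step.
    offset = 0
    for p in params:
        n = p.numel()
        p.grad = g[offset:offset + n].view_as(p).clone()
        offset += n
    optimizer.step()
    optimizer.zero_grad()
```

A typical call would compute loss_r and loss_b for the current batch of collocation and boundary points and then run dual_cone_step(model.parameters(), loss_r, loss_b, optimizer) in place of the usual backward/step pair; the projected gradient is then consumed by whatever base optimizer is in use (Adam in the paper's experiments).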
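The hardware and software rows pin exact versions (Python 3.10.9, PyTorch 1.13.1, CUDA 11.6.2). The following short check is only an assumption about how one might verify a local environment before re-running the experiments; it is not part of the paper or its repository.

```python
import sys
import torch

# Versions reported in the paper: Python 3.10.9, PyTorch 1.13.1, CUDA 11.6.2.
print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("CUDA   :", torch.version.cuda)
if torch.cuda.is_available():
    # e.g. an A100 80GB or RTX A6000, as listed in the hardware row
    print("GPU    :", torch.cuda.get_device_name(0))
```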
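The Experiment Setup row specifies the network, initialization, point sampling, and optimizer schedule in enough detail for a skeleton. The code below is a hedged sketch assembling those pieces: the input/output dimensions, the unit-square sampling domain, the reading of "3-layer" as three hidden layers, and the placeholder losses are assumptions, and loss_r / loss_b merely stand in for the PDE-specific residual and boundary losses rather than the paper's definitions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_pinn(in_dim=2, width=50, out_dim=1):
    """Fully connected network with tanh activations and Glorot (Xavier)
    normal initialization, per the Experiment Setup row; the dimensions
    here are assumptions (2D space-time input, scalar output)."""
    dims = [in_dim, width, width, width, out_dim]
    layers = []
    for i in range(len(dims) - 1):
        lin = nn.Linear(dims[i], dims[i + 1])
        nn.init.xavier_normal_(lin.weight)
        nn.init.zeros_(lin.bias)
        layers.append(lin)
        if i < len(dims) - 2:
            layers.append(nn.Tanh())
    return nn.Sequential(*layers)

model = make_pinn()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8)
# Exponential decay with rate 0.9 applied every 1,000 steps.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.9)

for step in range(50_000):
    # Resample each iteration: 128 boundary points and 10x as many interior
    # collocation points (the unit-square domain is an assumption).
    x_b = torch.rand(128, 2)
    x_f = torch.rand(1280, 2)
    # Placeholder losses; the paper's boundary and residual losses depend on
    # the specific PDE and its derivatives.
    loss_b = (model(x_b) ** 2).mean()
    loss_r = (model(x_f) ** 2).mean()
    # Plain Adam baseline shown here; a DCGD-style run would instead combine
    # the two gradients as in the projection sketch above.
    loss = loss_r + loss_b
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

Swapping the plain backward/step pair for dual_cone_step(model.parameters(), loss_r, loss_b, optimizer) from the earlier sketch gives the corresponding dual-cone variant of this loop.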