Dual Cone Gradient Descent for Training Physics-Informed Neural Networks
Authors: Youngsik Hwang, Dongyoung Lim
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a variety of benchmark equations, we demonstrate that DCGD outperforms other optimization algorithms in terms of various evaluation metrics. In particular, DCGD achieves superior predictive accuracy and enhances the stability of training for failure modes of PINNs and complex PDEs, compared to existing optimally tuned models. Moreover, DCGD can be further improved by combining it with popular strategies for PINNs, including learning rate annealing and the Neural Tangent Kernel (NTK). |
| Researcher Affiliation | Academia | Youngsik Hwang, Artificial Intelligence Graduate School, UNIST (hys3835@unist.ac.kr); Dong-Young Lim, Department of Industrial Engineering and Artificial Intelligence Graduate School, UNIST (dlim@unist.ac.kr) |
| Pseudocode | Yes | A general framework for DCGD is presented in Algorithm 1. ... The visualization of these three algorithms can be found in Figure 4 and their pseudocodes are provided in Appendix E. A hedged sketch of the dual-cone projection step appears below the table. |
| Open Source Code | Yes | Codes are available at https://github.com/youngsikhwang/Dual-Cone-Gradient-Descent. |
| Open Datasets | No | Unless the equation has an analytic solution, we use the numerical reference solution for u(x), which is solved by the finite element method [1]. The paper defines the PDE problems and how data points are sampled or generated, but does not provide a link to a fixed, publicly available dataset in the traditional sense. |
| Dataset Splits | No | The paper describes random sampling of points for training but does not specify a distinct validation set split or methodology for it. |
| Hardware Specification | Yes | We conduct all experiments with PYTHON 3.10.9 and PYTORCH 1.13.1, CUDA 11.6.2, NVIDIA Driver 510.10 on an Ubuntu 22.04.1 LTS server equipped with an AMD Ryzen Threadripper PRO 5975WX, an NVIDIA A100 80GB, and an NVIDIA RTX A6000. |
| Software Dependencies | Yes | We conduct all experiments with PYTHON 3.10.9 and PYTORCH 1.13.1, CUDA 11.6.2, NVIDIA Driver 510.10 on an Ubuntu 22.04.1 LTS server. |
| Experiment Setup | Yes | We employ a 3-layer fully connected neural network with 50 neurons per layer and use the hyperbolic tangent activation function for all experiments in Section 5.1. At each iteration, 128 points are randomly sampled on the boundaries and 10 times as many points in the domain as collocation points. ... We train PINNs for 50,000 epochs with Glorot normal initialization [48] using DCGD algorithms, ADAM [45], LRA [30], NTK [31], PCGrad [37], MultiAdam [34], and DPM [35]. We search for the initial learning rate among λ ∈ {10^-3, 10^-4, 10^-5} and use an exponential decay scheduler with a decay rate of 0.9 and a decay step of 1,000. For ADAM, we use the default parameters β1 = 0.9, β2 = 0.999, ϵ = 10^-8 as in [45]. For LRA, we set α = 0.1, which is the best hyperparameter reported in [30]. For MultiAdam, we use β1 = β2 = 0.99 as recommended in [34]. For DPM, we test δ ∈ {10^-1, 10^-2, 10^-3}, ϵ ∈ {10^-1, 10^-2, 10^-3}, w ∈ {1, 1.01, 1.001}. A hedged PyTorch sketch of this setup appears below the table. |
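As a complement to the Pseudocode row, the following is a minimal sketch of the dual-cone idea behind DCGD: the gradients of the PDE-residual loss and the boundary loss are combined so that the update direction lies in the dual cone K* = {v : ⟨v, g_r⟩ ≥ 0 and ⟨v, g_b⟩ ≥ 0}, i.e., it does not increase either loss to first order. This is an illustrative reconstruction, not the authors' code; the official DCGD variants (Projection, Average, Center) and their pseudocode are in Appendix E of the paper and in the linked repository, and the names `dual_cone_direction`, `g_r`, `g_b` are ours.

```python
import torch

def dual_cone_direction(g_r: torch.Tensor, g_b: torch.Tensor) -> torch.Tensor:
    """Project the summed gradient g_r + g_b onto the dual cone
    K* = {v : <v, g_r> >= 0 and <v, g_b> >= 0}.

    g_r, g_b: flattened gradients of the PDE-residual loss and the
    boundary/initial-condition loss with respect to the network parameters.
    At most one of the two inner products below can be negative, because
    <g, g_r> + <g, g_b> = ||g||^2 >= 0.
    """
    g = g_r + g_b
    if torch.dot(g, g_b) < 0:
        # Remove the component that conflicts with the boundary gradient.
        g = g - torch.dot(g, g_b) / torch.dot(g_b, g_b) * g_b
    elif torch.dot(g, g_r) < 0:
        # Remove the component that conflicts with the residual gradient.
        g = g - torch.dot(g, g_r) / torch.dot(g_r, g_r) * g_r
    return g
```

In a training loop, the projected direction would be written back into the parameters' `.grad` fields (or handed to an Adam-style update), which is how gradient-surgery methods such as PCGrad are typically wired into PyTorch.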
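The Experiment Setup row translates fairly directly into PyTorch. Below is a hedged sketch of the Section 5.1 configuration: a 3-layer, 50-neuron tanh network with Glorot normal initialization, Adam with the stated defaults, and a learning-rate decay of 0.9 every 1,000 steps (realized here with `StepLR`). The input/output dimensions and the unit-square domain used for sampling collocation points are placeholders that depend on the specific PDE.

```python
import torch
import torch.nn as nn

class PINN(nn.Module):
    """Fully connected network: 3 hidden layers x 50 neurons, tanh activations."""
    def __init__(self, in_dim: int = 2, out_dim: int = 1, width: int = 50, depth: int = 3):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(d, width), nn.Tanh()]
            d = width
        layers.append(nn.Linear(d, out_dim))
        self.net = nn.Sequential(*layers)
        # Glorot (Xavier) normal initialization, zero biases.
        for m in self.net:
            if isinstance(m, nn.Linear):
                nn.init.xavier_normal_(m.weight)
                nn.init.zeros_(m.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = PINN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)
# Decay the learning rate by a factor of 0.9 every 1,000 steps.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000, gamma=0.9)

# Per-iteration sampling: 128 boundary points and 10x as many interior
# collocation points, illustrated on a hypothetical unit square
# (boundary points placed on the y = 0 and y = 1 edges only, as a placeholder).
n_bc = 128
x_interior = torch.rand(10 * n_bc, 2, requires_grad=True)
x_boundary = torch.cat([torch.rand(n_bc, 1), torch.randint(0, 2, (n_bc, 1)).float()], dim=1)
```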