Implicit Stochastic Gradient Descent for Training Physics-Informed Neural Networks
Authors: Ye Li, Song-Can Chen, Sheng-Jun Huang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that ISGD works well in practice and compares favorably to other gradient-based optimization methods such as SGD and Adam, while also effectively addressing the numerical stiffness in the gradient-descent training dynamics. |
| Researcher Affiliation | Academia | Ye Li, Song-Can Chen, Sheng-Jun Huang; College of Computer Science and Technology/Artificial Intelligence, Nanjing University of Aeronautics and Astronautics; MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing, China; yeli20@nuaa.edu.cn, s.chen@nuaa.edu.cn, huangsj@nuaa.edu.cn |
| Pseudocode | Yes | Algorithm 1: Practical ISGD/Adam optimization for the loss L(θ) with stiff solutions. (A hedged sketch of a generic implicit-update step appears after this table.) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | No | We choose N = 400 randomly sampled points to compute the loss function, a batch size of 40 for a small learning rate α = 0.001, and a full batch size for a large learning rate α = 0.5. We choose Nb = 400 randomly sampled points on ∂Ω, and Nf = 4,000 randomly sampled points in Ω to compute the loss function. The paper generates its training points by random sampling rather than using an external public dataset; a hedged sketch of such a sampling setup appears after this table. |
| Dataset Splits | No | We choose N = 400 randomly sampled points to compute the loss function, a batch size of 40 for a small learning rate α = 0.001, and a full batch size for a large learning rate α = 0.5. The paper describes sampling points for loss computation (training) but does not specify a separate validation dataset split. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions optimizers like Adam and L-BFGS, but does not provide specific version numbers for any software, libraries, or programming languages used. |
| Experiment Setup | Yes | The hyperparameters used in the three optimizers are listed in Table 1. We note #Iterations = (K0 × K1 + K2) batches, where K0, K1, K2 are hyper-parameters in Algorithm 1. A neural network with 4 hidden layers, each with 50 tanh-activated units, is used in all the computations. (A hedged sketch of this architecture appears after this table.) |
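
The Pseudocode row refers to the paper's Algorithm 1 (Practical ISGD/Adam), which this report does not reproduce. The following is only a minimal PyTorch sketch of the generic implicit (proximal) SGD update that ISGD builds on, where the update θ_{k+1} = θ_k − α∇L(θ_{k+1}) is equivalent to minimizing L(θ) + ‖θ − θ_k‖²/(2α) and is approximated here by a short inner loop of explicit gradient steps. The function name, the `loss_fn(model, batch)` interface, and all inner-loop settings are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of a generic implicit (proximal) SGD step, NOT the paper's
# Algorithm 1. The implicit update theta_{k+1} = theta_k - alpha * grad L(theta_{k+1})
# is the minimizer of  L(theta) + ||theta - theta_k||^2 / (2 * alpha),
# approximated below by a few explicit inner gradient steps.
# `model`, `loss_fn(model, batch)`, and the inner-loop settings are placeholders.
import torch


def implicit_sgd_step(model, loss_fn, batch, alpha=0.5, inner_steps=20, inner_lr=0.05):
    # theta_k: snapshot of the parameters before the implicit step.
    theta_k = [p.detach().clone() for p in model.parameters()]
    inner_opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        # Proximal term keeps the new parameters close to theta_k.
        prox = sum(((p - p0) ** 2).sum() for p, p0 in zip(model.parameters(), theta_k))
        objective = loss_fn(model, batch) + prox / (2.0 * alpha)
        objective.backward()
        inner_opt.step()
    # Return the mini-batch loss at the updated parameters for monitoring.
    return loss_fn(model, batch).item()
```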
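The Open Datasets and Dataset Splits rows note that all training points are generated by random sampling rather than loaded from a public dataset. Below is a minimal NumPy sketch of such a sampling setup, assuming a 2-D unit-square domain Ω = [0, 1]² purely for illustration; the actual domains, boundary handling, and the helper names are not taken from the paper.

```python
# Hedged sketch of the quoted sampling setup: Nb = 400 boundary points and
# Nf = 4,000 interior collocation points drawn uniformly, plus the 40-point
# mini-batches quoted for the N = 400, small-learning-rate experiment.
# The unit square and the 2-D inputs are assumptions made only for illustration.
import numpy as np

rng = np.random.default_rng(0)


def sample_interior(n):
    # Uniform collocation points inside the assumed unit square.
    return rng.uniform(0.0, 1.0, size=(n, 2))


def sample_boundary(n):
    # Uniform points on the four edges of the assumed unit square.
    edge = rng.integers(0, 4, size=n)
    t = rng.uniform(0.0, 1.0, size=n)
    x = np.where(edge < 2, t, edge - 2.0)          # edges 2, 3: x fixed at 0 or 1
    y = np.where(edge < 2, edge.astype(float), t)  # edges 0, 1: y fixed at 0 or 1
    return np.stack([x, y], axis=1)


x_f = sample_interior(4000)  # Nf = 4,000 residual points in Omega
x_b = sample_boundary(400)   # Nb = 400 points on the boundary

# The first quoted experiment uses N = 400 points with mini-batches of 40.
x_train = sample_interior(400)
batches = np.array_split(rng.permutation(x_train), 400 // 40)
```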
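Finally, the Experiment Setup row reports a fully connected network with 4 hidden layers of 50 tanh units. A minimal PyTorch sketch of such an architecture follows; the input and output dimensions (2 and 1 here) are assumptions that depend on the specific PDE, and `make_pinn` is an illustrative name, not code from the paper.

```python
# Hedged sketch of the reported architecture: 4 hidden layers, 50 tanh units each.
# Input/output sizes are placeholders; they depend on the PDE being solved.
import torch.nn as nn


def make_pinn(in_dim=2, out_dim=1, width=50, depth=4):
    layers, dim = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(dim, width), nn.Tanh()]
        dim = width
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)


model = make_pinn()  # e.g. u_theta(x, y) for a 2-D problem
```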