Implicit Stochastic Gradient Descent for Training Physics-Informed Neural Networks

Authors: Ye Li, Song-Can Chen, Sheng-Jun Huang

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that ISGD works well in practice and compares favorably to other gradient-based optimization methods such as SGD and Adam, while it can also effectively address the numerical stiffness in training dynamics via gradient descent.
Researcher Affiliation | Academia | Ye Li, Song-Can Chen, Sheng-Jun Huang; College of Computer Science and Technology/Artificial Intelligence, Nanjing University of Aeronautics and Astronautics; MIIT Key Laboratory of Pattern Analysis and Machine Intelligence, Nanjing, China; yeli20@nuaa.edu.cn, s.chen@nuaa.edu.cn, huangsj@nuaa.edu.cn
Pseudocode | Yes | Algorithm 1: Practical ISGD/Adam optimization for the loss L(θ) with stiff solutions (a hedged sketch of the implicit update follows the table)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology it describes.
Open Datasets | No | We choose N = 400 randomly sampled points to compute the loss function, a batch size of 40 for a small learning rate α = 0.001, and a full batch size for a large learning rate α = 0.5. We choose Nb = 400 randomly sampled points on ∂Ω, and Nf = 4,000 randomly sampled points in Ω to compute the loss function. The paper generates its training points by random sampling rather than using an external public dataset (a sampling sketch follows the table).
Dataset Splits | No | We choose N = 400 randomly sampled points to compute the loss function, a batch size of 40 for a small learning rate α = 0.001, and a full batch size for a large learning rate α = 0.5. The paper describes sampling points for computing the training loss but does not specify a separate validation split.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor speeds, memory amounts, or other machine specifications) used for running its experiments.
Software Dependencies | No | The paper mentions optimizers like Adam and L-BFGS, but does not provide specific version numbers for any software, libraries, or programming languages used.
Experiment Setup | Yes | The hyperparameters used in the three optimizers are listed in Table 1. We note #Iterations = (K0 × K1 + K2) batches, where K0, K1, K2 are hyper-parameters in Algorithm 1. A neural network with 4 hidden layers, each with 50 units and tanh activations, is applied in all the computations (an architecture sketch follows the table).
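
The paper's Algorithm 1 combines ISGD with Adam; the authors' exact procedure is not reproduced here. Below is a minimal, hypothetical sketch of the implicit (proximal) step that ISGD is built on, assuming the update θ_{k+1} = argmin_θ L(θ) + ||θ - θ_k||² / (2α) is approximated by a few inner gradient steps and that PyTorch is the framework; none of these choices are taken from the paper.

```python
# Hypothetical implicit-SGD (proximal) step for a generic PINN loss.
# The inner loop approximately solves
#   theta_{k+1} = argmin_theta  L(theta) + ||theta - theta_k||^2 / (2 * alpha),
# an assumed stand-in for the paper's Algorithm 1, not its exact form.
import torch

def isgd_step(model, loss_fn, alpha=0.5, inner_steps=5, inner_lr=0.05):
    # Snapshot the current parameters theta_k before the implicit update.
    theta_k = [p.detach().clone() for p in model.parameters()]
    inner_opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        # Proximal penalty keeps the new iterate close to theta_k.
        prox = sum(((p - q) ** 2).sum() for p, q in zip(model.parameters(), theta_k))
        objective = loss_fn(model) + prox / (2.0 * alpha)
        objective.backward()
        inner_opt.step()
```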
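
As a rough illustration of the point sampling quoted in the Open Datasets row, the sketch below draws Nf interior and Nb boundary collocation points. The unit-square domain Ω = (0, 1)² and uniform sampling are assumptions for illustration, not details stated in the paper.

```python
# Assumed collocation-point sampling on the unit square (not from the paper).
import torch

N_f, N_b = 4000, 400

# Interior points in Omega for the PDE-residual part of the loss.
x_f = torch.rand(N_f, 2)

# Boundary points on the four edges of the unit square for the boundary loss.
edges = torch.randint(0, 4, (N_b,))   # 0/1: left/right edge, 2/3: bottom/top edge
t = torch.rand(N_b)                   # position along the chosen edge
x_b = torch.stack([
    torch.where(edges < 2, edges.float(), t),         # x-coordinate
    torch.where(edges < 2, t, (edges - 2).float()),   # y-coordinate
], dim=1)
```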
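
Finally, a minimal sketch of the stated architecture (4 hidden layers of 50 units with tanh activations). The input/output dimensions (2 to 1) and the use of PyTorch are assumptions; the excerpt in the Experiment Setup row does not specify them.

```python
# Assumed PyTorch realization of the 4 x 50 tanh network described above.
import torch.nn as nn

def make_pinn(in_dim=2, out_dim=1, width=50, depth=4):
    layers = [nn.Linear(in_dim, width), nn.Tanh()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.Tanh()]
    layers.append(nn.Linear(width, out_dim))
    return nn.Sequential(*layers)

model = make_pinn()  # e.g. trained with Adam or the ISGD step sketched above
```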