Choose a Transformer: Fourier or Galerkin

Authors: Shuhao Cao

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we present three operator learning experiments, including the viscid Burgers' equation, an interface Darcy flow, and an inverse interface coefficient identification problem. The newly proposed simple attention-based operator learner, Galerkin Transformer, shows significant improvements in both training cost and evaluation accuracy over its softmax-normalized counterparts." (A sketch of this softmax-free attention follows the table.)
Researcher Affiliation | Academia | Shuhao Cao, Department of Mathematics and Statistics, Washington University in St. Louis, s.cao@wustl.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | "The PyTorch codes to reproduce our results are available as an open-source software." https://github.com/scaomath/galerkin-transformer
Open Datasets | Yes | "The data are obtained courtesy of the PDE benchmark under the MIT license." https://github.com/zongyi-li/fourier_neural_operator
Dataset Splits | No | "The data is split 80%/20% for training/evaluation for all three examples." While a train/test split is mentioned, a separate validation split is not explicitly specified. (A split sketch follows the table.)
Hardware Specification | Yes | The training and evaluation are done on a single GPU with 32 GB of memory. Specifically, the benchmarks reported in Table 1 use an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions several software libraries, such as PyTorch, NumPy, and SciPy, in the acknowledgments, but does not provide specific version numbers for them as dependencies.
Experiment Setup | Yes | "All attention-based models match the parameter quota of the baseline, and are trained using the loss in (2) with the same 1cycle scheduler [78] for 100 epochs." (A training-loop sketch follows the table.)
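
The "simple attention-based operator learner" quoted in the Research Type row is built on softmax-free attention. In the Galerkin-type variant, layer normalization is applied to the keys and values, and the key-value product is contracted over the sequence dimension before being multiplied by the queries, making the cost linear in the number of grid points. The single-head sketch below is illustrative only (the head layout, projections, and 1/n scaling are simplifying assumptions); the released repository at https://github.com/scaomath/galerkin-transformer contains the actual implementation.

```python
import torch
import torch.nn as nn


class GalerkinAttention(nn.Module):
    """Single-head, softmax-free attention of Galerkin type (illustrative sketch).

    Instead of softmax(Q K^T / sqrt(d)) V, the keys and values are layer-normalized
    and contracted first: Q (K~^T V~) / n, which is linear in the sequence length n.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.norm_k = nn.LayerNorm(d_model)
        self.norm_v = nn.LayerNorm(d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, d_model), where n is the number of grid points
        n = x.size(1)
        q = self.q_proj(x)
        k = self.norm_k(self.k_proj(x))  # layer norm on keys
        v = self.norm_v(self.v_proj(x))  # layer norm on values
        # Contract over the sequence dimension first: a (d x d) matrix per sample
        kv = torch.einsum("bnd,bne->bde", k, v) / n
        z = torch.einsum("bnd,bde->bne", q, kv)
        return self.out_proj(z)
```

Applied to a batch of shape `(4, 2048, 96)`, `GalerkinAttention(96)` returns a tensor of the same shape, and the contraction scales linearly in the 2048 grid points rather than quadratically.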
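For the Dataset Splits row, here is a minimal sketch of an 80%/20% train/evaluation split using `torch.utils.data.random_split`. The sample count, grid size, batch size, and seed below are placeholders rather than values from the paper, whose released code ships its own data-loading utilities.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder tensors standing in for the Burgers/Darcy benchmark data.
inputs = torch.randn(1280, 512, 1)   # hypothetical: 1280 samples on a 512-point grid
targets = torch.randn(1280, 512, 1)
dataset = TensorDataset(inputs, targets)

n_train = int(0.8 * len(dataset))    # 80% for training
n_eval = len(dataset) - n_train      # 20% for evaluation
train_set, eval_set = random_split(
    dataset, [n_train, n_eval],
    generator=torch.Generator().manual_seed(0),  # fixed seed for reproducibility
)

train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
eval_loader = DataLoader(eval_set, batch_size=8, shuffle=False)
```

As the row above observes, this scheme has no separate validation split; the 20% holdout is used directly for evaluation.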
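For the Experiment Setup row, here is a hedged sketch of a 100-epoch training loop with PyTorch's `OneCycleLR` (the 1cycle policy of [78]) and a plain relative L2 loss standing in as a simplified surrogate for the paper's loss in (2). It reuses `GalerkinAttention` and `train_loader` from the sketches above; the optimizer choice, maximum learning rate, and lifting/projection layers are illustrative assumptions, not settings reported in the paper.

```python
import torch
import torch.nn as nn


def relative_l2_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Batch-averaged relative L2 error; a simplified surrogate for the loss in (2).
    diff = torch.linalg.vector_norm(pred - target, dim=(1, 2))
    ref = torch.linalg.vector_norm(target, dim=(1, 2))
    return (diff / ref).mean()


# Hypothetical stand-in for the full Galerkin Transformer: a lifting layer,
# one softmax-free attention block, and a projection back to the target channel.
model = nn.Sequential(nn.Linear(1, 96), GalerkinAttention(96), nn.Linear(96, 1))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
epochs = 100
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=1e-3,                      # illustrative value, not taken from the paper
    epochs=epochs,
    steps_per_epoch=len(train_loader),
)

for epoch in range(epochs):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = relative_l2_loss(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()              # the 1cycle schedule advances once per batch
```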