Choose a Transformer: Fourier or Galerkin
Authors: Shuhao Cao
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present three operator learning experiments, including the viscid Burgers' equation, an interface Darcy flow, and an inverse interface coefficient identification problem. The newly proposed simple attention-based operator learner, Galerkin Transformer, shows significant improvements in both training cost and evaluation accuracy over its softmax-normalized counterparts. |
| Researcher Affiliation | Academia | Shuhao Cao Department of Mathematics and Statistics Washington University in St. Louis s.cao@wustl.edu |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The PyTorch codes to reproduce our results are available as open-source software: https://github.com/scaomath/galerkin-transformer |
| Open Datasets | Yes | The data are obtained courtesy of the PDE benchmark under the MIT license: https://github.com/zongyi-li/fourier_neural_operator |
| Dataset Splits | No | The data is split 80%/20% for training/evaluation for all three examples. While a train/test split is mentioned, a separate validation split is not explicitly specified. |
| Hardware Specification | Yes | The training and evaluation are done on a single GPU with 32GB of memory. Specifically, the reported benchmarks in Table 1 use an NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions several software libraries like PyTorch, NumPy, and SciPy in the acknowledgments, but does not provide specific version numbers for them as dependencies. |
| Experiment Setup | Yes | All attention-based models match the parameter quota of the baseline, and are trained using the loss in (2) with the same 1cycle scheduler [78] for 100 epochs. |
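
The Dataset Splits and Experiment Setup rows above describe an 80%/20% train/evaluation split, parameter-matched attention models, a relative loss, and a 1cycle learning-rate schedule over 100 epochs. The snippet below is a minimal sketch of such a configuration in PyTorch, not the paper's implementation (which is in the linked repository): the synthetic data, placeholder model, relative L2 loss form, batch size, and learning rate are all assumptions for illustration.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Hypothetical data: 1000 samples of a 1D operator-learning task.
# Shapes and the random tensors are placeholders, not the paper's datasets.
n_samples, n_grid = 1000, 512
inputs = torch.randn(n_samples, n_grid, 1)
targets = torch.randn(n_samples, n_grid, 1)
dataset = TensorDataset(inputs, targets)

# 80%/20% train/evaluation split, as stated in the paper.
n_train = int(0.8 * n_samples)
train_set, eval_set = random_split(dataset, [n_train, n_samples - n_train])
train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
eval_loader = DataLoader(eval_set, batch_size=8)

# Placeholder model standing in for an attention-based operator learner.
model = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)

def relative_l2(pred, true, eps=1e-8):
    """Relative L2 error per sample (assumed loss form, not the paper's (2))."""
    num = torch.linalg.norm(pred - true, dim=(1, 2))
    den = torch.linalg.norm(true, dim=(1, 2)) + eps
    return (num / den).mean()

# 1cycle learning-rate schedule over 100 epochs, as in the reported setup.
epochs = 100
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=len(train_loader)
)

for epoch in range(epochs):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = relative_l2(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()  # OneCycleLR is stepped once per batch
```

Evaluation on `eval_loader` would report the same relative error without gradient updates; the paper's actual loss, models, and data pipelines should be taken from the open-source repository above.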