Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Masked Autoencoders are PDE Learners
Authors: Anthony Zhou, Amir Barati Farimani
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As such, to study masked modeling within the PDE domain, we train masked autoencoders on a diverse set of 1D and 2D PDE data and evaluate their learned representations. We demonstrate that self-supervised masked pretraining can learn latent structure that can express different coefficients, discretizations, boundary conditions or PDEs under a common representation. Furthermore, we show that masked autoencoders (MAEs) can learn highly structured latent spaces through masking alone. MAE models can be used to improve downstream tasks such as predicting PDE features or guiding neural solvers in time-stepping or super-resolution through providing meaningful context. Our contributions suggest the possibility to transfer the scalability and flexibility of masked modeling from language and vision domains to physics creating rich, unified representations of diverse physics through self-supervised learning. We provide the code and datasets used in this study here: https://github.com/anthonyzhou-1/mae-pdes. (Introduction) |
| Researcher Affiliation | Academia | Anthony Zhou (EMAIL), Department of Mechanical Engineering, Carnegie Mellon University; Amir Barati Farimani (EMAIL), Department of Mechanical Engineering and Department of Machine Learning, Carnegie Mellon University |
| Pseudocode | No | The paper describes methods in prose and uses figures (Figure 1, Figure 8) to illustrate architectures and pipelines, but it does not include any explicitly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | We provide the code and datasets used in this study here: https://github.com/anthonyzhou-1/mae-pdes . |
| Open Datasets | Yes | We provide the code and datasets used in this study here: https://github.com/anthonyzhou-1/mae-pdes . (Introduction) We describe a variety of PDEs used for masked pretraining and downstream evaluation. In 1D, we pretrain MAE models on the KdV-Burgers equation only, while in 2D we pretrain on the Heat, Advection, and Burgers equations simultaneously. In all PDEs, coefficients and forcing terms are randomly sampled to produce diverse dynamics within a dataset. (Section 4.1 PDEs and Datasets) |
| Dataset Splits | Yes | Models are fine-tuned on 2000 held-out, labeled samples for each task. (Table 3 caption) Models are fine-tuned on 1024 held-out, labeled samples for each task, or 3072 samples in the combined case. (Table 4 caption) In 1D, time-stepping results tend to have high variance; however, overall trends are still consistent with those reported in the main body. The variance is likely attributed to variations in the dataset for each seed; each seed samples a different set of 2000 samples from a larger PDE dataset, and as a result, some data splits may be easier than others. (Appendix E.2 Time-stepping) |
| Hardware Specification | Yes | In 1D, the MAE is trained on a single NVIDIA GeForce RTX 2080 Ti, and reaches convergence in about 6 hours. In 2D, the MAE is trained on a single NVIDIA RTX A6000, and reaches convergence in about 24 hours. |
| Software Dependencies | No | The paper mentions the use of a ViT architecture and FEniCS for solving equations, but it does not specify version numbers for any software libraries, frameworks, or environments used in the experiments. |
| Experiment Setup | Yes | We present hyperparameters in Table 7. (Appendix A) Table 7: MAE Hyperparameters during pretraining, (a) 1D PDEs: Batch Size 256; Epochs 20; Encoder Dim 256; Decoder Dim 32; Patch Size (5, 5); Masking Ratio 0.75; Time Window 20; Augmentation Ratio 0.5; Base LR 1e-3; Optimizer AdamW; Scheduler OneCycleLR. (Appendix A) Table 9: Hyperparameters for architectures used for time-stepping and super-resolution. (Appendix C) |
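The 0.75 masking ratio reported in Table 7 is the key pretraining knob: the encoder sees only a random quarter of the input patches, and the decoder reconstructs the rest. A minimal sketch of that random patch split (the function name and seed handling are illustrative, not from the paper's code):

```python
import random

# Masking ratio quoted from Table 7 (1D pretraining).
MASKING_RATIO = 0.75  # fraction of patches hidden from the encoder

def split_patches(num_patches: int, masking_ratio: float = MASKING_RATIO, seed: int = 0):
    """Randomly partition patch indices into visible and masked sets,
    as in standard MAE pretraining: the encoder embeds only the visible
    patches, and the decoder reconstructs the masked ones."""
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)
    num_masked = int(round(num_patches * masking_ratio))
    masked, visible = indices[:num_masked], indices[num_masked:]
    return visible, masked

# Example: a 20x20 field tiled with (5, 5) patches yields 16 patches;
# at a 0.75 masking ratio the encoder sees only 4 of them.
visible, masked = split_patches(num_patches=16)
```

With Table 7's (5, 5) patch size, this means most of the PDE field is hidden during pretraining, forcing the latent representation to capture the governing dynamics rather than local texture.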