ClimaX: A foundation model for weather and climate
Authors: Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K Gupta, Aditya Grover
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to existing data-driven baselines, we show that this generality in ClimaX results in superior performance on benchmarks for weather forecasting and climate projections, even when pretrained at lower resolutions and compute budgets. We finetune ClimaX on a diverse set of downstream tasks to evaluate its performance and generality. |
| Researcher Affiliation | Collaboration | Tung Nguyen (Microsoft Autonomous Systems and Robotics Research; UCLA), Johannes Brandstetter (Microsoft Research AI4Science), Ashish Kapoor (Microsoft Autonomous Systems and Robotics Research), Jayesh K. Gupta (Microsoft Autonomous Systems and Robotics Research), Aditya Grover (Microsoft Autonomous Systems and Robotics Research; UCLA). |
| Pseudocode | No | The paper describes the model architecture and training process in text and diagrams (Figures 2, 3, 8, 9), but no structured pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | Our source code is available at https://github.com/microsoft/ClimaX. |
| Open Datasets | Yes | We develop and demonstrate ClimaX, a flexible and generalizable deep learning model for weather and climate science that can be trained using heterogeneous datasets spanning different variables, spatio-temporal coverage, and physical groundings. We pretrain ClimaX on CMIP6 data to predict future weather conditions given the current conditions. For various weather-related downstream tasks, we use the ERA5 reanalysis data as described in Appendix E.2. The Coupled Model Intercomparison Project (CMIP) (Meehl et al., 2000) is an international effort across different individual climate modeling groups. The ERA5 reanalysis archive (Hersbach et al., 2018; 2020) of the European Center for Medium-Range Weather Forecasting (ECMWF) is the predominant data source for learning and benchmarking weather forecasting systems. |
| Dataset Splits | Yes | Following (Rasp et al., 2020), we split the data into three sets, in which the training data is from 1979 to 2015, the validation data is in 2016, and the test data is in 2017 and 2018. A hedged sketch of this split appears after the table. |
| Hardware Specification | Yes | Notably, our benchmark results are state-of-the-art on ClimateBench (Watson-Parris et al., 2022) and competitive with the operational Integrated Forecasting System (IFS) (Wedi et al., 2015) on WeatherBench (Rasp et al., 2020), even when our model is trained on moderate resolutions using only a maximum of 80 NVIDIA V100 GPUs. Model inference works with a single NVIDIA V100 GPU. We used 32GB NVIDIA V100 devices for training. For pretraining we distribute the batch across 80 V100s on Azure ML. |
| Software Dependencies | No | We use PyTorch (Paszke et al., 2019), timm (Wightman, 2019), numpy (Harris et al., 2020) and xarray (Hoyer & Hamman, 2017) to manage our data and model training. This mentions the software packages but not their specific version numbers. |
| Experiment Setup | Yes | We used the AdamW optimizer (Kingma & Ba, 2014; Loshchilov & Hutter, 2017) with parameters (β1 = 0.9, β2 = 0.95). We used weight decay of 1e-5 for all parameters except for the positional embedding. We used a learning rate of 5e-4, with a linear warmup schedule for 10000 steps (5 epochs), followed by a cosine-annealing schedule for 190000 steps (95 epochs). Table 4: embedding dimension D = 1024, depth (number of ViT blocks) = 8, number of attention heads = 16. A hedged PyTorch sketch of this setup follows the table. |
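
The year-based split quoted in the Dataset Splits row maps directly onto a time selection over the ERA5 archive. The snippet below is a minimal sketch of that split using xarray (one of the packages the paper lists); the file name `era5_z500.nc` and the assumption that the data carries a datetime `time` coordinate are illustrative, not taken from the ClimaX codebase.

```python
import xarray as xr

# Hypothetical ERA5 file; the actual ClimaX data pipeline and file layout differ.
ds = xr.open_dataset("era5_z500.nc")

# Year-based split following Rasp et al. (2020), as quoted in the table:
# training 1979-2015, validation 2016, test 2017-2018.
train = ds.sel(time=slice("1979-01-01", "2015-12-31"))
val = ds.sel(time=slice("2016-01-01", "2016-12-31"))
test = ds.sel(time=slice("2017-01-01", "2018-12-31"))

print(train.sizes["time"], val.sizes["time"], test.sizes["time"])
```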
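
The Experiment Setup row contains enough detail to reconstruct the optimization recipe in PyTorch. The sketch below is an assumption-laden reconstruction rather than the authors' code: the transformer module is a stand-in sized to Table 4 (D = 1024, 8 blocks, 16 heads), the positional embedding is a hypothetical parameter named `pos_embed` excluded from weight decay as the paper states, and the warmup plus cosine-annealing schedule is assembled from PyTorch's stock schedulers.

```python
import torch
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

# Toy stand-in for the ClimaX ViT backbone (D=1024, depth=8, 16 heads per Table 4).
model = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=1024, nhead=16, batch_first=True),
    num_layers=8,
)
# Hypothetical positional embedding, kept out of weight decay as the paper states.
pos_embed = torch.nn.Parameter(torch.zeros(1, 64, 1024))

decay_params = [p for _, p in model.named_parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(
    [
        {"params": decay_params, "weight_decay": 1e-5},
        {"params": [pos_embed], "weight_decay": 0.0},
    ],
    lr=5e-4,
    betas=(0.9, 0.95),
)

# Linear warmup for 10,000 steps, then cosine annealing for 190,000 steps.
warmup = LinearLR(optimizer, start_factor=1e-3, total_iters=10_000)
cosine = CosineAnnealingLR(optimizer, T_max=190_000)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine], milestones=[10_000])
```

During training, each `optimizer.step()` would be followed by `scheduler.step()`, so the 10,000 warmup steps (5 epochs) hand over to the 190,000-step cosine decay (95 epochs) described in the paper.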