DiffDA: A Diffusion Model for Weather-scale Data Assimilation

Authors: Langwen Huang, Lukas Gianinazzi, Yuejiang Yu, Peter Dominik Dueben, Torsten Hoefler

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through experiments based on simulated observations from the ERA5 reanalysis dataset, our method can produce assimilated global atmospheric data consistent with observations at 0.25° (~30 km) resolution globally. This marks the highest resolution achieved by ML data assimilation models. The experiments also show that the initial conditions assimilated from sparse observations (less than 0.96% of gridded data) and 48-hour forecast can be used for forecast models with a loss of lead time of at most 24 hours compared to initial conditions from state-of-the-art data assimilation in ERA5.
Researcher Affiliation | Collaboration | Langwen Huang (1), Lukas Gianinazzi (1), Yuejiang Yu (1), Peter D. Dueben (2), Torsten Hoefler (1); (1) Department of Computer Science, ETH Zürich, Switzerland; (2) European Centre for Medium-Range Weather Forecasts (ECMWF), Reading, United Kingdom.
Pseudocode | Yes | Algorithm 1, Data assimilation (inference). Input: predicted state x̂; hard mask of observations m_h; observation values at grid points m_h ⊙ x (created from observation vector y and observation operator A); covariance schedule β_j, j = 1, …, N; Gaussian kernel standard deviation σ_G; Gaussian kernel diameter d; scaling factor s for de-normalization. Output: x ∼ p(x | x̂, y). (See the sampling-loop sketch after this table.)
Open Source Code | Yes | The source code is available in this repository: https://github.com/spcl/DiffDA. The repository includes a modified GraphCast model, which has the Apache License version 2.0.
Open Datasets | Yes | We use the WeatherBench2 dataset as the first part of the training data. The dataset contains values for our target atmospheric variables from 1979 to 2016 with a time interval of 6 hours, extracted from the ERA5 reanalysis dataset. The training process uses data from 1979 to 2015. Validation uses data in 2016 and testing uses data in 2022. ... (Rasp et al., 2023).
Dataset Splits | Yes | The training process uses data from 1979 to 2015. Validation uses data in 2016 and testing uses data in 2022. (See the split sketch after this table.)
Hardware Specification | Yes | We perform data-parallel training on 48 NVIDIA A100 GPUs with a (global) batch size of 48 for 20 epochs. ... All the inference is performed on a single node with one A100 GPU, which produces an assimilated state in around 15 minutes. (See the data-parallel sketch after this table.)
Software Dependencies | No | The diffusion model is implemented with the JAX library (Bradbury et al., 2018), the DIFFUSERS library (von Platen et al., 2023), and the official implementation of GraphCast. Specific version numbers for these libraries are not provided.
Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov and Hutter, 2018) with a warm-up cosine annealing learning rate schedule that starts from 10⁻⁵, peaks at 10⁻⁴ after 1/6 of the total training steps, and ends at 3×10⁻⁶. We perform data-parallel training on 48 NVIDIA A100 GPUs with a (global) batch size of 48 for 20 epochs. (See the schedule sketch after this table.)
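
For concreteness, here is a minimal sketch of the kind of conditioned sampling loop the pseudocode row describes: DDPM-style ancestral sampling that re-imposes the observed grid points (hard mask m_h) at every reverse step. This is not the authors' exact Algorithm 1; the denoiser interface is assumed, and the soft (Gaussian-blurred) mask and de-normalization by s are omitted for brevity.

```python
# A minimal sketch, not the authors' Algorithm 1. Assumed interface:
# `denoiser(x, j, x_hat)` returns the predicted noise for state x at diffusion
# step j, conditioned on the 48-hour forecast x_hat.
import jax
import jax.numpy as jnp

def assimilate(denoiser, x_hat, m_h, x_obs, betas, key):
    """Draw one sample x ~ p(x | x_hat, y)."""
    alphas = 1.0 - betas
    alpha_bars = jnp.cumprod(alphas)
    num_steps = betas.shape[0]

    key, sub = jax.random.split(key)
    x = jax.random.normal(sub, x_hat.shape)  # start the reverse process from noise

    for j in range(num_steps - 1, -1, -1):
        key, k1, k2 = jax.random.split(key, 3)

        # One reverse (denoising) step of the diffusion model.
        eps = denoiser(x, j, x_hat)
        mean = (x - betas[j] / jnp.sqrt(1.0 - alpha_bars[j]) * eps) / jnp.sqrt(alphas[j])
        x = mean + (jnp.sqrt(betas[j]) * jax.random.normal(k1, x.shape) if j > 0 else 0.0)

        # Hard conditioning (inpainting-style): overwrite observed grid points with
        # the observation values, noised to the current diffusion level.
        a_prev = alpha_bars[j - 1] if j > 0 else 1.0
        noised_obs = (jnp.sqrt(a_prev) * x_obs
                      + jnp.sqrt(1.0 - a_prev) * jax.random.normal(k2, x.shape))
        x = m_h * noised_obs + (1.0 - m_h) * x

    return x
```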
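
The dataset-splits row maps directly onto year-based slicing. A minimal sketch using xarray over a WeatherBench2-style ERA5 Zarr store; the store path is a placeholder, not the actual dataset location:

```python
import xarray as xr

ds = xr.open_zarr("gs://<weatherbench2-era5-store>.zarr")  # placeholder path

train = ds.sel(time=slice("1979-01-01", "2015-12-31"))  # training: 1979-2015
val   = ds.sel(time=slice("2016-01-01", "2016-12-31"))  # validation: 2016
test  = ds.sel(time=slice("2022-01-01", "2022-12-31"))  # testing: 2022
```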
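
The hardware row mentions data-parallel training across 48 A100 GPUs. Below is a generic jax.pmap sketch of gradient averaging across devices; the linear model, MSE loss, and SGD-style update are stand-ins, not GraphCast or the paper's training code (which uses AdamW).

```python
# Each device processes one shard of the global batch; gradients are averaged
# with an all-reduce before the parameter update.
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    pred = batch["x"] @ params["w"]            # stand-in model
    return jnp.mean((pred - batch["y"]) ** 2)  # stand-in loss

@functools.partial(jax.pmap, axis_name="devices")
def train_step(params, batch, lr):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    grads = jax.lax.pmean(grads, axis_name="devices")  # average gradients across GPUs
    # Plain SGD update for brevity; the paper uses AdamW.
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, jax.lax.pmean(loss, axis_name="devices")
```

With 48 devices and a global batch size of 48, each device would see one sample per step; params and lr must be replicated across devices before calling train_step.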
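
The experiment-setup row fully specifies the learning-rate schedule. One way to express it with Optax is sketched below; the paper does not state which library call was used, and the total step count here is illustrative only.

```python
import optax

total_steps = 120_000  # illustrative; the actual count depends on dataset size and epochs

schedule = optax.warmup_cosine_decay_schedule(
    init_value=1e-5,                # warm-up starts at 1e-5
    peak_value=1e-4,                # peaks at 1e-4 ...
    warmup_steps=total_steps // 6,  # ... after 1/6 of the total training steps
    decay_steps=total_steps,        # cosine decay over the remaining steps
    end_value=3e-6,                 # final learning rate 3e-6
)
optimizer = optax.adamw(learning_rate=schedule)
```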