Precipitation Downscaling with Spatiotemporal Video Diffusion

Authors: Prakhar Srivastava, Ruihan Yang, Gavin Kerrigan, Gideon Dresdner, Jeremy McGibbon, Christopher S. Bretherton, Stephan Mandt

NeurIPS 2024

Reproducibility assessment: each variable is listed with its result and the supporting LLM response.
Research Type: Experimental
  "We test our approach on FV3GFS output, an established large-scale global atmosphere model, and compare it against six state-of-the-art baselines. Our analysis, capturing CRPS, MSE, precipitation distributions, and qualitative aspects using California and the Himalayas as examples, establishes our method as a new standard for data-driven precipitation downscaling."
Researcher Affiliation: Collaboration
  Prakhar Srivastava (1), Ruihan Yang (1), Gavin Kerrigan (1), Gideon Dresdner (2), Jeremy McGibbon (2), Christopher Bretherton (2), Stephan Mandt (1); (1) University of California, Irvine; (2) Allen Institute for AI, Seattle
Pseudocode: Yes
  "Algorithm 1: Training STVD" and "Algorithm 2: Sampling STVD"
Open Source Code: Yes
  "The code for our model is available at https://github.com/mandt-lab/STVD."
Open Datasets: Yes
  "Our dataset derives from an 11-member initial condition ensemble of 13-month simulations using a global atmosphere model, FV3GFS, run at 25 km resolution and forced by climatological sea surface temperatures and sea ice. The first month of each simulation is discarded to allow the simulations to spin up and meteorologically diverge, effectively providing 11 years of reference data (of which the first 10 years are used for training and the last year for validation). FV3GFS, developed by the National Oceanic and Atmospheric Administration (NOAA), is a version of NOAA's operational global weather forecast model [77, 10]."
Dataset Splits: Yes
  "...effectively providing 11 years of reference data (of which the first 10 years are used for training and the last year for validation)."
Hardware Specification: Yes
  "We optimize our model end-to-end with a single diffusion loss using Adam [28] with an initial learning rate of 1 × 10⁻⁴, decaying to 5 × 10⁻⁷ with cosine annealing during training, executed on an NVIDIA RTX A6000 GPU."
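The learning-rate decay quoted above (1 × 10⁻⁴ annealed to 5 × 10⁻⁷ over training) can be sketched as a standalone schedule. This is a minimal sketch assuming the standard cosine-annealing formula; the function name and the use of the paper's one-million-step horizon as the default are illustrative assumptions, not code from the authors' repository.

```python
import math

def cosine_annealed_lr(step, total_steps=1_000_000, lr_max=1e-4, lr_min=5e-7):
    """Learning rate at `step`, decaying from lr_max to lr_min.

    Standard cosine annealing: starts at lr_max, ends at lr_min after
    total_steps, following half a cosine period in between.
    """
    progress = min(step / total_steps, 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))
```

With these endpoints, the schedule returns 1e-4 at step 0 and 5e-7 at the final step; in practice the same curve is available off the shelf, e.g. via PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR`.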
Software Dependencies: No
  The paper mentions software components such as Adam, DDIM sampling, and v-parametrization, but does not provide version numbers for any software dependencies (e.g., Python, PyTorch, or other libraries).
Experiment Setup: Yes
  "Our approach (STVD) trains on 5 consecutive frames that are downscaled jointly. We optimize our model end-to-end with a single diffusion loss using Adam [28] with an initial learning rate of 1 × 10⁻⁴, decaying to 5 × 10⁻⁷ with cosine annealing during training, executed on an NVIDIA RTX A6000 GPU. The diffusion model is trained using v-parametrization [59], with a fixed diffusion depth (N = 1400). Random tiles extracted from the cube-sphere representation of Earth, with dimension 384 at high resolution and 48 at low resolution, are used during training. We train for one million steps, requiring approximately 7 days on a single node (slightly less for ablations). We use a batch size of one, apply a logarithmic transformation to precipitation states, and normalize to the range [−1, 1]. During testing, we employ DDIM sampling with 30 steps on an Exponential Moving Average (EMA) variant of our model (for full frame size), with a decay rate of 0.995."
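The setup above states that precipitation is log-transformed and normalized to [−1, 1] but does not give the exact transform. A plausible sketch uses `log1p` (well-defined at zero precipitation) and a fixed maximum over the training data; both the `log1p` choice and the `log_max` constant are assumptions for illustration, not the authors' published preprocessing.

```python
import numpy as np

def normalize_precip(x, log_max):
    """Log-transform precipitation and rescale to [-1, 1].

    `x` is non-negative precipitation; `log_max` is an assumed fixed
    upper bound of log1p(x) over the training data, so that the
    transformed values land in [-1, 1].
    """
    z = np.log1p(x)                  # compress the heavy-tailed precipitation values
    return 2.0 * z / log_max - 1.0   # map [0, log_max] onto [-1, 1]

def denormalize_precip(y, log_max):
    """Invert normalize_precip to recover precipitation values."""
    return np.expm1((y + 1.0) * log_max / 2.0)
```

The inverse is needed at evaluation time, since metrics such as CRPS and MSE are most naturally computed on physical precipitation values rather than normalized ones.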
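The EMA variant used for sampling (decay 0.995, per the quoted setup) follows a standard exponential-moving-average update on the weights. This sketch operates on flat lists of scalars for clarity; the representation and function name are illustrative simplifications, not the authors' implementation.

```python
def ema_update(ema_params, model_params, decay=0.995):
    """One EMA step: blend current model weights into the running average.

    Each averaged weight moves a fraction (1 - decay) toward the live
    model weight, so the EMA copy changes slowly and smooths out noise
    from individual training steps.
    """
    return [decay * e + (1.0 - decay) * p
            for e, p in zip(ema_params, model_params)]
```

Sampling then runs DDIM on the EMA copy of the weights rather than the raw training weights, which is a common way to stabilize diffusion-model outputs.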