Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

VAE-Var: Variational Autoencoder-Enhanced Variational Methods for Data Assimilation in Meteorology

Authors: Yi Xiao, Qilong Jia, Kun Chen, Lei Bai, Wei Xue

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on the FengWu weather forecasting model show that VAE-Var outperforms DiffDA and two traditional algorithms (interpolation and 3DVar) in terms of assimilation accuracy in sparse observational contexts, and is capable of assimilating real-world GDAS prepbufr observations over a year. ... Figures 3 and 4 present the RMSE (root mean square errors) of the analysis states at various time steps to assess assimilation accuracy.
Researcher Affiliation	Collaboration	1 Department of Computer Science and Technology, Tsinghua University; 2 Shanghai Artificial Intelligence Laboratory
Pseudocode	Yes	Algorithm 1: Training Set Construction; Algorithm 2: VAE-Var Assimilation; Algorithm 3: Cyclic Forecasting and Assimilation with VAE-Var; Algorithm 4: Logarithmic Interpolation Matrix Construction; Algorithm 5: Observation Operator Implementation in PyTorch
Open Source Code	Yes	The code of VAE-Var is available at https://github.com/xiaoyi018/VAE-Var.
Open Datasets	Yes	We sincerely acknowledge the European Centre for Medium-Range Weather Forecasts (ECMWF) for providing the ERA5 dataset and the National Center for Environmental Prediction (NCEP) for providing the GDAS dataset, which are instrumental in this study. Their efforts in data collection, archiving, and dissemination are greatly appreciated. Additionally, all the datasets we use, including ERA5 and GDAS prepbufr, are available online.
Dataset Splits	Yes	The six-hour forecasting model is trained using the ERA5 dataset from 1979 to 2015. ... We use ERA5 reanalysis data from 1979 to 2015 to train the VAE model... The system is simulated in an autoregressive manner for 15 days, starting from January 1, 2022. ... We select the year 2017 for conducting the assimilation experiment because the completeness of the observational data is highest for that year.
Hardware Specification	Yes	For example, on a single A100 GPU, one cycle of assimilation takes approximately 18 seconds.
Software Dependencies	No	The paper mentions 'PyTorch' and 'torch-harmonic library', but does not specify version numbers for these or other software dependencies.
Experiment Setup	Yes	The loss function consists of two components: the reconstruction loss ($\mathrm{Loss}_{\mathrm{rec}}$) and the Kullback-Leibler (KL) divergence ($\mathrm{Loss}_{\mathrm{KL}}$). ... a hyperparameter $\sigma$ is introduced to balance the reconstruction loss and the KL divergence, and the total loss is expressed as $\mathrm{Loss} = \frac{1}{\sigma^2}\,\mathrm{Loss}_{\mathrm{rec}} + \mathrm{Loss}_{\mathrm{KL}}$. ... the loss weight $\sigma$ is set to 2.0. ... The observation covariance matrix $R$ is assumed to be diagonal, with the square root of each entry set to 0.1 times the standard deviation of the respective variable. The parameter $\lambda$ is set to 4.0. ... the assimilation cycle $T$ equals six hours
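
As a rough illustration of the Experiment Setup row, the sketch below shows how the quoted loss weighting and the diagonal observation covariance could be computed. It is not taken from the VAE-Var repository: the function names `total_loss` and `observation_variances`, and the sample numbers, are hypothetical and chosen only for illustration. Only the formula, the weight 2.0, and the factor 0.1 come from the quoted text.

```python
import math


def total_loss(loss_rec: float, loss_kl: float, sigma: float = 2.0) -> float:
    """Combine the two loss terms as quoted: Loss = (1 / sigma^2) * Loss_rec + Loss_KL.

    sigma = 2.0 is the value reported in the setup; the function name is hypothetical.
    """
    return loss_rec / sigma**2 + loss_kl


def observation_variances(variable_stds: list[float], scale: float = 0.1) -> list[float]:
    """Return the diagonal of R.

    The setup states that the square root of each diagonal entry equals
    0.1 times the standard deviation of the variable, so each entry is
    (scale * std) ** 2. The function name is hypothetical.
    """
    return [(scale * std) ** 2 for std in variable_stds]


if __name__ == "__main__":
    # Made-up inputs, purely to exercise the two helpers.
    print(total_loss(loss_rec=8.0, loss_kl=1.0))
    print(observation_variances([10.0, 2.0]))
```

With a weight of 2.0, the reconstruction term is scaled by 1/4 before the KL term is added, so a larger weight puts relatively more emphasis on the KL divergence.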