Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning
Authors: Xiangyu Zhao, Zhiwang Zhou, Wenlong Zhang, Yihao Liu, Xiangyu Chen, Junchao Gong, Hao Chen, Ben Fei, Shiqi Chen, Wanli Ouyang, Xiao-Ming Wu, Lei Bai
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments indicate that our WeatherGFM can effectively handle up to 12 weather understanding tasks, including weather forecasting, super-resolution, weather image translation, and post-processing. Our method also showcases generalization ability on unseen tasks. The source code is available at https://github.com/xiangyu-mm/WeatherGFM. --- 4 EXPERIMENTS --- Table 2: Quantitative results on weather understanding tasks. Markers in the table distinguish the single-task model, the model trained with all weather understanding tasks, and the model with continual training on the ERA5 dataset. RMSE, ACC and CSI are calculated as the quantitative metrics. A lower RMSE and higher CSI/ACC indicate better results. |
| Researcher Affiliation | Academia | Xiangyu Zhao²,¹, Zhiwang Zhou¹, Wenlong Zhang¹, Yihao Liu¹, Xiangyu Chen¹, Junchao Gong¹,³, Hao Chen¹, Ben Fei¹, Shiqi Chen⁴, Wanli Ouyang¹, Xiao-Ming Wu², Lei Bai¹. Affiliations: ¹Shanghai AI Laboratory, ²The Hong Kong Polytechnic University, ³Shanghai Jiao Tong University, ⁴Shanghai Meteorological Service. EMAIL, EMAIL |
| Pseudocode | No | The paper describes the model architecture and methodology in detail using figures and textual descriptions (e.g., equations 1-6) but does not provide a distinct, structured pseudocode block or algorithm section. |
| Open Source Code | Yes | Extensive experiments indicate that our WeatherGFM can effectively handle up to 12 weather understanding tasks, including weather forecasting, super-resolution, weather image translation, and post-processing. Our method also showcases generalization ability on unseen tasks. The source code is available at https://github.com/xiangyu-mm/WeatherGFM. |
| Open Datasets | Yes | Specifically, we leverage the Storm EVent ImagRy (SEVIR) (Veillette et al., 2020), ERA5 (Hersbach et al., 2020), POMINO-TROPOMI product (Liu et al., 2020) and GEOS-CF (Keller et al., 2021) datasets to train and evaluate our WeatherGFM. We provide a detailed introduction in Appendix A and B. |
| Dataset Splits | Yes | Ultimately, the dataset we utilize comprises 11,508 events with four distinct sensing modalities. Among them, 11,308 events are selected as the training set, while 100 events are designated as the validation set and 100 events are designated as the test set. Consequently, the training set contains a total of 2.2M images, while the validation/test set has a total of 19.6K images. ... After processing, each modality has 20,000 images with a resolution of 256×256. Among them, we allocate 18,000 images as the training set, 1,000 images as the validation set, and 1,000 images as the test set. ... the SEVIR data was extracted and processed to generate 135,696 sequences for training, along with an independent set of 1,200 sequences to validate/test the fitted model. ... For each spatial SR for satellite tasks, the SEVIR data was extracted and processed to yield 542,784 images for training, along with an independent set of 4,800 images for validating/testing. ... For the temporal SR task, the SEVIR data was further extracted and processed to generate 407,088 sequences for training, along with an independent set of 3,600 sequences to validate/test the fitted model. |
| Hardware Specification | Yes | We use 16 Nvidia A100 GPUs for training. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer, L1 loss, and references a ViT implementation from (Beyer et al., 2022) and UNet networks. However, it does not specify version numbers for any libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages (e.g., Python) used for implementation. |
| Experiment Setup | Yes | Training details. During training, we resize the weather images of different resolutions to a resolution of 256×256 and input them into the model in accordance with the combination mode of Pin, Pout, Xin, Xout in the task-specific prompt format, resulting in an N×256×256 total input resolution. The L1 loss is employed as the loss function. For optimization, the AdamW optimizer with a cosine learning rate scheduler is utilized. The base learning rate is 1e-4. The batch size is 20. ... A total of 50 epochs are executed. We leverage fp16 floating point precision in our model. --- Table 5: Default hyperparameters of WeatherGFM --- Table 6: Hyperparameters of ViT --- Table 7: Hyperparameters of UNet |
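The optimization setup quoted in the Experiment Setup row (AdamW, cosine learning-rate schedule with base LR 1e-4, L1 loss, batch size 20, 50 epochs, fp16 precision) can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the authors' released code: the single `nn.Conv2d` stand-in model and the random input/target tensors are placeholders for the actual WeatherGFM ViT architecture and weather imagery.

```python
import torch
from torch import nn

# Hyperparameters reported in the paper excerpt above.
EPOCHS, BATCH, BASE_LR = 50, 20, 1e-4

# Placeholder model: the real WeatherGFM uses a ViT backbone, not a conv layer.
model = nn.Conv2d(1, 1, kernel_size=3, padding=1)

optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR)
# Cosine schedule over the full training run (stepped once per epoch).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
criterion = nn.L1Loss()  # L1 loss, as stated in the training details

def train_step(x: torch.Tensor, target: torch.Tensor) -> float:
    """One optimization step; uses fp16 autocast only when a GPU is present."""
    optimizer.zero_grad()
    if torch.cuda.is_available():
        with torch.autocast("cuda", dtype=torch.float16):
            loss = criterion(model(x), target)
    else:
        loss = criterion(model(x), target)  # fp32 fallback on CPU
    loss.backward()
    optimizer.step()
    return loss.item()

# Random stand-ins for a batch of 256x256 weather images and targets.
x = torch.randn(BATCH, 1, 256, 256)
target = torch.randn(BATCH, 1, 256, 256)
loss = train_step(x, target)
scheduler.step()  # advance the cosine schedule at the end of each epoch
```

After one scheduler step the learning rate has decayed slightly below the base value of 1e-4, as expected from the cosine annealing curve.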