Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

HR-Extreme: A High-Resolution Dataset for Extreme Weather Forecasting

Authors: Nian Ran, Peng Xiao, Yue Wang, Wesley Shi, Jianxin Lin, Qi Meng, Richard Allmendinger

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We also evaluate the current state-of-the-art deep learning models and Numerical Weather Prediction (NWP) systems on HR-Extreme, and provide an improved baseline deep learning model called HR-Heim which has superior performance on both general loss and HR-Extreme compared to others. Our results reveal that the errors of extreme weather cases are significantly larger than overall forecast error, highlighting them as a crucial source of loss in weather prediction.
Researcher Affiliation	Collaboration	Nian Ran1, Peng Xiao2, Yue Wang1, Wenlei Shi3, Jiaxin Lin2, Qi Meng4, Richard Allmendinger5; 1Zhongguancun Academy, 2Hunan University, 3Microsoft Research, 4Chinese Academy of Sciences, 5University of Manchester
Pseudocode	No	The paper describes methods and model architectures (e.g., HR-Heim architecture, data collection processes) in narrative text, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code	No	Our dataset (https://huggingface.co/datasets/NianRan1/HR-Extreme) and code (https://github.com/HuskyNian/HR-Extreme) will be available upon the acceptance of this paper.
Open Datasets	Yes	Our dataset focuses on extreme weather events in the U.S. based on NOAA HRRR data. The HRRR data is in U.S. Government Work license, which means that the data is in the public domain and can be freely used, distributed, and modified without any restrictions. Our dataset (https://huggingface.co/datasets/NianRan1/HR-Extreme) and code (https://github.com/HuskyNian/HR-Extreme) will be available upon the acceptance of this paper.
Dataset Splits	Yes	All models (Pangu, Fuxi, and our HR-Heim) were trained on HRRR data spanning the U.S. from January 2019 to June 2020, from scratch. ... We first evaluated the NWP model and the deep learning model on the original test set spanning from July 2020 to the end of 2020. Subsequently, these models were assessed and compared on HR-Extreme during the same period, as illustrated in Table 2.
Hardware Specification	Yes	All models were evaluated on an Nvidia A100 80G GPU, with the evaluation of a deep learning model for half a year's data taking approximately 8 hours.
Software Dependencies	No	The paper mentions several deep learning models and libraries by name (e.g., Pangu, Fuxi, Swin Transformer, Herbie Python library), but does not provide specific version numbers for the software dependencies used in their experiments.
Experiment Setup	Yes	All models (Pangu, Fuxi, and our HR-Heim) were trained on HRRR data spanning the U.S. from January 2019 to June 2020, from scratch. They were trained under identical parameters and same level of model parameters, and no hyperparameter tuning was applied to HR-Heim. For our experiments, we set the batch size to 8.
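
The temporal split quoted in the Dataset Splits row (training from January 2019 to June 2020, testing from July 2020 to the end of 2020) can be sketched as a simple timestamp filter. This is a minimal illustration, not the authors' code: the function name `assign_split` and the exact day-level boundaries are assumptions, since the paper excerpt gives only months.

```python
from datetime import datetime
from typing import Optional

# Boundaries taken from the quoted text; day-level precision is an assumption.
TRAIN_START = datetime(2019, 1, 1)  # training begins January 2019
TEST_START = datetime(2020, 7, 1)   # training ends with June 2020, testing begins July 2020
TEST_END = datetime(2021, 1, 1)     # exclusive bound: testing runs to the end of 2020


def assign_split(timestamp: datetime) -> Optional[str]:
    """Return 'train', 'test', or None for a timestamp outside both periods."""
    if TRAIN_START <= timestamp < TEST_START:
        return "train"
    if TEST_START <= timestamp < TEST_END:
        return "test"
    return None
```

Using half-open intervals keeps the two periods disjoint, so no sample can fall into both the training and the test set.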