Sub-Seasonal Climate Forecasting via Machine Learning: Challenges, Analysis, and Advances
Authors: Sijie He, Xinyan Li, Timothy DelSole, Pradeep Ravikumar, Arindam Banerjee
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we carefully investigate 10 Machine Learning (ML) approaches to sub-seasonal temperature forecasting over the contiguous U.S. on the SSF dataset we collect... Our results indicate that suitable ML models, e.g., XGBoost, to some extent, capture the predictability on sub-seasonal time scales and can outperform the climatological baselines, while Deep Learning (DL) models barely manage to match the best results with carefully designed architecture. (A toy XGBoost-vs-climatology comparison appears after this table.) |
| Researcher Affiliation | Academia | ¹Department of Computer Science & Engineering, University of Minnesota, Twin Cities; ²Department of Atmospheric, Oceanic, and Earth Science, George Mason University; ³Machine Learning Department, Carnegie Mellon University; ⁴Department of Computer Science, University of Illinois Urbana-Champaign |
| Pseudocode | No | The paper describes different ML models and their architectures (e.g., Figure 2 for DL models) but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The SSF dataset and code are released with this paper for use by the broader research community. (Footnote: The SSF dataset and codebase are publicly available at https://sites.google.com/view/ssf-ml/home.) |
| Open Datasets | Yes | We construct the SSF dataset by collecting climate variables (Table 1) from a diverse collection of data sources and converting them into a consistent format. In particular, temporal variables, e.g., Niño indices, are interpolated to a daily resolution, and spatiotemporal variables are interpolated to a spatial resolution of 0.5° by 0.5°. The SSF dataset and code are released with this paper for use by the broader research community. (A hedged interpolation sketch appears after this table.) |
| Dataset Splits | Yes | 5-fold training-validation pairs for hyper-parameter tuning, based on a sliding-window strategy designed for time-series data. Each validation set consists of the data from the same month of the year as the test set, and we create 5 such sets from dates in the past 5 years (2012–2016). Their corresponding training sets contain 10 years of data before each validation set. (A minimal sketch of this sliding-window split appears after this table.) |
| Hardware Specification | No | The paper acknowledges "computing support from the Minnesota Supercomputing Institute (MSI) at the University of Minnesota" but does not specify any particular hardware components like CPU or GPU models, or specific cluster configurations used for experiments. |
| Software Dependencies | No | The paper mentions various models like XGBoost, Lasso, FNN, CNN-LSTM, and uses terms like "Principal Component Analysis (PCA)" and "SHAP (SHapley Additive exPlanations)", but it does not specify any software names with version numbers (e.g., Python, TensorFlow, PyTorch, scikit-learn versions). |
| Experiment Setup | No | The paper describes the data preprocessing (z-scoring, PCA for features), model architectures (e.g., Encoder (LSTM)-Decoder (FNN), CNN-LSTM layers), and the evaluation pipeline (5-fold validation, test set generation), but it does not provide specific numerical hyperparameters such as learning rates, batch sizes, epochs, or optimizer details for training the models. (A minimal z-scoring + PCA sketch appears after this table.) |
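
The interpolation step described in the Open Datasets row can be sketched as follows. This is a minimal illustration, not the released SSF pipeline: the use of pandas/xarray, linear interpolation as the method, and the coordinate names `lat`/`lon` are all assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical harmonization step: temporal indices (e.g., a monthly Niño
# index) are upsampled to daily resolution, and gridded fields are
# interpolated onto a 0.5-degree lat/lon grid. Method and names are
# assumptions, not taken from the released SSF code.

def to_daily(monthly: pd.Series) -> pd.Series:
    """Linearly interpolate a coarser-than-daily index (DatetimeIndex) to daily values."""
    return monthly.resample("D").interpolate("linear")

def regrid_half_degree(field: xr.DataArray) -> xr.DataArray:
    """Bilinearly interpolate a gridded field onto a 0.5° by 0.5° grid."""
    lats = np.arange(float(field.lat.min()), float(field.lat.max()) + 0.5, 0.5)
    lons = np.arange(float(field.lon.min()), float(field.lon.max()) + 0.5, 0.5)
    return field.interp(lat=lats, lon=lons, method="linear")
```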
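
The sliding-window protocol in the Dataset Splits row lends itself to a compact sketch. Assumptions here: a daily `DatetimeIndex`, validation years 2012–2016, and a 10-year training span; the helper itself is not from the paper's codebase.

```python
import pandas as pd

def sliding_window_folds(df: pd.DataFrame, test_month: int,
                         val_years=range(2012, 2017), train_span=10):
    """Yield 5 (train, val) pairs: each validation set is `test_month` of one
    past year; its training set is the `train_span` years preceding it.
    Assumes `df` is indexed by a daily DatetimeIndex."""
    for year in val_years:
        val_start = pd.Timestamp(year=year, month=test_month, day=1)
        train_start = val_start - pd.DateOffset(years=train_span)
        val_mask = (df.index.year == year) & (df.index.month == test_month)
        train_mask = (df.index >= train_start) & (df.index < val_start)
        yield df[train_mask], df[val_mask]
```

For a January test set, this yields validation sets for January 2012 through January 2016, each paired with the decade of data immediately preceding it, matching the description quoted above.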
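
The z-scoring plus PCA feature pipeline mentioned in the Experiment Setup row could look like the following scikit-learn sketch; the component count is a placeholder, since the quoted text reports no hyperparameters.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def preprocess(train_X: np.ndarray, test_X: np.ndarray, n_components: int = 10):
    """Z-score features using training statistics only, then project both
    splits onto the leading principal components. `n_components` is a
    placeholder, not a value reported in the paper."""
    scaler = StandardScaler().fit(train_X)
    pca = PCA(n_components=n_components).fit(scaler.transform(train_X))
    return (pca.transform(scaler.transform(train_X)),
            pca.transform(scaler.transform(test_X)))
```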
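
Finally, the headline comparison in the Research Type row, ML models such as XGBoost against a climatological baseline, can be illustrated on synthetic data. Everything below (array shapes, default hyperparameters, MSE as the metric) is assumed for illustration; the paper's own evaluation and tuned settings are not reproduced here.

```python
import numpy as np
import xgboost as xgb

# Synthetic stand-ins for PCA features and 2-week-average temperature targets.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1000, 20)), rng.normal(size=1000)
X_test, y_test = rng.normal(size=(200, 20)), rng.normal(size=200)

# ML model: gradient-boosted trees (placeholder hyperparameters).
model = xgb.XGBRegressor(n_estimators=200)
model.fit(X_train, y_train)
ml_mse = float(np.mean((model.predict(X_test) - y_test) ** 2))

# Climatological baseline: predict the training-period mean everywhere.
clim_mse = float(np.mean((y_train.mean() - y_test) ** 2))

print(f"XGBoost MSE: {ml_mse:.3f} | climatology MSE: {clim_mse:.3f}")
```

On real SSF data the paper reports that tuned XGBoost can beat the climatological baseline; on this random data neither model has skill, so the snippet only demonstrates the evaluation pattern.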