Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
General Uncertainty Estimation with Delta Variances
Authors: Simon Schmitt, John Shawe-Taylor, Hado van Hasselt
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To empirically study the Delta Variance we build on the state-of-the-art GraphCast weather forecasting system (Lam et al. 2023)... We assess the Epistemic Variance predictions on 5 years of hold-out data using multiple metrics such as the correlation between predicted variance and prediction error and the likelihood of the quantities of interest. Empirically, Delta Variances with a diagonal Fisher approximation yield competitive results at lower computational cost; see Figure 3. |
| Researcher Affiliation | Collaboration | 1 DeepMind 2 University College London, UK |
| Pseudocode | No | The paper describes methods and derivations in paragraph form and mathematical equations. It does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | No | We build on the state-of-the-art GraphCast weather prediction system... Training data ranges from 1979-2013 with validation data from 2014-2017 and holdout data from 2018-2021, resulting in about 100 GB of weather data. While the paper cites the GraphCast system (Lam et al. 2023), it does not provide explicit access information (link, DOI, specific repository) for the *specific data* used in their experiments. |
| Dataset Splits | Yes | Training data ranges from 1979-2013 with validation data from 2014-2017 and holdout data from 2018-2021, resulting in about 100 GB of weather data. |
| Hardware Specification | No | The paper does not specify any particular hardware (GPU, CPU, TPU models) used for training or inference. It only mentions, 'To save resources we retrain the model for a grid size of 4 degrees and reduce the number of layers and latents each by a factor of 2.' |
| Software Dependencies | No | The paper mentions the use of 'any auto-differentiation framework' and the 'GraphCast weather forecasting system (Lam et al. 2023)' but does not provide specific version numbers for any software dependencies or libraries used in their implementation. |
| Experiment Setup | Yes | To save resources we retrain the model for a grid size of 4 degrees and reduce the number of layers and latents each by a factor of 2. Finally we skip the fine-tuning curriculum for simplicity. In our experiments we optimize the coefficients of this linear combination using gradient descent to improve the log-likelihood or correlation on a small set of held-out validation data. |
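The "diagonal Fisher" variant referenced in the Research Type row can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the shapes, the toy random gradients, and the function names `diagonal_fisher` and `delta_variance` are assumptions for illustration. It shows the delta-method idea of estimating the epistemic variance of a scalar quantity of interest as g^T F^{-1} g, with the Fisher information F approximated by its diagonal (mean of squared per-example log-likelihood gradients).

```python
import numpy as np

def diagonal_fisher(per_example_grads):
    # Diagonal Fisher approximation: mean of squared per-example
    # log-likelihood gradients (one row per training example).
    return np.mean(per_example_grads ** 2, axis=0)

def delta_variance(g, fisher_diag, eps=1e-8):
    # Delta-method epistemic variance of a scalar quantity of interest:
    # Var ~ g^T F^{-1} g, with F approximated by its diagonal.
    # eps guards against division by near-zero Fisher entries.
    return float(np.sum(g ** 2 / (fisher_diag + eps)))

# Toy usage: random gradients stand in for a real model's gradients.
rng = np.random.default_rng(0)
per_example_grads = rng.normal(size=(100, 10))  # 100 examples, 10 params
g = rng.normal(size=10)  # gradient of the prediction w.r.t. the params
var = delta_variance(g, diagonal_fisher(per_example_grads))
```

In a real auto-differentiation framework the per-example gradients and `g` would come from the trained network; the diagonal approximation is what keeps the cost low relative to a full Fisher matrix.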