Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Distribution calibration for regression
Authors: Hao Song, Tom Diethe, Meelis Kull, Peter Flach
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method is experimentally verified on a set of common regression models and shows improvements for both distribution-level and quantile-level calibration. (Abstract) |
| Researcher Affiliation | Collaboration | (1) University of Bristol, Bristol, United Kingdom; (2) Amazon Research, Cambridge, United Kingdom; (3) University of Tartu, Tartu, Estonia; (4) The Alan Turing Institute, London, United Kingdom. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | Yes | The experiments are applied on the following UCI datasets (sizes in parentheses): 1. Diabetes (442), 2. Boston (506), 3. Airfoil (1503), 4. Forest Fire (517), 5. Strength (1030), 6. Energy (19735). (Section 5) |
| Dataset Splits | No | All the experiments use a random (0.75, 0.25) train-test split, with both the base model and calibrators trained on the same set. (Section 5) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions the 'ADAM optimiser' but does not name the libraries or solvers used or give their version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | The model is trained using the ADAM optimiser with a learning rate of 0.01. (Section 5, Synthetic data) [...] GP-BETA with 8, 16, 32 and 64 inducing points, batch size of 128, and 64 Monte-Carlo samples per batch to compute the objective function and the gradient. The parameters are again optimised using ADAM with a learning rate of 0.001. For the NNs, we use the same setting as in [14], which is a 2-layer fully-connected structure with 128 hidden units per layer and ReLU activation. The dropout rate is set to 0.5, default weight decay of 10⁻⁴ and the length scale of 1.0 are used to approximate the mean and variance following the results given in [5]. (Section 5, Real world datasets) |
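
The Experiment Setup row lists the neural-network configuration, but the paper releases no code. As an illustration only, the following standard-library Python sketch shows two of the reported settings: a random (0.75, 0.25) train-test split, and a predictive mean and variance from a 2-layer, 128-unit ReLU network with dropout rate 0.5, weight decay 10⁻⁴ and length scale 1.0. It assumes that the approximation credited to [5] is Monte Carlo dropout. Under that assumption the predictive variance is the sample variance of repeated stochastic forward passes plus 1/τ, with τ = p·l²/(2Nλ), where p is the keep probability, l the length scale, N the number of training points and λ the weight decay. The split seed, the 50 forward passes, the dropout placement, the weight initialisation and the synthetic data are all assumptions. The weights are left untrained (the ADAM training step is omitted), so the numbers it produces say nothing about the paper's results.

```python
import math
import random


def train_test_split(rows, train_frac=0.75, seed=0):
    """Random (0.75, 0.25) train-test split; the seed is an assumption (none is reported)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(rows))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]


def init_mlp(n_in, n_hidden=128, seed=0):
    """Two fully connected hidden layers plus a scalar output; weights are random, not trained."""
    rng = random.Random(seed)

    def layer(n_out, n_prev):
        # Assumption: uniform(-1/sqrt(fan_in), 1/sqrt(fan_in)) initialisation, zero biases.
        s = 1.0 / math.sqrt(n_prev)
        weights = [[rng.uniform(-s, s) for _ in range(n_prev)] for _ in range(n_out)]
        return weights, [0.0] * n_out

    return [layer(n_hidden, n_in), layer(n_hidden, n_hidden), layer(1, n_hidden)]


def forward(params, x, drop_rate, rng):
    """One stochastic pass: ReLU then inverted dropout on each hidden layer, linear output."""
    h = list(x)
    for k, (W, b) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        if k < len(params) - 1:
            # Assumption: dropout after every hidden layer; survivors rescaled by 1/(1 - rate).
            scale = 1.0 / (1.0 - drop_rate)
            h = [max(0.0, v) * (0.0 if rng.random() < drop_rate else scale) for v in z]
        else:
            h = z
    return h[0]


def mc_dropout_predict(params, x, n_train, drop_rate=0.5, weight_decay=1e-4,
                       length_scale=1.0, n_passes=50, seed=0):
    """Predictive mean and variance from n_passes dropout forward passes (MC dropout)."""
    rng = random.Random(seed)
    ys = [forward(params, x, drop_rate, rng) for _ in range(n_passes)]
    mean = sum(ys) / n_passes
    model_var = sum((y - mean) ** 2 for y in ys) / n_passes
    # Model precision tau = p * l^2 / (2 * N * weight_decay), with p the keep probability.
    tau = (1.0 - drop_rate) * length_scale ** 2 / (2.0 * n_train * weight_decay)
    return mean, model_var + 1.0 / tau


# Usage on synthetic stand-in data shaped like Diabetes (442 rows, 10 features).
data_rng = random.Random(1)
data = [[data_rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(442)]
train, test = train_test_split(data)
net = init_mlp(n_in=10)
mu, var = mc_dropout_predict(net, test[0], n_train=len(train))
```

Because the variance always includes the 1/τ term, it stays positive even when every forward pass agrees. With N = 331 training points (the 75% share of 442 rows) this floor is 2·331·10⁻⁴/0.5 ≈ 0.13.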