Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Sampling-based Multi-dimensional Recalibration
Authors: Youngseog Chung, Ian Char, Jeff Schneider
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the performance of our method and the quality of the recalibrated samples on a suite of benchmark datasets in multidimensional regression, a real-world dataset in modeling plasma dynamics during nuclear fusion reactions, and on a decision-making application in forecasting demand. |
| Researcher Affiliation | Academia | 1Machine Learning Department; 2Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213. |
| Pseudocode | Yes | Algorithm 1 HDR Recalibration: Training |
| Open Source Code | Yes | Code is available at: https://github.com/YoungseogChung/multi-dimensional-recalibration |
| Open Datasets | Yes | Datasets. The mulan benchmark (Tsoumakas et al., 2011) is a set of prediction tasks with multi-dimensional targets of up to 16 dimensions. |
| Dataset Splits | Yes | On each dataset, we make train-validation-test splits of proportions [65%, 20%, 15%] |
| Hardware Specification | Yes | All of the model training was done with 4 NVIDIA GeForce RTX 2080 Ti GPUs. All of the evaluation was done on a CPU machine with Intel(R) Xeon(R) Gold 6238 CPU @ 2.10GHz. |
| Software Dependencies | No | The paper mentions software like 'Uncertainty Toolbox' and 'NGBoost' but does not specify version numbers for these or other key software dependencies required for replication. |
| Experiment Setup | Yes | For all of the datasets, the PNN trained has 5 fully connected layers, each with 200 hidden units, and the output parametrizes a diagonal Gaussian with a mean and a log-variance prediction. The Gaussian likelihood loss was used for training, with a learning rate of 0.001 and no weight decay was used. Training was halted early if the validation loss did not improve for more than 100 epochs, for a maximum of 1000 epochs. |
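
The split proportions, likelihood loss, and early-stopping rule quoted in the table can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names `split_indices`, `diag_gaussian_nll`, and `stopping_epoch` are hypothetical, the random shuffle and its seed are assumptions (the quoted text does not say how the splits were drawn), and the network and optimizer are omitted entirely.

```python
import math
import random


def split_indices(n, fractions=(0.65, 0.20, 0.15), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    The shuffle and the seed are assumptions made for illustration only.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    # Whatever remains after train and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def diag_gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under a diagonal Gaussian.

    The variance of each dimension is exp(log_var), matching a model that
    outputs a mean and a log-variance per target dimension.
    """
    return 0.5 * sum(
        math.log(2 * math.pi) + lv + (yi - mi) ** 2 / math.exp(lv)
        for yi, mi, lv in zip(y, mean, log_var)
    )


def stopping_epoch(val_losses, patience=100, max_epochs=1000):
    """Return the 1-indexed epoch at which training would halt.

    Training stops once the validation loss has failed to improve for more
    than `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best = math.inf
    since_best = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
        if since_best > patience:
            return epoch
    return min(len(val_losses), max_epochs)
```

For example, `split_indices(100)` yields index lists of sizes 65, 20, and 15, and a validation curve that never improves after the first epoch stops at epoch 102 under the "more than 100 epochs" reading used here.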