Online Platt Scaling with Calibeating

Authors: Chirag Gupta, Aaditya Ramdas

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, it is effective on a range of synthetic and real-world datasets, with and without distribution drifts, achieving superior performance without hyperparameter tuning.
Researcher Affiliation | Academia | Carnegie Mellon University, Pittsburgh PA, USA.
Pseudocode | Yes | Algorithm 1 in the Appendix contains pseudocode for our final OPS implementation.
Open Source Code | Yes | Code to reproduce the experiments can be found at https://github.com/aigen/df-posthoc-calibration (see Appendix A.4 for more details).
Open Datasets | Yes | We worked with four public datasets in two settings. Links to the datasets are in Appendix A.1. ... Table 2: Metadata for datasets used in Section 4.1.
Dataset Splits | No | The paper describes training data and a "test-stream" that is also used for recalibration (calibration data for online learning), but it does not explicitly define a separate validation split for purposes such as hyperparameter tuning or early stopping.
Hardware Specification | Yes | For computation, we used allocation CIS220171 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, supported by NSF grants 2138259, 2138286, 2138307, 2137603, and 2138296. Specifically, we used the Bridges2 system (Towns et al., 2014), supported by NSF grant 1928147, at the Pittsburgh Supercomputing Center (PSC).
Software Dependencies | No | The base model f was a random forest (sklearn's implementation).
Experiment Setup | Yes | Thus we used ONS for experiments based on a verbatim implementation of Algorithm 12 in Hazan (2016), with γ = 0.1, ρ = 100, and K = {(a, b) : ‖(a, b)‖₂ ≤ 100}. ... All default parameters were used, except n_estimators was set to 1000. No hyperparameter tuning on individual datasets was performed for any of the recalibration methods.