Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Being Properly Improper
Authors: Tyler Sypherd, Richard Nock, Lalitha Sankar
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the twist-proper α-loss under a novel boosting algorithm, called PILBOOST, and provide formal and experimental results for this algorithm. Our overarching practical conclusion is that the twist-proper α-loss outperforms the proper log-loss on several variants of twisted data. In Section 6, we implement PILBOOST with the approximate inverse canonical link of α-loss on several tabular datasets, each suffering from various twists (label, feature, and adversarial noise), and compare against AdaBoost (Freund & Schapire, 1997) and XGBoost (Chen & Guestrin, 2016). |
| Researcher Affiliation | Collaboration | School of Electrical, Computer and Energy Engineering, Arizona State University; Google Research. Correspondence to: Tyler Sypherd <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 PILBOOST |
| Open Source Code | Yes | The code for all of our experiments (including the implementation of PILBOOST) can be found at the following GitHub repository link: https://github.com/SankarLab/Being-Properly-Improper |
| Open Datasets | Yes | We provide experimental results on PILBOOST (for α ∈ {1.1, 2, 4}) and compare with AdaBoost (Freund & Schapire, 1997) and XGBoost (Chen & Guestrin, 2016) on four binary classification datasets, namely, cancer (Wolberg et al., 1995), xd6 (Buntine & Niblett, 1992), diabetes (Smith et al., 1988), and online shoppers intention (Sakar et al., 2019). |
| Dataset Splits | No | The paper mentions 'train/test split' and 'cross-validation' but does not explicitly describe a separate 'validation' dataset split for hyperparameter tuning. |
| Hardware Specification | Yes | Most of the experiments were performed over the course of a month on a 2015 MacBook Pro with a 2.2 GHz Quad-Core Intel Core i7 processor and 16 GB of memory. The Adaptive α experiments were performed on a computing cluster and each required about 30 minutes of compute time. |
| Software Dependencies | No | The paper mentions using decision trees and XGBoost but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | All algorithms across all experiments ran for 1000 iterations. For α = 1.1, 2, and 4, we set af = 7, 2, and 4, respectively. Hyperparameters of XGBoost were kept to default to maintain the fairest comparison between the three algorithms; for more of these experimental details, please refer to Appendix B.5. All experiments use regression decision trees (of varying depths 1-3) in order to align with XGBoost. |
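For readers unfamiliar with the loss being compared against log-loss, the sketch below evaluates the α-loss as defined in earlier work by Sypherd, Sankar, and coauthors: for a prediction assigning probability p to the true label, ℓ_α(p) = α/(α − 1) · (1 − p^(1 − 1/α)), with log-loss −log p recovered as α → 1 and 1 − p at α = ∞. This is only an illustration of the loss itself, assuming that standard definition; it does not implement PILBOOST, the twist-proper property, or the approximate inverse canonical link used in the paper, and the function name `alpha_loss` is hypothetical.

```python
import math


def alpha_loss(p_true: float, alpha: float) -> float:
    """alpha-loss of a prediction that gives probability p_true to the true label.

    Hypothetical helper (not from the paper's repository), assuming the standard
    definition: alpha/(alpha-1) * (1 - p_true**(1 - 1/alpha)) for alpha != 1.
    alpha = 1 is the log-loss limit; alpha = inf gives 1 - p_true.
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must lie in (0, 1]")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if alpha == 1.0:
        # Limit as alpha -> 1: the (proper) log-loss.
        return -math.log(p_true)
    if math.isinf(alpha):
        # Limit as alpha -> infinity: a bounded, 0-1-like loss.
        return 1.0 - p_true
    return alpha / (alpha - 1.0) * (1.0 - p_true ** (1.0 - 1.0 / alpha))


if __name__ == "__main__":
    # A confidently wrong prediction (p_true = 0.01), as a flipped label might induce.
    for a in (1.0, 1.1, 2.0, 4.0):
        print(f"alpha={a}: loss={alpha_loss(0.01, a):.3f}")
```

On the confidently wrong example, the loss shrinks as α grows above 1 (log-loss is about 4.6, while α = 2 gives 1.8), which is the usual intuition for why α > 1 is less sensitive to noisy labels than log-loss.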