reproducibilityindex.ai

Predicting Rare Events by Shrinking Towards Proportional Odds

Authors: Gregory Faletto, Jacob Bien

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 4 we demonstrate through synthetic and real data experiments that PRESTO can outperform both logistic regression on the rare class and the proportional odds model, both in settings where the differences in adjacent βk vectors are sparse, as PRESTO assumes, and in settings where these differences are not sparse. ... 4. Experiments: To illustrate the efficacy of PRESTO, we conduct two synthetic experiments and also examine two real data sets.
Researcher Affiliation	Academia	Gregory Faletto (1), Jacob Bien (1). (1) Department of Data Sciences and Operations, University of Southern California, Los Angeles, CA, USA.
Pseudocode	No	The paper describes mathematical formulations and discusses implementation details, referring to modifications of existing packages, but it does not include a structured pseudocode block or algorithm listing.
Open Source Code	Yes	The code generating all plots and tables is available at https://github.com/gregfaletto/presto.
Open Datasets	Yes	We conduct a real data experiment using the soup data set from the R ordinal package (R. H. B. Christensen, 2019). ... We present another real data experiment using the data set PreDiabetes from the MLDataR R package (Hutson et al., 2022).
Dataset Splits	Yes	For PRESTO, we use 5-fold cross-validation to choose a value of λn among 20 choices, selecting the λn with the best out-of-fold Brier score (other metrics, like negative log likelihood, failed because some values of λn in some folds resulted in models yielding negative probabilities, so these other metrics were undefined). ... First, we randomly split the data into training (90% of the data) and test (10%) sets.
Hardware Specification	Yes	The real data experiments from Section 4.3 and Appendix B were conducted in R Version 4.3.0 running on macOS Ventura 13.3.1 on a MacBook Pro with a 2.3 GHz Quad-Core Intel Core i5 processor and 16 GB of RAM. ... The synthetic data experiments from Sections 4.1 and 4.2, as well as Simulation Studies A and B in Appendix E, were conducted in R Version 4.2.2 running on macOS 10.15.7 on an iMac with a 3.5 GHz Quad-Core Intel Core i7 processor and 32 GB of RAM.
Software Dependencies	Yes	We used the R packages MASS (Venables & Ripley, 2002, version 7.3.58.1), simulator (Bien, 2016, version 0.2.4), ggplot2 (Wickham, 2016, version 3.3.6), cowplot (Wilke, 2020, version 1.1.1), and stargazer (Hlavac, 2022, version 5.2.3), all available for download on CRAN, as well as the base parallel package (version 4.3.0).
Experiment Setup	Yes	We repeat the following procedure for 700 simulations. First we generate data using n = 2500, p = 10, and K = 4. We draw a random X ∈ [−1, 1]^(n×p), where Xij ∼ Uniform(−1, 1) for all i ∈ {1, ..., n} and j ∈ {1, ..., p}. Then y ∈ {1, ..., K}^n is generated according to a relaxation of the proportional odds model; instead of (1), we generate probabilities according to (3) where the βk are generated in the following way for sparsity settings of η ∈ {1/3, 1/2}: ... We consider three possible sets of intercepts: α = (0, 3, 5), (0, 3.5, 5.5), and (0, 4, 6)... For PRESTO, we use 5-fold cross-validation to choose a value of λn among 20 choices, selecting the λn with the best out-of-fold Brier score...
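
The setup quoted in the table can be illustrated with a minimal Python sketch. It is not the authors' code (their R implementation is at the repository listed under Open Source Code), and it rests on several assumptions: labels are drawn from the plain proportional odds model, assumed here to take the form P(y <= k | x) = logistic(α_k + β·x), rather than the paper's relaxation (3), whose βk construction is elided in the excerpt; the coefficient vector `beta`, the λ grid, and the `make_base_rate_predictor` stand-in model are all hypothetical. Only the uniform design, the intercepts, the 5-fold split, and Brier-score selection come from the quoted text.

```python
import math
import random


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def draw_design(n, p, rng):
    """Draw an n-by-p design with i.i.d. Uniform(-1, 1) entries."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(n)]


def class_probs(x, alphas, beta):
    """Class probabilities under proportional odds (assumed form):
    P(y <= k | x) = logistic(alpha_k + beta . x), with one shared beta."""
    eta = sum(b * v for b, v in zip(beta, x))
    cum = [logistic(a + eta) for a in alphas] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


def draw_labels(X, alphas, beta, rng):
    """Sample one label in {1, ..., K} per row of X."""
    classes = range(1, len(alphas) + 2)
    return [rng.choices(classes, weights=class_probs(x, alphas, beta))[0] for x in X]


def brier_score(probs, labels, rare_class):
    """Mean squared error between predicted rare-class probability and outcome."""
    return sum((p - (y == rare_class)) ** 2 for p, y in zip(probs, labels)) / len(labels)


def k_fold_indices(n, n_folds, rng):
    """Shuffle 0..n-1 and deal the indices into n_folds disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def select_lambda(y, lambdas, fit_predict, rare_class, n_folds, rng):
    """Return (lambda, score) with the lowest mean out-of-fold Brier score."""
    folds = k_fold_indices(len(y), n_folds, rng)
    best = None
    for lam in lambdas:
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            probs = fit_predict(train, held_out, lam)
            scores.append(brier_score(probs, [y[i] for i in held_out], rare_class))
        avg = sum(scores) / n_folds
        if best is None or avg < best[1]:
            best = (lam, avg)
    return best


def make_base_rate_predictor(y, rare_class):
    """Hypothetical stand-in for a real model: a rare-class base rate
    shrunk toward zero by lam. It ignores the features entirely."""
    def fit_predict(train, held_out, lam):
        rate = sum(y[i] == rare_class for i in train) / (len(train) + lam)
        return [rate] * len(held_out)
    return fit_predict


if __name__ == "__main__":
    rng = random.Random(0)
    n, p, K = 2500, 10, 4                 # sizes quoted in the table
    alphas = (0.0, 3.0, 5.0)              # first intercept set quoted in the table
    beta = [0.5] * p                      # hypothetical coefficients
    X = draw_design(n, p, rng)
    y = draw_labels(X, alphas, beta, rng)
    lambdas = [0.5 * j for j in range(20)]  # hypothetical grid of 20 values
    print(select_lambda(y, lambdas, make_base_rate_predictor(y, K), K, 5, rng))
```

With intercepts (0, 3, 5) and x = 0, the last class has probability 1 − logistic(5), roughly 0.7%, which is what makes class K rare in this setting. Any model exposing a `fit_predict(train, held_out, lam)` callable could be swapped in for the stand-in.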