Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On Tilted Losses in Machine Learning: Theory and Applications
Authors: Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches. |
| Researcher Affiliation | Collaboration | Tian Li, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Ahmad Beirami, Google Research, New York, NY 10011, USA; Maziar Sanjabi, Meta AI, Menlo Park, CA 94025, USA; Virginia Smith, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA |
| Pseudocode | Yes | Algorithm 1: Batch (Non-Hierarchical) TERM Algorithm 2: Stochastic (Non-Hierarchical) TERM Algorithm 3: Batch Hierarchical TERM Algorithm 4: Stochastic Hierarchical TERM |
| Open Source Code | Yes | All code, datasets, and experiments are publicly available at github.com/litian96/TERM. |
| Open Datasets | Yes | Drug Discovery dataset (Olier et al., 2018; Diakonikolas et al., 2019). CIFAR-10 (Krizhevsky et al., 2009). cal-housing (Pace and Barry, 1997) and abalone (Dua and Graff, 2019). MNIST (LeCun et al., 1998). HIV-1 dataset (Rögnvaldsson, 2013; Dua and Graff, 2019). |
| Dataset Splits | Yes | We randomly split the dataset into 80% training set, 10% validation set, and 10% testing set. ... standard CIFAR-10 data and their standard train/val/test partitions. ... unbalanced data extracted from MNIST (LeCun et al., 1998) used in Ren et al. (2018). |
| Hardware Specification | No | The paper discusses scaling for large-scale problems and training deep neural networks, which implies computational resources, but it does not specify any particular GPU or CPU models, memory sizes, or other specific hardware configurations used for experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., "Python 3.8", "PyTorch 1.9", "TensorFlow 2.x"). It only mentions methods and publicly available code. |
| Experiment Setup | Yes | Selecting t. In Section 7.2, where we consider positive values of t, we select t from a limited candidate set of {0.1, 1, 2, 5, 10, 50, 100, 200} on the held-out validation set. ... For all experiments, we tune all other hyperparameters (the learning rates, the regularization parameters, the decision threshold for ERM, ρ for (Duchi and Namkoong, 2019), the quantile value for CVaR (i.e., α in Eq. (62)) (Rockafellar et al., 2000), α and γ for focal loss (Lin et al., 2017)) based on a validation set, and select the best one. ... The initial step size is set to 0.1 and decayed to 0.01 at epoch 50. The batch size is 100. |
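For context on the objective the pseudocode rows above refer to: TERM replaces the empirical-risk average with a log-sum-exp "tilt" of the per-sample losses, interpolating between average-case (t → 0), max-loss (t → +∞), and min-loss (t → −∞) behavior. Below is a minimal NumPy sketch of this batch objective, assuming precomputed per-sample losses; the function name and max-shift stabilization are ours, not from the paper's released code.

```python
import numpy as np

def tilted_risk(losses, t):
    """Tilted empirical risk: (1/t) * log(mean(exp(t * losses))).

    t = 0 recovers standard ERM (the plain mean); large positive t
    emphasizes high-loss samples (robust fairness), negative t
    de-emphasizes them (outlier mitigation).
    """
    losses = np.asarray(losses, dtype=float)
    if t == 0:
        return losses.mean()
    # Shift by the max of t * losses for numerical stability,
    # then undo the shift outside the log.
    shifted = t * losses
    m = shifted.max()
    return (m + np.log(np.mean(np.exp(shifted - m)))) / t
```

For example, on losses [1, 2, 3], `tilted_risk` returns 2.0 at t = 0 and approaches 3.0 (the max loss) as t grows large, matching the paper's described candidate sweep over positive t values.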