Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
On Tilted Losses in Machine Learning: Theory and Applications
Authors: Tian Li, Ahmad Beirami, Maziar Sanjabi, Virginia Smith
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate that TERM can be used for a multitude of applications in machine learning, such as enforcing fairness between subgroups, mitigating the effect of outliers, and handling class imbalance. Despite the straightforward modification TERM makes to traditional ERM objectives, we find that the framework can consistently outperform ERM and deliver competitive performance with state-of-the-art, problem-specific approaches. |
| Researcher Affiliation | Collaboration | Tian Li, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA; Ahmad Beirami, Google Research, New York, NY 10011, USA; Maziar Sanjabi, Meta AI, Menlo Park, CA 94025, USA; Virginia Smith, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA |
| Pseudocode | Yes | Algorithm 1: Batch (Non-Hierarchical) TERM Algorithm 2: Stochastic (Non-Hierarchical) TERM Algorithm 3: Batch Hierarchical TERM Algorithm 4: Stochastic Hierarchical TERM |
| Open Source Code | Yes | All code, datasets, and experiments are publicly available at github.com/litian96/TERM. |
| Open Datasets | Yes | Drug Discovery dataset (Olier et al., 2018; Diakonikolas et al., 2019). CIFAR-10 (Krizhevsky et al., 2009). cal-housing (Pace and Barry, 1997) and abalone (Dua and Graff, 2019). MNIST (LeCun et al., 1998). HIV-1 dataset (Rögnvaldsson, 2013; Dua and Graff, 2019). |
| Dataset Splits | Yes | We randomly split the dataset into 80% training set, 10% validation set, and 10% testing set. ... standard CIFAR-10 data and their standard train/val/test partitions. ... unbalanced data extracted from MNIST (LeCun et al., 1998) used in Ren et al. (2018). |
| Hardware Specification | No | The paper discusses scaling for large-scale problems and training deep neural networks, which implies computational resources, but it does not specify any particular GPU or CPU models, memory sizes, or other specific hardware configurations used for experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., "Python 3.8", "PyTorch 1.9", "TensorFlow 2.x"). It only mentions methods and publicly available code. |
| Experiment Setup | Yes | Selecting t. In Section 7.2, where we consider positive values of t, we select t from a limited candidate set of {0.1, 1, 2, 5, 10, 50, 100, 200} on the held-out validation set. ... For all experiments, we tune all other hyperparameters (the learning rates, the regularization parameters, the decision threshold for ERM, ρ for (Duchi and Namkoong, 2019), the quantile value for CVaR (i.e., α in Eq. (62)) (Rockafellar et al., 2000), α and γ for focal loss (Lin et al., 2017)) based on a validation set, and select the best one. ... The initial step size is set to 0.1 and decayed to 0.01 at epoch 50. The batch size is 100. |
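For context on the objective the pseudocode rows above refer to: TERM replaces the empirical-risk average with a log-sum-exp "tilt" of the per-sample losses, interpolating between average-case (t → 0), max-loss (t → +∞), and min-loss (t → −∞) behavior. Below is a minimal NumPy sketch of this batch objective, assuming precomputed per-sample losses; the function name and max-shift stabilization are ours, not from the paper's released code.

```python
import numpy as np

def tilted_risk(losses, t):
    """Tilted empirical risk: (1/t) * log(mean(exp(t * losses))).

    t = 0 recovers standard ERM (the plain mean); large positive t
    emphasizes high-loss samples (robust fairness), negative t
    de-emphasizes them (outlier mitigation).
    """
    losses = np.asarray(losses, dtype=float)
    if t == 0:
        return losses.mean()
    # Shift by the max of t * losses for numerical stability,
    # then undo the shift outside the log.
    shifted = t * losses
    m = shifted.max()
    return (m + np.log(np.mean(np.exp(shifted - m)))) / t
```

For example, on losses [1, 2, 3], `tilted_risk` returns 2.0 at t = 0 and approaches 3.0 (the max loss) as t grows large, matching the paper's described candidate sweep over positive t values.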