Losses over Labels: Weakly Supervised Learning via Direct Loss Construction

Authors: Dylan Sam, J. Zico Kolter

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks and further demonstrate that incorporating gradient information leads to better performance on almost every task.
Researcher Affiliation | Collaboration | Dylan Sam (1), J. Zico Kolter (1, 2); (1) Machine Learning Department, Carnegie Mellon University; (2) Bosch Center for Artificial Intelligence; dylansam@andrew.cmu.edu, zkolter@cs.cmu.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for our experiments can be found here (1). (1) https://github.com/dsam99/LoL
Open Datasets | Yes | We compare LoL to existing weakly supervised algorithms on 5 text classification datasets from WRENCH (Zhang et al. 2021)... We extend our setting to consider 3 image classification tasks from the Animals with Attributes 2 dataset (Xian et al. 2018).
Dataset Splits | Yes | For each task, we split the dataset into 80% train and validation data and 20% test data. Then we further split training and validation data into N examples per class of labeled validation data. We report results for validation set sizes of N ∈ {10, 15, 20, 50, 100}.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments. It only mentions general concepts like neural networks.
Software Dependencies | No | The paper does not specify versions for any software dependencies (e.g., Python, PyTorch, TensorFlow, scikit-learn). It only implicitly refers to programming environments.
Experiment Setup | Yes | At a high level, this loss function incorporates a squared penalty for the gradient of our model being less than c times the gradient of the heuristic (along non-abstained dimensions). α serves as a hyperparameter that determines the weighting or importance of the gradient-matching term, similar to a weighting parameter for regularization. ... and c > 0. ... In these experiments, we train methods for 10 epochs.
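The Dataset Splits row above describes an 80/20 train/test split followed by drawing N labeled validation examples per class. The snippet below is a minimal sketch of that procedure, assuming NumPy arrays and scikit-learn's train_test_split; the helper name split_dataset and the stratified split are illustrative assumptions, not taken from the authors' code.

import numpy as np
from sklearn.model_selection import train_test_split

def split_dataset(X, y, n_per_class, seed=0):
    # 80% train + validation pool, 20% held-out test (stratification is an assumption)
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    # draw N labeled validation examples per class from the pool
    rng = np.random.default_rng(seed)
    val_idx = []
    for c in np.unique(y_pool):
        class_idx = np.flatnonzero(y_pool == c)
        val_idx.extend(rng.choice(class_idx, size=n_per_class, replace=False))
    val_mask = np.zeros(len(y_pool), dtype=bool)
    val_mask[val_idx] = True

    train = (X_pool[~val_mask], y_pool[~val_mask])  # remaining (weakly labeled) training data
    val = (X_pool[val_mask], y_pool[val_mask])      # N labeled validation examples per class
    test = (X_test, y_test)
    return train, val, test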
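The Experiment Setup row quotes a squared penalty applied when the model's gradient is less than c times the heuristic's gradient along non-abstained dimensions, weighted by α. Below is a hedged PyTorch sketch of such a term, not the authors' implementation: the function name lol_gradient_penalty, the differentiable surrogate heuristic, the abstain_mask interface, and summing outputs before differentiating are all illustrative assumptions.

import torch

def lol_gradient_penalty(model, heuristic, x, abstain_mask, alpha, c):
    # `heuristic` is assumed to be a differentiable surrogate of the weak labeler;
    # `abstain_mask` is assumed to be 1 on non-abstained dimensions, 0 elsewhere.
    x = x.detach().clone().requires_grad_(True)

    # input gradient of the model, kept in the graph so the penalty trains the model
    grad_model = torch.autograd.grad(model(x).sum(), x, create_graph=True)[0]

    # input gradient of the heuristic, treated as a fixed target
    grad_heur = torch.autograd.grad(heuristic(x).sum(), x)[0].detach()

    # squared penalty only where the model's gradient falls below c * heuristic gradient,
    # restricted to non-abstained dimensions; alpha weights the gradient-matching term
    shortfall = torch.relu(c * grad_heur - grad_model)
    penalty = (abstain_mask * shortfall ** 2).flatten(start_dim=1).sum(dim=1).mean()
    return alpha * penalty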