Losses over Labels: Weakly Supervised Learning via Direct Loss Construction
Authors: Dylan Sam, J. Zico Kolter
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks and further demonstrate that incorporating gradient information leads to better performance on almost every task. |
| Researcher Affiliation | Collaboration | Dylan Sam (1), J. Zico Kolter (1,2); (1) Machine Learning Department, Carnegie Mellon University; (2) Bosch Center for Artificial Intelligence; dylansam@andrew.cmu.edu, zkolter@cs.cmu.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for our experiments can be found here: https://github.com/dsam99/LoL |
| Open Datasets | Yes | We compare LoL to existing weakly supervised algorithms on 5 text classification datasets from WRENCH (Zhang et al. 2021)... We extend our setting to consider 3 image classification tasks from the Animals with Attributes 2 dataset (Xian et al. 2018). |
| Dataset Splits | Yes | For each task, we split the dataset into 80% train and validation data and 20% test data. Then we further split the training and validation data, holding out N labeled examples per class as a validation set. We report results for validation set sizes of N ∈ {10, 15, 20, 50, 100}. (A split sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for the experiments. It only mentions general concepts like neural networks. |
| Software Dependencies | No | The paper does not specify versions for any software dependencies (e.g., Python, PyTorch, TensorFlow, scikit-learn). It only implicitly refers to programming environments. |
| Experiment Setup | Yes | At a high level, this loss function incorporates a squared penalty for the gradient of our model being less than c times the gradient of the heuristic (along non-abstained dimensions). α serves as a hyperparameter that determines the weighting or importance of the gradient matching term, similar to a weighting parameter for regularization. ... and c > 0. ... In these experiments, we train methods for 10 epochs. (A loss sketch appears after the table.) |
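
The Dataset Splits row describes an 80%/20% train+validation/test split, followed by holding out N labeled examples per class as a validation set. Below is a minimal sketch of that procedure, assuming a scikit-learn/NumPy setup; the function name, variable names, and seeding are illustrative assumptions, not the authors' released code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_dataset(X, y, n_per_class=10, seed=0):
    """Split into 80% train+validation and 20% test, then hold out
    n_per_class labeled examples per class as the validation set."""
    X_trval, X_test, y_trval, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    rng = np.random.default_rng(seed)
    # Sample n_per_class validation indices from each class.
    val_idx = np.concatenate([
        rng.choice(np.where(y_trval == c)[0], size=n_per_class, replace=False)
        for c in np.unique(y_trval)])
    train_idx = np.setdiff1d(np.arange(len(y_trval)), val_idx)
    return (X_trval[train_idx], y_trval[train_idx],   # training data
            X_trval[val_idx], y_trval[val_idx],       # small labeled validation set
            X_test, y_test)                           # held-out test set
```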
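
The Experiment Setup row describes a squared penalty applied when the model's input gradient falls below c times the heuristic's gradient along non-abstained dimensions, weighted by a hyperparameter α. The following is a hedged PyTorch sketch of such a term, not the paper's exact formulation; the function names, the use of summed logits as the scalar being differentiated, and the masking convention are assumptions.

```python
import torch
import torch.nn.functional as F

def gradient_matching_penalty(model, x, heuristic_grad, mask, c=1.0):
    """Squared penalty wherever the model's input gradient is smaller than
    c times the heuristic's input gradient, restricted to non-abstained
    dimensions (mask == 1). Assumes x, heuristic_grad, mask share shape (batch, d)."""
    x = x.detach().clone().requires_grad_(True)
    # Use the summed model outputs as a scalar to differentiate (a simplification).
    out = model(x).sum()
    model_grad = torch.autograd.grad(out, x, create_graph=True)[0]
    # Squared hinge: penalize only the shortfall below c * |heuristic gradient|.
    shortfall = F.relu(c * heuristic_grad.abs() - model_grad.abs())
    return (mask * shortfall ** 2).mean()

def lol_style_loss(base_loss, model, x, heuristic_grad, mask, alpha=0.1, c=1.0):
    # alpha weights the gradient-matching term, analogous to a regularization weight.
    return base_loss + alpha * gradient_matching_penalty(model, x, heuristic_grad, mask, c)
```

Keeping `create_graph=True` lets the penalty itself be backpropagated through during training, which is needed for the gradient-matching term to influence the model's parameters.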