Data Augmentations for Improved (Large) Language Model Generalization
Authors: Amir Feder, Yoav Wald, Claudia Shi, Suchi Saria, David Blei
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experimentation on learning caregiver-invariant predictors of clinical diagnoses from medical narratives and on semi-synthetic data, we demonstrate that our method for simulating interventions improves out-of-distribution (OOD) accuracy compared to baseline invariant learning algorithms. We empirically study the following questions: (1) Can CATO enhance the OOD performance of downstream classifiers? (2) Does it surpass the combination of reweighting and invariance penalties? (3) Is it more effective than alternative augmentation techniques, thus demonstrating the usefulness of the causal graph? (4) How sensitive is CATO to the quality of counterfactuals? These questions seek to establish causally-motivated augmentations as a practical approach for improving OOD performance. We address Q1, Q2, and Q3 through our theoretical foundation and across all empirical studies, while Q4 is explored in the synthetic experiments. Further details about the experimental setup, including data statistics, model hyperparameters, and data splits, can be found in Appendix B. Table 1 provides an overview of the tasks we experiment with. |
| Researcher Affiliation | Collaboration | Amir Feder 1,2, Yoav Wald 3, Claudia Shi 1, Suchi Saria 3 and David Blei 1 1 Columbia University, 2 Google Research, 3 Johns Hopkins University |
| Pseudocode | Yes | Algorithm 1 CATO |
| Open Source Code | No | No explicit statement about releasing code or a link to a repository for the described methodology. |
| Open Datasets | Yes | We utilize several electronic health records (EHR) datasets. We train on MIMIC-III [86], a widely-used medical dataset containing over 2 million notes from 38,597 adult patients, 49,785 hospital admissions, and 3,500 healthcare professionals between 2001 and 2012. We use the CEBaB dataset [49], which consists of short restaurant reviews and ratings from OpenTable, including evaluations for food, service, noise, ambiance, and an overall rating. |
| Dataset Splits | Yes | Further details about the experimental setup, including data statistics, model hyperparameters, and data splits, can be found in Appendix B. To estimate divergences between these two distributions, we may use validation sets from our training data. |
| Hardware Specification | No | No specific hardware details (GPU models, CPU models, etc.) are mentioned in the paper. |
| Software Dependencies | No | While some software is mentioned (e.g., 'PubMedBERT', 'PyTorch', 'Hugging Face's transformers', 'scikit-learn', 'GPT-4'), specific version numbers for these dependencies are not provided within the text. |
| Experiment Setup | Yes | Further details about the experimental setup, including data statistics, model hyperparameters, and data splits, can be found in Appendix B. |
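The table above references the paper's pseudocode (Algorithm 1, CATO) but not its implementation details. As an illustrative sketch only, causally-motivated counterfactual augmentation can look like the following; all function and variable names here are hypothetical stand-ins, not the authors' code, and the toy `flip_style` rewrite stands in for an actual counterfactual generator (e.g., an LLM prompted to alter a nuisance attribute such as caregiver writing style):

```python
# Illustrative sketch of counterfactual data augmentation for OOD robustness.
# Hypothetical names throughout; NOT the paper's CATO implementation.

def augment_with_counterfactuals(dataset, generate_counterfactual):
    """Pair each example with a counterfactual that changes a spurious
    attribute but keeps the label, simulating an intervention."""
    augmented = []
    for text, label in dataset:
        augmented.append((text, label))
        # The counterfactual intervenes on the nuisance attribute only,
        # so the label is preserved by construction.
        augmented.append((generate_counterfactual(text), label))
    return augmented

def flip_style(text):
    """Toy nuisance-attribute flip: swap clinical shorthand and full form."""
    if "pt" in text:
        return text.replace("pt", "patient")
    return text.replace("patient", "pt")

data = [("pt reports chest pain", 1), ("patient denies fever", 0)]
augmented = augment_with_counterfactuals(data, flip_style)
# Each original example now has a style-flipped twin with the same label,
# nudging a downstream classifier toward style-invariant features.
```

Training a classifier on `augmented` rather than `data` is the general pattern the paper's questions Q1-Q4 probe; in practice the quality of `generate_counterfactual` (Q4) determines how faithful the simulated interventions are.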