Adversarial Filters of Dataset Biases
Authors: Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew Peters, Ashish Sabharwal, Yejin Choi
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present extensive supporting evidence that AFLITE is broadly applicable for reduction of measurable dataset biases, and that models trained on the filtered datasets yield better generalization to out-of-distribution tasks. We present experiments under a synthetic setting, to evaluate whether AFLITE successfully removes examples with spurious correlations from a dataset. As our first real-world data evaluation for AFLITE, we consider out-of-domain and in-domain generalization for a variety of language datasets. We evaluate AFLITE on image classification through ImageNet (ILSVRC2012) classification. |
| Researcher Affiliation | Collaboration | (1) Allen Institute for Artificial Intelligence; (2) Paul G. Allen School of Computer Science, University of Washington. |
| Pseudocode | Yes | Algorithm 1 AFLITE. Input: dataset D = (X, Y), pre-computed representation Φ(X), model family M, target dataset size n, number of random partitions m, training set size t < n, slice size k ≤ n, early-stopping threshold. Output: reduced dataset S. S = D; while \|S\| > n do ... (A hedged Python sketch of this procedure appears after the table.) |
| Open Source Code | Yes | Code & data at https://github.com/allenai/aflite-public. All datasets and code for this work are publicly available. |
| Open Datasets | Yes | Natural language inference (SNLI; Bowman et al., 2015) and question answering (SQuAD; Rajpurkar et al., 2016); MultiNLI (Williams et al., 2018) and the QNLI dataset (Wang et al., 2018a); ImageNet (ILSVRC2012) classification. |
| Dataset Splits | Yes | Table 3 shows the results for SNLI. In all cases, applying AFLITE substantially reduces overall model accuracy, with typical drops of 15-35% depending on the models used for learning the feature representations and those used for evaluation of the filtered dataset. Training set sizes: 550k (original SNLI) vs. 92k, 138k, 109k, and 92k for the filtered variants, a reduction of up to 458k examples. We evaluate AFLITE on image classification through ImageNet (ILSVRC2012) classification. For evaluation, the ImageNet-AFLITE filtered validation set is much harder than the standard validation set (also see Figure 1). |
| Hardware Specification | No | Computations on beaker.org were supported in part by credits from Google Cloud. |
| Software Dependencies | No | No specific software versions (e.g., Python 3.8, PyTorch 1.9) are provided in the paper. Mentions 'scikit-learn' without a version. |
| Experiment Setup | Yes | Algorithm 1 provides an implementation of AFLITE. The algorithm takes as input a dataset D = (X, Y), a representation Φ(X) we are interested in minimizing the bias in, a model family M (e.g., linear classifiers), a target dataset size n, size m of the support of the expectation in Eq. (4), training set size t for the classifiers, size k of each slice, and an early-stopping filtering threshold. Appendix A.5 provides details of hyperparameters used across different experimental settings, to be discussed in the following sections. (A hypothetical hyperparameter configuration is sketched after the table.) |
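
The Pseudocode row quotes Algorithm 1 only up to the start of its filtering loop. For readers who want the full control flow, the following is a minimal Python sketch of an AFLITE-style filtering loop, assuming NumPy arrays for the pre-computed representation Φ(X) and a scikit-learn logistic-regression classifier standing in for the linear model family M. Function and argument names (`aflite`, `phi`, `tau`, the default values) are illustrative and do not come from the authors' released code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def aflite(phi, y, n, m=64, t=None, k=None, tau=0.75, seed=0):
    """Minimal sketch of AFLITE-style adversarial filtering.

    phi : (N, d) array of pre-computed representations Phi(X)
    y   : (N,) array of labels
    n   : target dataset size (stop once |S| <= n)
    m   : number of random partitions per filtering phase
    t   : training set size for each linear classifier (t < n)
    k   : slice size removed per filtering phase (k <= n)
    tau : early-stopping threshold on the predictability scores
    Returns the indices of the retained (filtered) dataset S.
    """
    rng = np.random.default_rng(seed)
    S = np.arange(len(y))            # current dataset as indices into phi / y
    t = t or n // 2
    k = k or max(1, n // 100)

    while len(S) > n:
        correct = np.zeros(len(S))   # correct out-of-sample predictions per instance
        counts = np.zeros(len(S))    # number of times each instance was held out
        for _ in range(m):
            perm = rng.permutation(len(S))
            train, held_out = perm[:t], perm[t:]
            clf = LogisticRegression(max_iter=1000)
            clf.fit(phi[S[train]], y[S[train]])
            preds = clf.predict(phi[S[held_out]])
            correct[held_out] += (preds == y[S[held_out]])
            counts[held_out] += 1

        # Predictability score: fraction of held-out predictions that are correct.
        scores = np.where(counts > 0, correct / np.maximum(counts, 1), 0.0)

        # Remove up to k of the most predictable instances whose score exceeds tau.
        order = np.argsort(-scores)
        to_remove = [i for i in order[:k] if scores[i] >= tau]
        if not to_remove:
            break                    # early stopping: nothing left above the threshold
        S = np.delete(S, to_remove)
    return S
```

The predictability score here is the fraction of held-out partitions in which a linear classifier labels an instance correctly; the highest-scoring instances above the threshold are removed in each phase, mirroring the slice-wise removal described in Algorithm 1.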
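To make the Experiment Setup row concrete, here is a hypothetical invocation of the sketch above on synthetic data with a planted spurious feature, in the spirit of the paper's synthetic-bias evaluation. The concrete values of n, m, t, k, and tau below are placeholders for illustration, not the Appendix A.5 settings.

```python
import numpy as np

# Toy setting with a planted artifact: for roughly half the examples the label
# leaks into one feature dimension, mimicking a spurious dataset bias.
rng = np.random.default_rng(1)
N, d = 4_000, 64
phi = rng.normal(size=(N, d)).astype(np.float32)
y = rng.integers(0, 2, size=N)
biased = rng.random(N) < 0.5
phi[biased, 0] = y[biased] * 5.0     # easily learnable spurious signal

# `aflite` is the sketch defined above; all hyperparameter values are placeholders.
retained = aflite(
    phi, y,
    n=2_000,    # target (reduced) dataset size
    m=32,       # number of random partitions per filtering phase
    t=1_000,    # training set size per linear classifier (t < n)
    k=200,      # slice size removed per filtering phase (k <= n)
    tau=0.75,   # early-stopping threshold on predictability
)
print(f"kept {len(retained)} of {N} examples")
```

On data like this, the biased examples are the most predictable from the linear probes and are filtered out first, while the unbiased examples (predicted at roughly chance) fall below the threshold and are retained.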