Mitigating Dataset Bias by Using Per-Sample Gradient
Authors: Sumyeong Ahn, Seongyoon Kim, Se-Young Yun
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared with existing baselines for various datasets, the proposed method showed state-of-the-art accuracy for the classification task. Furthermore, we describe theoretical understandings of how PGD can mitigate dataset bias. ... In this section, we demonstrate the effectiveness of PGD for multiple benchmarks compared with previously proposed baselines. ... Accuracy results. In Table 1, we present comparisons of the image classification accuracy for the unbiased test sets. |
| Researcher Affiliation | Academia | Sumyeong Ahn, KAIST AI, sumyeongahn@kaist.ac.kr; Seongyoon Kim, KAIST ISysE, curisam@kaist.ac.kr; Se-Young Yun, KAIST AI, yunseyoung@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 PGD: Per-sample Gradient-norm based Debiasing (a hedged sketch of the gradient-norm scoring step follows the table) |
| Open Source Code | No | The paper mentions using and reproducing results from 'official code from the respective authors' for baselines (e.g., LfF, JTT, Disen), with accompanying footnotes providing links to *those* repositories. However, it does not explicitly state that the authors are releasing their own code for the proposed PGD method, nor does it provide a direct link to their own implementation. |
| Open Datasets | Yes | To precisely examine the debiasing performance of PGD, we used the Colored MNIST, Multi-bias MNIST, and Corrupted CIFAR datasets as synthetic datasets... BFFHQ, BAR, CelebA, and CivilComments-WILDS datasets obtained from the real world are used... Colored MNIST (CMNIST) is a modified version of the MNIST dataset (LeCun et al., 2010)... CIFAR10 (Krizhevsky et al., 2009)... CelebA (Liu et al., 2015)... CivilComments-WILDS (Borkan et al., 2019)... The biased action recognition (BAR) dataset was derived from (Nam et al., 2020)... The Biased FFHQ (BFFHQ) dataset was constructed from the facial dataset FFHQ (Karras et al., 2019). It was first proposed in (Kim et al., 2021) and was used in (Lee et al., 2021). (A CMNIST construction sketch follows the table.) |
| Dataset Splits | Yes | We use 55,000 samples for training, 5,000 samples for validation (i.e., 10%), and 10,000 samples for testing. ... As with CMNIST, we use 55,000 samples for training, 5,000 samples for validation, and 10,000 samples for testing. ... This dataset contains 45,000 training samples, 5,000 validation samples, and 10,000 test images. ... There are 1,941 samples for training and 654 samples for testing. To split the training and validation samples, we used 10% for validation, i.e., 1,746 images for training and 195 for validation. ... The number of training, validation, and test samples are 19,200, 1,000, and 1,000, respectively. (A split sketch follows the table.) |
| Hardware Specification | Yes | We conduct our experiments mainly using a single Titan XP GPU for all cases. |
| Software Dependencies | No | The paper mentions software components like 'SGD optimizer', 'Adam optimizer', 'ResNet18 (provided by the open-source library torchvision)', 'pretrained BERT', and 'Cosine Annealing LR decay scheduler'. However, it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Implementation details. We use three types of networks: two types of simple convolutional networks (SimConv-1 and SimConv-2) and ResNet18 (He et al., 2016). Network implementation is described in Appendix B. Colored MNIST is trained with the SGD optimizer, batch size 128, learning rate 0.02, weight decay 0.001, momentum 0.9, learning rate decay 0.1 every 40 epochs, 100 epochs of training, and GCE parameter α = 0.7. Multi-bias MNIST also uses the SGD optimizer, with batch size 32, learning rate 0.01, weight decay 0.0001, momentum 0.9, and learning rate decay 0.1 with decay step 40. It runs for 100 epochs with GCE parameter 0.7. For Corrupted CIFAR and BFFHQ, we use ResNet18 as the backbone network with exactly the same setting presented by Disen (Lee et al., 2021). ... A summary of the hyperparameters we used is reported in Table 6. (A configuration sketch follows the table.) |
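
The pseudocode row above names Algorithm 1 (PGD). As a rough illustration of its core step, the following is a minimal PyTorch sketch of per-sample gradient-norm scoring, based on our reading of the paper rather than the authors' code: `per_sample_grad_norm_weights` is a hypothetical helper, and the assumption that only the classifier head's (`fc`) gradients are scored is ours.

```python
import torch
import torch.nn.functional as F

def per_sample_grad_norm_weights(model, loader, device="cpu"):
    """Score every training sample by the gradient norm of its loss w.r.t.
    the classifier head, then normalize the scores into sampling weights.
    Hypothetical helper illustrating the per-sample gradient-norm idea."""
    model.to(device).eval()
    head_params = list(model.fc.parameters())  # assumes a torchvision-style `fc` head
    norms = []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        for xi, yi in zip(x, y):  # one backward pass per sample
            loss = F.cross_entropy(model(xi.unsqueeze(0)), yi.unsqueeze(0))
            grads = torch.autograd.grad(loss, head_params)
            norms.append(torch.cat([g.flatten() for g in grads]).norm().detach())
    weights = torch.stack(norms)
    return weights / weights.sum()  # per-sample resampling probabilities
```

The resulting weights can feed a `torch.utils.data.WeightedRandomSampler`, so that samples with large gradient norms (likely bias-conflicting) are drawn more often when training the debiased model.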
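
For the datasets row, here is a hedged sketch of the usual Colored MNIST construction: each digit class is tied to one foreground color, and a small fraction of bias-conflicting samples receive a mismatched color. The palette and `conflict_ratio` below are placeholders; the paper's exact colors and bias ratios are not reproduced here.

```python
import torch

# Placeholder palette: one RGB color per digit class (the actual colors
# and bias ratios used in the paper are assumptions here).
PALETTE = torch.rand(10, 3)

def colorize(image, label, conflict_ratio=0.01):
    """image: (1, 28, 28) grayscale tensor in [0, 1] -> (3, 28, 28) RGB.
    With probability `conflict_ratio` the sample is bias-conflicting,
    i.e. its color does not match the color tied to its digit class."""
    if torch.rand(()).item() < conflict_ratio:
        color = PALETTE[torch.randint(0, 10, ()).item()]  # random color
    else:
        color = PALETTE[label]  # bias-aligned: color correlates with label
    return image * color.view(3, 1, 1)
```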
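
The 55,000/5,000 counts in the splits row correspond to a plain 90/10 train/validation split of the 60,000 MNIST-style training images, with the 10,000-image test set kept separate. A minimal sketch with `torch.utils.data.random_split`, using a placeholder dataset and an assumed fixed seed (the paper does not state how the split was seeded):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Tiny placeholder standing in for the 60,000-image CMNIST training pool.
full_train = TensorDataset(torch.zeros(60000, 1))

generator = torch.Generator().manual_seed(0)  # assumed seed for a reproducible split
train_set, val_set = random_split(full_train, [55000, 5000], generator=generator)
assert len(train_set) == 55000 and len(val_set) == 5000
```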
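
The Colored MNIST recipe in the setup row maps directly onto standard PyTorch components. Below is a sketch assuming the GCE parameter α = 0.7 plugs into the generalized cross-entropy of Zhang & Sabuncu (2018), L = (1 − p_y^α)/α; the `backbone` is a stand-in, since the paper's SimConv architectures are specified in its Appendix B.

```python
import torch
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

def gce_loss(logits, targets, alpha=0.7):
    """Generalized cross-entropy (Zhang & Sabuncu, 2018):
    (1 - p_y**alpha) / alpha, which approaches CE as alpha -> 0."""
    p_y = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.pow(alpha)) / alpha).mean()

# Stand-in for the paper's SimConv backbone (see their Appendix B).
backbone = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 28 * 28, 10))

# Colored MNIST settings as quoted above: SGD, lr 0.02, weight decay 0.001,
# momentum 0.9, lr decay 0.1 every 40 epochs, 100 epochs, batch size 128.
optimizer = SGD(backbone.parameters(), lr=0.02, weight_decay=0.001, momentum=0.9)
scheduler = StepLR(optimizer, step_size=40, gamma=0.1)
```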