Mitigating Dataset Bias by Using Per-Sample Gradient

Authors: Sumyeong Ahn, Seongyoon Kim, Se-Young Yun

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared with existing baselines for various datasets, the proposed method showed state-of-the-art accuracy for the classification task. Furthermore, we describe theoretical understandings of how PGD can mitigate dataset bias. 1 INTRODUCTION ... 4 EXPERIMENTS In this section, we demonstrate the effectiveness of PGD for multiple benchmarks compared with previously proposed baselines. ... Accuracy results. In Table 1, we present the comparisons of the image classification accuracy for the unbiased test sets.
Researcher Affiliation | Academia | Sumyeong Ahn, KAIST AI, sumyeongahn@kaist.ac.kr; Seongyoon Kim, KAIST ISysE, curisam@kaist.ac.kr; Se-Young Yun, KAIST AI, yunseyoung@kaist.ac.kr
Pseudocode | Yes | Algorithm 1 PGD: Per-sample Gradient-norm based Debiasing
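The algorithm named above reweights training samples using their per-sample gradient norms, so that hard (often bias-conflicting) examples are emphasized. As a loose, self-contained illustration only (not the authors' implementation; the linear logistic model and the function names below are assumptions for the sketch), the per-sample gradient norm admits a closed form:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def per_sample_grad_norms(X, y, w):
    """Per-sample gradient norms for binary logistic loss (illustrative sketch).

    For l_i = -log(sigmoid(y_i * <x_i, w>)) with labels y_i in {-1, +1},
    the gradient w.r.t. w is -sigmoid(-margin_i) * y_i * x_i, so its
    Euclidean norm is sigmoid(-margin_i) * ||x_i||.
    """
    norms = []
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(a * b for a, b in zip(x_i, w))
        norms.append(sigmoid(-margin) * math.sqrt(sum(a * a for a in x_i)))
    return norms

def pgd_sampling_weights(norms):
    # PGD-style intuition: sampling probability proportional to gradient norm,
    # so samples the biased model handles poorly are drawn more often.
    total = sum(norms)
    return [n / total for n in norms]
```

A misclassified sample (large residual) gets a larger norm, and hence a larger sampling weight, than a confidently correct one.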
Open Source Code | No | The paper mentions using and reproducing results from 'official code from the respective authors' for baselines (e.g., LfF, JTT, Disen), with accompanying footnotes providing links to *those* repositories. However, it does not explicitly state that the authors are releasing their own code for the proposed PGD method, nor does it provide a direct link to their own implementation.
Open Datasets | Yes | To precisely examine the debiasing performance of PGD, we used the Colored MNIST, Multi-bias MNIST, and Corrupted CIFAR datasets as synthetic datasets... BFFHQ, BAR, CelebA, and CivilComments-WILDS datasets obtained from the real world are used... Colored MNIST (CMNIST) is a modified version of the MNIST dataset (LeCun et al., 2010)... CIFAR10 (Krizhevsky et al., 2009)... CelebA (Liu et al., 2015)... CivilComments-WILDS (Borkan et al., 2019)... The biased action recognition (BAR) dataset was derived from (Nam et al., 2020)... The Biased FFHQ (BFFHQ) dataset was constructed from the facial dataset FFHQ (Karras et al., 2019). It was first proposed in (Kim et al., 2021) and was used in (Lee et al., 2021).
Dataset Splits | Yes | We use 55,000 samples for training, 5,000 samples for validation (i.e., 10%), and 10,000 samples for testing. ... As with CMNIST, we use 55,000 samples for training, 5,000 samples for validation, and 10,000 samples for testing. ... This dataset contains 45,000 training samples, 5,000 validation samples, and 10,000 test images. ... There are 1,941 samples for training and 654 samples for testing. To split the training and validation samples, we used 10% validation samples, i.e., 1,746 images for training and 195 for validation. ... The numbers of training, validation, and test samples are 19,200, 1,000, and 1,000, respectively.
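The 10% held-out validation split quoted above for BAR (1,941 training images into 1,746 train / 195 validation) can be reproduced with a simple index split. This is a sketch, not the authors' code: the ceiling rounding and the seeded shuffle are assumptions chosen so the counts match those reported:

```python
import math
import random

def split_train_val(indices, val_frac=0.10, seed=0):
    """Hold out val_frac of the training indices for validation.

    math.ceil reproduces the 1,746 / 195 split of BAR's 1,941 training
    samples quoted above; seed and shuffling are assumptions for the sketch.
    """
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_val = math.ceil(len(shuffled) * val_frac)
    return shuffled[n_val:], shuffled[:n_val]  # (train, val)
```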
Hardware Specification | Yes | We conduct our experiments mainly using a single Titan XP GPU for all cases.
Software Dependencies | No | The paper mentions software components like 'SGD optimizer', 'Adam optimizer', 'ResNet18 (provided by the open-source library torchvision)', 'pretrained BERT', and 'Cosine Annealing LR decay scheduler'. However, it does not specify version numbers for any of these software dependencies.
Experiment Setup | Yes | Implementation details. We use three types of networks: two types of simple convolutional networks (SimConv-1 and SimConv-2) and ResNet18 (He et al., 2016). Network implementation is described in Appendix B. Colored MNIST is trained with the SGD optimizer, batch size 128, learning rate 0.02, weight decay 0.001, momentum 0.9, learning rate decay 0.1 every 40 epochs, 100 epochs of training, and GCE parameter α = 0.7. Multi-bias MNIST also uses the SGD optimizer, with batch size 32, learning rate 0.01, weight decay 0.0001, momentum 0.9, and learning rate decay 0.1 with decay step 40. It runs for 100 epochs with GCE parameter 0.7. Corrupted CIFAR and BFFHQ use ResNet18 as the backbone network, with exactly the same setting presented by Disen (Lee et al., 2021). ... A summary of the hyperparameters that we used is reported in Table 6.
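Two pieces of the recipe above can be made concrete: the step learning-rate decay (base 0.02, multiplied by 0.1 every 40 epochs on Colored MNIST) and the generalized cross-entropy (GCE) objective with parameter 0.7. The sketch below is illustrative only; the function names are hypothetical and this is not the authors' implementation:

```python
def gce_loss(prob_true_class, q=0.7):
    """Generalized cross-entropy on the true-class probability p:
    L_q(p) = (1 - p**q) / q. It interpolates between cross-entropy
    (as q -> 0) and MAE (q = 1); the setup above uses q = 0.7.
    """
    return (1.0 - prob_true_class ** q) / q

def step_lr(epoch, base_lr=0.02, gamma=0.1, step=40):
    # Step decay from the Colored MNIST recipe: lr shrinks by 0.1x
    # every 40 epochs over the 100-epoch run.
    return base_lr * gamma ** (epoch // step)
```

GCE's milder penalty on low-confidence samples is what lets the biased auxiliary model latch onto easy, bias-aligned shortcuts.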