Learning without Prejudices: Continual Unbiased Learning via Benign and Malignant Forgetting

Authors: Myeongho Jeon, Hyoje Lee, Yedarm Seong, Myungjoo Kang

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations of biased experimental setups demonstrate that our proposed method, Learning without Prejudices, is effective for continual unbiased learning. In this section, we experimentally evaluate the proposed method and compare it with several state-of-the-art models. We used three biased datasets: Biased MNIST (Bahng et al., 2020), Biased CIFAR-10 (Hendrycks & Dietterich, 2019), and Biased CelebA-HQ modified from (Karras et al., 2017).
Researcher Affiliation | Academia | Myeongho Jeon*, Hyoje Lee*, Yedarm Seong, Myungjoo Kang, Seoul National University, {andyjeon, hyoje42, mybirth0407, mkang}@snu.ac.kr
Pseudocode | Yes | Algorithm 1: LwP: Learning without Prejudices
Open Source Code | No | The paper does not contain an explicit statement about the release of open-source code or a link to a code repository.
Open Datasets | Yes | We used three biased datasets: Biased MNIST (Bahng et al., 2020), Biased CIFAR-10 (Hendrycks & Dietterich, 2019), and Biased CelebA-HQ modified from (Karras et al., 2017). For Biased MNIST, we use the experimental setup in Section 3.1 to evaluate the model's generalizability. Following the bias planting protocol proposed by Nam et al. (2020), we create the Biased CIFAR-10. Among the attributes of images in CelebA-HQ (Karras et al., 2017), we set gender as the target label and select makeup and hair color as the bias of the first and second task, respectively, because they have a significant correlation with gender in the dataset. We name this sampled dataset Biased CelebA-HQ. (A hedged sketch of this style of bias planting appears below the table.)
Dataset Splits | Yes | We randomly split each biased dataset D_t = (D_t^train, D_t^val) ∈ D_S into train and validation sets so that both have the same ratio of biased samples. (A minimal split sketch appears below the table.)
Hardware Specification | No | The paper mentions running experiments but does not specify any hardware details like GPU/CPU models, memory, or cloud computing instances used.
Software Dependencies | No | The paper mentions using the "Adam optimizer" and "WGAN-GP" but does not provide specific version numbers for any software, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | To train the classifier f, for both supervised learning with samples of the current task and contrastive learning with previous samples, we use the Adam optimizer with learning rate 10^-4, weight decay 5×10^-4, and (β1, β2) = (0.9, 0.999). To train the generator G and discriminator D, we use Adam optimizers with learning rates 5×10^-5 for G and 2×10^-4 for D, and set (β1, β2) = (0.5, 0.999) for both. The batch size and number of epochs per task are 32 and 20, respectively, for all experiments. For feature-level augmentation, we set the channel-wise dropout rate γ = 0.2, and use µ = 0 and Σ = 0.005·I for N(µ, Σ) when augmenting spatially. (A configuration sketch with these values appears below the table.)
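
The bias planting noted under Open Datasets follows published protocols (Bahng et al., 2020; Nam et al., 2020). Below is a minimal sketch of the general idea, not the authors' code: each sample is given a bias attribute (e.g., a background color or corruption type) that matches its class with some probability and is drawn at random otherwise. The `bias_ratio` value, the one-attribute-per-class mapping, and the function name are illustrative assumptions.

```python
import numpy as np

def plant_bias(labels, num_classes=10, bias_ratio=0.95, seed=0):
    """Sketch of colored-MNIST-style bias planting: the bias attribute equals
    the class index with probability `bias_ratio` (bias-aligned samples) and
    is sampled uniformly otherwise (bias-conflicting samples). The 0.95 ratio
    is an assumption; the paper reuses the ratios of the original protocols."""
    rng = np.random.default_rng(seed)
    bias = np.asarray(labels, dtype=int).copy()        # bias-aligned by default
    conflict = rng.random(len(bias)) >= bias_ratio     # samples to decorrelate
    bias[conflict] = rng.integers(0, num_classes, size=conflict.sum())
    return bias
```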
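
The split described under Dataset Splits (train/validation with an equal ratio of biased samples) amounts to a stratified split on the bias flag. A minimal sketch, assuming a per-sample `is_biased` flag and an illustrative 90/10 split; neither the flag name nor the split ratio is stated in the quoted text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_task_dataset(samples, labels, is_biased, val_ratio=0.1, seed=0):
    """Split one task's data into train/val so both splits keep the same
    proportion of bias-aligned samples (stratified on the bias flag)."""
    idx = np.arange(len(labels))
    train_idx, val_idx = train_test_split(
        idx, test_size=val_ratio, random_state=seed, stratify=is_biased
    )
    return (samples[train_idx], labels[train_idx]), (samples[val_idx], labels[val_idx])
```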
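
The hyperparameters quoted under Experiment Setup translate directly into an optimizer and augmentation configuration. A hedged sketch assuming a PyTorch implementation: the placeholder modules and the way the two feature-level operations are composed are assumptions; only the constants (learning rates, betas, weight decay, γ = 0.2, Σ = 0.005·I) come from the quoted text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder modules; the actual architectures are described in the paper.
classifier, generator, discriminator = nn.Linear(8, 2), nn.Linear(8, 8), nn.Linear(8, 1)

# Adam optimizers with the quoted hyperparameters.
opt_f = torch.optim.Adam(classifier.parameters(), lr=1e-4,
                         weight_decay=5e-4, betas=(0.9, 0.999))
opt_g = torch.optim.Adam(generator.parameters(), lr=5e-5, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))

def feature_level_augment(feat, gamma=0.2, noise_var=0.005):
    """Sketch of the feature-level augmentation: channel-wise dropout with
    rate gamma = 0.2 plus additive zero-mean Gaussian noise with covariance
    0.005 * I. Applying the two operations in sequence is an assumption."""
    feat = F.dropout2d(feat, p=gamma, training=True)          # drops whole channels
    feat = feat + noise_var ** 0.5 * torch.randn_like(feat)   # spatial Gaussian noise
    return feat
```

Batch size 32 and 20 epochs per task would then apply uniformly across tasks, per the quoted setup.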