Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Learning Debiased Classifier with Biased Committee
Authors: Nayeong Kim, Sehyun Hwang, Sungsoo Ahn, Jaesik Park, Suha Kwak
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On five real-world datasets, our method outperforms prior arts using no spurious attribute label like ours and even surpasses those relying on bias labels occasionally. Our code is available at https://github.com/nayeong-v-kim/LWBC. |
| Researcher Affiliation | Academia | Pohang University of Science and Technology (POSTECH), South Korea |
| Pseudocode | Yes | Algorithm 1 Learning a debiased classifier with a biased committee |
| Open Source Code | Yes | Our code is available at https://github.com/nayeong-v-kim/LWBC. |
| Open Datasets | Yes | CelebA. CelebA [31] is a dataset for face recognition where each sample is labeled with 40 attributes. ImageNet-9. ImageNet-9 [20] is a subset of ImageNet [35] containing nine super-classes. ImageNet-A. ImageNet-A [17] contains real-world images misclassified by an ImageNet-trained ResNet50 [15]. BAR. The Biased Action Recognition (BAR) dataset [32] is a real-world image dataset intentionally designed to exhibit spurious correlations between human action and place on its images. NICO. NICO [16] is a real-world dataset for simulating out-of-distribution image classification scenarios. |
| Dataset Splits | Yes | Following the setting adopted by Bahng et al. [3], we conduct experiments with 54,600 training images and 2,100 validation images. In our setting, we use 10% of the original BAR training set as validation and set the bias-conflicting ratio of the training set to 1%. The validation and test sets consist of 7 seen context classes and 3 unseen context classes per object class. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used (e.g., GPU model, CPU type) for running its experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software components or libraries (e.g., PyTorch version, CUDA version) needed for replication. |
| Experiment Setup | Yes | We set the batch size to {64, 64, 128, 256}, learning rate to {1e-3, 1e-3, 1e-4, 6e-3}, the size of the committee m to {30, 30, 30, 40}, the size of subset S_l to {10, 10, 80, 300}, λ to {0.9, 0.6, 0.6, 0.6}, and τ to {1, 1, 1, 2.5}, respectively for {BAR, NICO, ImageNet-9, CelebA}, and α to 0.02. Note that we run LWBC on 3 random seeds and report the average and the standard deviation. |
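
The Experiment Setup row lists each hyperparameter as a positional set aligned with {BAR, NICO, ImageNet-9, CelebA}, which is easy to misread. The sketch below unpacks those sets into one configuration per dataset. The class name `LWBCConfig`, its field names, and the `CONFIGS` mapping are hypothetical and chosen only for illustration; they are not taken from the authors' repository, and only the numeric values come from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWBCConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    batch_size: int
    learning_rate: float
    committee_size: int   # m in the quoted text
    subset_size: int      # size of subset S_l
    lam: float            # λ
    tau: float            # τ
    alpha: float = 0.02   # α, a single value shared by all datasets


# Dataset order used by the quoted text for every positional set.
DATASETS = ("BAR", "NICO", "ImageNet-9", "CelebA")

# Positional sets copied from the quoted Experiment Setup text.
BATCH_SIZES = (64, 64, 128, 256)
LEARNING_RATES = (1e-3, 1e-3, 1e-4, 6e-3)
COMMITTEE_SIZES = (30, 30, 30, 40)
SUBSET_SIZES = (10, 10, 80, 300)
LAMBDAS = (0.9, 0.6, 0.6, 0.6)
TAUS = (1.0, 1.0, 1.0, 2.5)

# The text reports results averaged over 3 random seeds; the seed values
# themselves are not given, so only the count is recorded here.
NUM_SEEDS = 3

# Zip the positional sets into one configuration per dataset.
CONFIGS = {
    name: LWBCConfig(*values)
    for name, *values in zip(
        DATASETS, BATCH_SIZES, LEARNING_RATES,
        COMMITTEE_SIZES, SUBSET_SIZES, LAMBDAS, TAUS,
    )
}

if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, cfg)
```

Keeping the sets in their original positional form and zipping them makes each value traceable to the quoted sentence.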