Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SelecMix: Debiased Learning by Contradicting-pair Sampling
Authors: Inwoo Hwang, Sangjun Lee, Yunhyeok Kwak, Seong Joon Oh, Damien Teney, Jin-Hwa Kim, Byoung-Tak Zhang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on standard benchmarks demonstrate the effectiveness of the method, in particular when label noise complicates the identification of bias-conflicting examples. We evaluate our method on standard benchmarks for debiasing. Experimental results suggest that SelecMix consistently outperforms prior methods, especially when bias-conflicting samples are scarce. |
| Researcher Affiliation | Collaboration | Inwoo Hwang¹, Sangjun Lee¹, Yunhyeok Kwak¹, Seong Joon Oh³, Damien Teney⁴, Jin-Hwa Kim¹,², Byoung-Tak Zhang¹. ¹AI Institute, Seoul National University; ²NAVER AI Lab; ³University of Tübingen; ⁴Idiap Research Institute |
| Pseudocode | Yes | The pseudo-code of the proposed algorithm is presented in Alg. 1 and Alg. 2. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] |
| Open Datasets | Yes | Datasets. The Colored MNIST is a modified MNIST [19]... The Corrupted CIFAR10 is constructed by applying different types of corruptions to the corresponding objects in the original CIFAR-10 [18] dataset... The Biased FFHQ (BFFHQ) [20] is constructed based on the real-world dataset FFHQ [12]... All datasets are available in the official repository of DFA [20]. |
| Dataset Splits | No | The paper mentions that it evaluates unbiased accuracy on the test set of the dataset and specifies the ratio of bias-conflicting samples in the training set. However, it does not explicitly describe a validation dataset split (percentages, counts, or methodology) for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA A100 GPUs. |
| Software Dependencies | Yes | All experiments are implemented using PyTorch (1.10.0). |
| Experiment Setup | Yes | We train models for 200 epochs (Colored MNIST and Corrupted CIFAR-10) and 100 epochs (BFFHQ) with SGD optimizer (momentum=0.9, weight decay=1e-4). The initial learning rate is 0.1, and it is annealed by 0.1 at 100 and 150 epochs (Colored MNIST and Corrupted CIFAR-10), and 50 and 80 epochs (BFFHQ). The batch size is 128. The temperature hyperparameter of the contrastive loss is 0.07. |
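
The reported experiment setup can be written down as a small configuration plus a step learning-rate schedule. The sketch below is illustrative only and is not the authors' code (none was released). The names `TrainConfig`, `CONFIGS`, and `lr_at_epoch` are hypothetical, and it assumes that "annealed by 0.1" means the learning rate is multiplied by 0.1 at each milestone and that epochs are counted from 0.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as reported in the Experiment Setup row (names are hypothetical)."""
    epochs: int
    milestones: tuple          # epochs at which the learning rate is decayed
    base_lr: float = 0.1       # initial learning rate
    gamma: float = 0.1         # assumed multiplicative decay factor
    momentum: float = 0.9      # SGD momentum
    weight_decay: float = 1e-4
    batch_size: int = 128
    temperature: float = 0.07  # contrastive-loss temperature


# Per-dataset settings taken from the reported setup.
CONFIGS = {
    "colored_mnist": TrainConfig(epochs=200, milestones=(100, 150)),
    "corrupted_cifar10": TrainConfig(epochs=200, milestones=(100, 150)),
    "bffhq": TrainConfig(epochs=100, milestones=(50, 80)),
}


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Step schedule: multiply base_lr by gamma once per milestone already reached."""
    n_decays = bisect_right(cfg.milestones, epoch)
    return cfg.base_lr * cfg.gamma ** n_decays


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        schedule = [lr_at_epoch(cfg, e) for e in range(cfg.epochs)]
        print(name, sorted(set(round(lr, 6) for lr in schedule), reverse=True))
```

In PyTorch, the same schedule would typically be expressed with `torch.optim.SGD` and `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=..., gamma=0.1)`; whether the authors used that scheduler is not stated in the source.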