Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Task-Agnostic Undesirable Feature Deactivation Using Out-of-Distribution Data
Authors: Dongmin Park, Hwanjun Song, Minseok Kim, Jae-Gil Lee
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To show the task-agnostic nature of TAUFE, we rigorously validate its performance on three tasks, classification, regression, and a mix of them, on CIFAR-10, CIFAR-100, ImageNet, CUB200, and CAR datasets. The results demonstrate that TAUFE consistently outperforms the state-of-the-art method as well as the baselines without regularization. |
| Researcher Affiliation | Collaboration | Dongmin Park¹, Hwanjun Song², Minseok Kim¹, Jae-Gil Lee¹; ¹KAIST, ²NAVER AI Lab, Republic of Korea |
| Pseudocode | Yes | Algorithm 1 describes the overall procedure of TAUFE, which is self-explanatory. Algorithm 1 TAUFE |
| Open Source Code | Yes | For reproducibility, we provide the source code at https://github.com/kaist-dmlab/TAUFE. |
| Open Datasets | Yes | We choose CIFAR-10, CIFAR-100 [23], and ImageNet [24] for the target in-distribution data. For the CIFAR datasets, two out-of-distribution datasets are carefully mixed for evaluation LSUN [25],... and SVHN [26],... A large-scale collection of place scene images with 365 classes, Places365 [27], is also used as another OOD data for ImageNet-10. |
| Dataset Splits | No | The paper mentions a grid search for hyperparameters, implying the use of a validation set, but does not specify the exact split percentages or sample counts for training, validation, and test sets. For example: 'The value of λ is set to be 0.1 and 0.01 for CIFARs and ImageNet-10, respectively, where the best values are obtained via a grid search.' |
| Hardware Specification | Yes | All methods are implemented with PyTorch 1.8.0 and executed using four NVIDIA Tesla V100 GPUs. |
| Software Dependencies | Yes | All methods are implemented with PyTorch 1.8.0 and executed using four NVIDIA Tesla V100 GPUs. |
| Experiment Setup | Yes | For CIFAR datasets, ResNet-18 [28] is trained from scratch for 200 epochs using SGD with a momentum of 0.9, a batch size of 64, a weight decay of 0.0005. To support the original resolution, we drop the first pooling layer and change the first convolution layer with a kernel size of 3, a stride size of 1, and a padding size of 1. An initial learning rate of 0.1 is decayed by a factor of 10 at 100-th and 150-th epochs, following the same configuration in OAT [8]. |
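
The CIFAR configuration quoted in the Experiment Setup row can be sketched in plain Python to make the learning-rate schedule and the resolution-preserving first convolution concrete. This is an illustrative sketch, not the authors' code: the names `CIFAR_SETUP`, `learning_rate`, and `conv_output_size` are hypothetical, the 0-indexed epoch convention (decay taking effect from epoch index 100 and 150 onward) is an assumption, and the 32x32 input size is the standard CIFAR image resolution rather than a value stated in the row.

```python
# Hyperparameters as quoted in the Experiment Setup row (CIFAR, ResNet-18).
CIFAR_SETUP = {
    "epochs": 200,
    "batch_size": 64,
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "initial_lr": 0.1,
    "lr_decay_factor": 10,
    "lr_milestones": (100, 150),
}


def learning_rate(epoch, setup=CIFAR_SETUP):
    """Step-decayed learning rate for a 0-indexed epoch (indexing is an assumption)."""
    # Count how many milestones have been reached, then divide once per milestone.
    drops = sum(1 for milestone in setup["lr_milestones"] if epoch >= milestone)
    return setup["initial_lr"] / (setup["lr_decay_factor"] ** drops)


def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


if __name__ == "__main__":
    for epoch in (0, 99, 100, 149, 150, 199):
        print(f"epoch {epoch:3d}: lr = {learning_rate(epoch):g}")
    # A 3x3 kernel with stride 1 and padding 1 leaves the spatial size unchanged.
    print("output size for a 32x32 input:",
          conv_output_size(32, kernel=3, stride=1, padding=1))
```

Under these assumptions the learning rate stays at 0.1 for the first 100 epochs, drops to roughly 0.01 for the next 50, and to roughly 0.001 for the last 50. The size formula shows why a 3x3 kernel with stride 1 and padding 1 keeps the input resolution, which is consistent with the row's stated goal of supporting the original resolution.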