Task-Agnostic Undesirable Feature Deactivation Using Out-of-Distribution Data

Authors: Dongmin Park, Hwanjun Song, Minseok Kim, Jae-Gil Lee

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To show the task-agnostic nature of TAUFE, we rigorously validate its performance on three tasks, classification, regression, and a mix of them, on CIFAR-10, CIFAR-100, ImageNet, CUB200, and CAR datasets. The results demonstrate that TAUFE consistently outperforms the state-of-the-art method as well as the baselines without regularization.
Researcher Affiliation | Collaboration | Dongmin Park (1), Hwanjun Song (2), Minseok Kim (1), Jae-Gil Lee (1); (1) KAIST, (2) NAVER AI Lab; Republic of Korea
Pseudocode | Yes | Algorithm 1 describes the overall procedure of TAUFE, which is self-explanatory. (The referenced pseudocode is titled "Algorithm 1: TAUFE"; a hedged sketch of one such training step appears below this table.)
Open Source Code | Yes | For reproducibility, we provide the source code at https://github.com/kaist-dmlab/TAUFE.
Open Datasets | Yes | We choose CIFAR-10, CIFAR-100 [23], and ImageNet [24] for the target in-distribution data. For the CIFAR datasets, two out-of-distribution datasets are carefully mixed for evaluation: LSUN [25], ... and SVHN [26], ... A large-scale collection of place scene images with 365 classes, Places365 [27], is also used as another OOD data for ImageNet-10.
Dataset Splits | No | The paper mentions a grid search for hyperparameters, implying the use of a validation set, but does not specify the exact split percentages or sample counts for the training, validation, and test sets. For example: 'The value of λ is set to be 0.1 and 0.01 for CIFARs and ImageNet-10, respectively, where the best values are obtained via a grid search.' (An illustrative split and λ-grid sketch appears below this table.)
Hardware Specification | Yes | All methods are implemented with PyTorch 1.8.0 and executed using four NVIDIA Tesla V100 GPUs.
Software Dependencies | Yes | All methods are implemented with PyTorch 1.8.0 and executed using four NVIDIA Tesla V100 GPUs.
Experiment Setup | Yes | For CIFAR datasets, ResNet-18 [28] is trained from scratch for 200 epochs using SGD with a momentum of 0.9, a batch size of 64, and a weight decay of 0.0005. To support the original resolution, we drop the first pooling layer and change the first convolution layer to a kernel size of 3, a stride size of 1, and a padding size of 1. An initial learning rate of 0.1 is decayed by a factor of 10 at the 100th and 150th epochs, following the same configuration as OAT [8]. (This configuration is instantiated in a sketch below this table.)
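The Pseudocode row refers to Algorithm 1 (TAUFE). As a minimal sketch only, assuming the regularizer penalizes the squared L2 norm of penultimate-layer activations on OOD inputs, one training step could look like the following PyTorch-style function; the names backbone, classifier, and lambda_reg are illustrative and not taken from the authors' code.

```python
import torch.nn.functional as F

def taufe_style_step(backbone, classifier, optimizer, x_in, y_in, x_ood, lambda_reg=0.1):
    """Hypothetical sketch of one training step: a task loss on in-distribution
    data plus a penalty that pushes OOD penultimate features toward zero."""
    optimizer.zero_grad()
    feat_in = backbone(x_in)                        # penultimate features, shape (B, D)
    task_loss = F.cross_entropy(classifier(feat_in), y_in)
    feat_ood = backbone(x_ood)                      # penultimate features of the OOD batch
    deact_loss = feat_ood.pow(2).sum(dim=1).mean()  # mean squared L2 norm
    loss = task_loss + lambda_reg * deact_loss      # lambda as reported: 0.1 or 0.01
    loss.backward()
    optimizer.step()
    return loss.item()
```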
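For the Open Datasets and Dataset Splits rows, the quoted text gives neither preprocessing details nor split sizes, so the sketch below is an assumption-laden illustration only: CIFAR-10 as in-distribution data, SVHN as one OOD source, an arbitrary 45,000/5,000 train/validation split, and a candidate grid for λ built around the reported best values of 0.1 and 0.01.

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # minimal preprocessing; augmentations omitted

# In-distribution: CIFAR-10. Out-of-distribution: SVHN (one of the OOD sets used).
cifar_train = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)
svhn_ood = datasets.SVHN(root="data", split="train", download=True, transform=transform)

# Assumed 45,000/5,000 train/validation split; the paper does not report its split.
train_set, val_set = random_split(
    cifar_train, [45_000, 5_000], generator=torch.Generator().manual_seed(0))

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False)
ood_loader = DataLoader(svhn_ood, batch_size=64, shuffle=True)

# Candidate grid for the regularization weight lambda; 0.1 (CIFAR) and 0.01
# (ImageNet-10) are the reported best values, the other entries are illustrative.
lambda_grid = [1.0, 0.1, 0.01, 0.001]
```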
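The Experiment Setup row specifies concrete hyperparameters, which the sketch below instantiates in PyTorch: ResNet-18 adapted to 32x32 inputs by replacing the first convolution with a 3x3, stride-1, padding-1 layer and dropping the first max-pooling layer, SGD with momentum 0.9 and weight decay 0.0005, and an initial learning rate of 0.1 decayed by a factor of 10 at epochs 100 and 150 over 200 epochs. Whether this matches the authors' implementation line for line is an assumption; the training loop itself is omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# ResNet-18 trained from scratch, adapted to the original 32x32 CIFAR resolution:
# a 3x3 stride-1 padding-1 first convolution and no initial max-pooling layer.
model = resnet18(num_classes=10)  # CIFAR-10; use num_classes=100 for CIFAR-100
model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
model.maxpool = nn.Identity()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

# 200 training epochs; decay the learning rate by a factor of 10 at epochs 100 and 150.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)
```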