SODA: Robust Training of Test-Time Data Adaptors
Authors: Zige Wang, Yonggang Zhang, Zhen Fang, Long Lan, Wenjing Yang, Bo Han
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the efficacy of SODA, we evaluate it on three widely used benchmark datasets under various settings. Our experimental results demonstrate that SODA can effectively mitigate the performance degradation of deployed models in the presence of distribution shifts. |
| Researcher Affiliation | Academia | Zige Wang¹﹐², Yonggang Zhang², Zhen Fang³, Long Lan⁴, Wenjing Yang⁴, Bo Han²; ¹School of Computer Science, Peking University; ²Hong Kong Baptist University; ³University of Technology Sydney; ⁴National University of Defense Technology |
| Pseudocode | Yes | Algorithm 1 SODA framework |
| Open Source Code | Yes | The code implementation can be found at https://github.com/tmlr-group/SODA. |
| Open Datasets | Yes | Datasets. We first evaluate our proposed framework SODA on two widely used out-of-distribution benchmarks, namely CIFAR-10-C and CIFAR-100-C [14]... We further evaluate SODA on a large-scale dataset ImageNet-C [14]... |
| Dataset Splits | Yes | Datasets. We first evaluate our proposed framework SODA on two widely used out-of-distribution benchmarks, namely CIFAR-10-C and CIFAR-100-C [14], each containing 10,000 CIFAR-10/100 test images corrupted by 19 kinds of corruptions with 5 severity levels. We further evaluate SODA on a large-scale dataset ImageNet-C [14] with 50,000 ImageNet validation images corrupted by the same corruptions as CIFAR-10/100-C. |
| Hardware Specification | Yes | The experiments are conducted using NVIDIA A100-PCIE-40GB GPU with CUDA 11.7. |
| Software Dependencies | Yes | In our paper, all models are implemented using PyTorch 1.13.1. The ImageNet pre-trained weights used in DINE and BETA are downloaded from TorchVision 0.14.1. The experiments are conducted using NVIDIA A100-PCIE-40GB GPU with CUDA 11.7. |
| Experiment Setup | Yes | For all methods except DINE, BETA, DA-PGD, and SODA-R, the data adaptor/model is optimized using SGD with learning rate = 1e-3, momentum = 0.9, and weight decay = 1e-5. Batch size = 256 is fixed for all methods. The number of training epochs = 150 for all baselines except DINE and BETA. For all methods using ZOO, the query number q = 5 for CIFAR-10-C and ImageNet-C and q = 10 for CIFAR-100-C, smoothing parameter µ = 1e-3. |
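
The quoted experiment setup translates directly into optimizer hyperparameters and a zeroth-order (ZOO) gradient estimate. The sketch below is a minimal, hypothetical illustration of those reported values only; `data_adaptor` and `zoo_gradient` are placeholder names chosen for this example and are not taken from the authors' implementation (see the repository linked above for the actual code).

```python
import torch

# Hypothetical lightweight data adaptor that perturbs test inputs before
# they reach the frozen deployed model (illustrative module only).
data_adaptor = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

# Optimizer settings quoted above: SGD with lr = 1e-3, momentum = 0.9,
# weight decay = 1e-5.
optimizer = torch.optim.SGD(
    data_adaptor.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-5
)

def zoo_gradient(loss_fn, theta, q=5, mu=1e-3):
    """Two-point zeroth-order gradient estimate with q random directions
    and smoothing parameter mu; the quoted setup uses q = 5 (CIFAR-10-C,
    ImageNet-C), q = 10 (CIFAR-100-C), and mu = 1e-3."""
    base = loss_fn(theta)
    grad = torch.zeros_like(theta)
    for _ in range(q):
        u = torch.randn_like(theta)  # random search direction
        grad += (loss_fn(theta + mu * u) - base) / mu * u
    return grad / q

# Toy usage: estimate the gradient of a simple quadratic loss.
theta = torch.zeros(8)
g = zoo_gradient(lambda t: (t - 1.0).pow(2).sum(), theta, q=5, mu=1e-3)
```

The averaged two-point estimator above is one common way to realize ZOO with a fixed query budget q and smoothing parameter µ; the paper's actual estimator may differ in detail.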