Spurious Feature Diversification Improves Out-of-distribution Generalization
Authors: Yong Lin, Lu Tan, Yifan Hao, Ho Nam Wong, Hanze Dong, Weizhong Zhang, Yujiu Yang, Tong Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate the effectiveness of utilizing diverse spurious features on a MultiColorMNIST dataset, and our experimental results are consistent with the theoretical analysis. Building upon the new theoretical insights into the efficacy of ensemble methods, we further identify an issue of WiSE-FT caused by the overconfidence of fine-tuned models in OOD situations. To remedy this problem, we propose a novel method called BAlaNced averaGing (BANG) to mitigate the overconfidence problem, which significantly enhances the OOD performance of WiSE-FT. (A weight-averaging sketch follows the table.) |
| Researcher Affiliation | Academia | The Hong Kong University of Science and Technology, Tsinghua University, Fudan University, University of Illinois Urbana-Champaign. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating the release of open-source code for the methodology described. |
| Open Datasets | Yes | To further verify our findings, we introduce MultiColorMNIST in Section 3.4, a novel variant of CMNIST (Arjovsky et al., 2019), with multiple spurious features. We impose Mixup or Label Smoothing during fine-tuning the pre-trained CLIP on ImageNet (IN), and test OOD performance on IN-V2, IN-R, IN-A, IN-S and ObjectNet. The performance of the baseline methods is taken from Table 8 of Wortsman et al. (2022). |
| Dataset Splits | No | The paper mentions using a 'validation set from the Places365-standard' in Appendix E.2, but does not provide explicit details about the train/validation/test splits used for all experiments, nor does it define the splits for the custom-built MultiColorMNIST dataset. |
| Hardware Specification | No | The paper mentions using 'CLIP ViT-B/16' as the model, but it does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer with the default PyTorch AdamW hyperparameters' and 'MMPreTrain Contributors (2023)' for default hyperparameters, but it does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We use the AdamW optimizer with the default PyTorch AdamW hyperparameters and choose 512 as the batch size. We use a learning rate of 3 × 10⁻⁵, gradient clipping at global norm 1, and fine-tune for a total of 10 epochs. The settings mentioned above are the same as in Wortsman et al. (2022). (A training-loop sketch follows the table.) |
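
The Experiment Setup row can be read as a short fine-tuning loop. Below is a minimal sketch, assuming PyTorch; the `nn.Linear` model and random tensors are placeholders standing in for the CLIP ViT-B/16 classifier and ImageNet data, which are not reproduced here.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data: the paper fine-tunes CLIP ViT-B/16 on ImageNet.
model = nn.Linear(512, 1000)
dataset = TensorDataset(torch.randn(1024, 512), torch.randint(0, 1000, (1024,)))
train_loader = DataLoader(dataset, batch_size=512, shuffle=True)  # batch size 512

# Default PyTorch AdamW hyperparameters, except for the learning rate of 3e-5.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):  # fine-tune for a total of 10 epochs
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        # Gradient clipping at global norm 1, as stated in the setup.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
```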
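
For the ensembling referenced in the Research Type row: WiSE-FT averages a zero-shot and a fine-tuned model in weight space, and BANG applies the same averaging to a model fine-tuned with Mixup or Label Smoothing. Below is a minimal sketch of that weight averaging, assuming two architecturally identical PyTorch modules; the toy `nn.Linear` models and the mixing coefficient `alpha` are illustrative placeholders, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Placeholders for the zero-shot CLIP classifier and the fine-tuned model.
zero_shot = nn.Linear(512, 1000)
fine_tuned = nn.Linear(512, 1000)  # in BANG, fine-tuned with Mixup or Label Smoothing

alpha = 0.5  # mixing coefficient between fine-tuned and zero-shot weights (illustrative)

# Interpolate every parameter in weight space.
zs_state = zero_shot.state_dict()
averaged_state = {
    name: (1 - alpha) * zs_state[name] + alpha * param
    for name, param in fine_tuned.state_dict().items()
}

# The averaged model is the one evaluated on the OOD test sets.
averaged = nn.Linear(512, 1000)
averaged.load_state_dict(averaged_state)
```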