LCA-on-the-Line: Benchmarking Out of Distribution Generalization with Class Taxonomies

Authors: Jia Shi, Gautam Rajendrakumar Gare, Jinjin Tian, Siqi Chai, Zhiqiu Lin, Arun Balajee Vasudevan, Di Feng, Francesco Ferroni, Shu Kong

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We assess 75 models using ImageNet as the in-distribution (ID) dataset and five significantly shifted OOD variants, uncovering a strong linear correlation between ID LCA distance and OOD top-1 accuracy. Our method offers a compelling alternative lens for understanding why VLMs tend to generalize better. Additionally, we propose a technique to construct a taxonomic hierarchy on any dataset using K-means clustering, demonstrating that LCA distance is robust to the constructed hierarchy. (Minimal sketches of the LCA-distance metric and the K-means hierarchy construction appear after the table.)
Researcher Affiliation | Collaboration | 1Carnegie Mellon University, 2Work done at Argo AI GmbH, 3Apple, 4Nvidia, 5Texas A&M University, 6University of Macau.
Pseudocode | No | The paper describes its methods in prose and refers to an external "code snippet" for simulation details in Appendix C, but it does not include structured pseudocode or clearly labeled algorithm blocks within the paper itself.
Open Source Code | Yes | Open source code in our Project Page.
Open Datasets | Yes | We use ImageNet (Deng et al., 2009b) as the source in-distribution (ID) dataset, while ImageNet-v2 (Recht et al., 2019), ImageNet-Sketch (Hendrycks & Dietterich, 2019), ImageNet-Rendition (Hendrycks et al., 2021), ImageNet-Adversarial (Hendrycks et al., 2021), and ObjectNet (Barbu et al., 2019) are employed as out-of-distribution datasets, exemplifying severe natural distribution shifts.
Dataset Splits | No | The paper uses ImageNet as the ID dataset and the listed variants as OOD evaluation sets. While it describes experimental setups such as linear probing, it does not explicitly state training, validation, and test splits (e.g., percentages or sample counts), nor does it cite specific standard splits for these purposes.
Hardware Specification | Yes | For our computational resources, we utilized a single NVIDIA GeForce GTX 1080 Ti GPU.
Software Dependencies | No | The paper mentions the GPU used but does not list any software components (e.g., libraries, frameworks, or operating systems) with specific version numbers needed to replicate the experiments.
Experiment Setup | Yes | The learning rate was set to 0.001 with a batch size of 1024. We used the AdamW optimizer with weight decay and a cosine learning-rate scheduler with warm-up; the warm-up type was linear with a warm-up learning rate of 1e-5. The experiment ran for 50 epochs. (A hedged PyTorch sketch of this recipe appears after the table.)
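
The LCA distance referenced in the Research Type row measures how far a model's mistakes stray within a class taxonomy: the higher the lowest common ancestor (LCA) of the predicted and ground-truth classes sits above the ground truth, the worse the error. A minimal sketch, assuming a toy child-to-parent taxonomy; the node names and scoring convention are illustrative, not the paper's code:

```python
# Hedged sketch of an LCA-distance metric over a toy class taxonomy.
# The tree, names, and scoring convention are illustrative assumptions.

def path_to_root(tree_parent, node):
    """Return the ancestors of `node` from itself up to the root."""
    path = [node]
    while node in tree_parent:
        node = tree_parent[node]
        path.append(node)
    return path

def lca_distance(tree_parent, pred, gt):
    """Height of the lowest common ancestor above the ground-truth class.

    A perfect prediction scores 0; a mistake inside a nearby subtree
    (e.g., 'tabby' vs. 'siamese') scores lower than one that crosses
    distant branches (e.g., 'tabby' vs. 'truck')."""
    gt_path = path_to_root(tree_parent, gt)
    pred_ancestors = set(path_to_root(tree_parent, pred))
    for height, ancestor in enumerate(gt_path):
        if ancestor in pred_ancestors:
            return height
    raise ValueError("nodes share no common ancestor")

# Toy taxonomy as a child -> parent map.
parent = {
    "tabby": "cat", "siamese": "cat", "cat": "animal",
    "beagle": "dog", "dog": "animal",
    "truck": "vehicle", "animal": "entity", "vehicle": "entity",
}
print(lca_distance(parent, "siamese", "tabby"))  # 1 (LCA = 'cat')
print(lca_distance(parent, "truck", "tabby"))    # 3 (LCA = 'entity')
```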
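The same row mentions constructing a taxonomic hierarchy on any dataset with K-means. A hedged sketch of one way this could work, recursively clustering class-mean features with scikit-learn; the branching factor, stopping rule, and random stand-in centroids are assumptions, not the paper's procedure:

```python
# Hedged sketch: build a latent class hierarchy by recursively K-means
# clustering class-mean features. Branching factor, minimum group size,
# and the random "centroids" below are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

def build_hierarchy(class_features, class_ids, branching=2, min_size=2, seed=0):
    """Recursively split classes into K-means clusters; returns nested
    lists whose leaves are class ids."""
    if len(class_ids) <= min_size:
        return list(class_ids)  # leaf: small group of classes
    km = KMeans(n_clusters=branching, n_init=10, random_state=seed)
    labels = km.fit_predict(class_features)
    children = []
    for c in range(branching):
        mask = labels == c
        children.append(build_hierarchy(
            class_features[mask],
            [i for i, m in zip(class_ids, mask) if m],
            branching, min_size, seed))
    return children

# Example with random vectors standing in for per-class mean features.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(16, 512))  # 16 classes, 512-d features
tree = build_hierarchy(centroids, list(range(16)))
```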
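Finally, the Experiment Setup row maps onto a standard PyTorch optimizer/scheduler configuration. A sketch under stated assumptions: the probe dimensions, weight-decay value, and warm-up/step counts are illustrative, since the quoted text does not specify them:

```python
# Hedged sketch of the reported recipe: AdamW at lr 1e-3 with weight decay,
# cosine schedule with a linear warm-up starting at lr 1e-5, batch size
# 1024, 50 epochs. Probe dims, weight decay, and step counts are assumed.
import torch

probe = torch.nn.Linear(512, 1000)  # assumed frozen-feature dim -> 1000 classes
optimizer = torch.optim.AdamW(probe.parameters(), lr=1e-3, weight_decay=1e-4)

epochs, steps_per_epoch, warmup_steps = 50, 1200, 500  # assumed lengths
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-5 / 1e-3, total_iters=warmup_steps)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs * steps_per_epoch - warmup_steps)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, [warmup, cosine], milestones=[warmup_steps])

# Per-step usage inside a training loop:
# loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```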