Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Enhancing Compositional Generalization via Compositional Feature Alignment

Authors: Haoxiang Wang, Haozhe Si, Huajie Shao, Han Zhao

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further conduct extensive experiments on CG-Bench for CLIP and DINOv2, two powerful pretrained vision foundation models. Experiment results show that CFA outperforms common finetuning techniques in compositional generalization, corroborating CFA's efficacy in compositional feature learning. The code is released at https://github.com/Haoxiang-Wang/Compositional-Feature-Alignment.
Researcher Affiliation | Academia | Haoxiang Wang (EMAIL), Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign; Haozhe Si (EMAIL), Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign; Huajie Shao (EMAIL), Department of Computer Science, William & Mary; Han Zhao (EMAIL), Department of Computer Science, University of Illinois Urbana-Champaign
Pseudocode | No | The paper describes the method and its stages using textual descriptions and a diagram (Figure 3), but it does not include a clearly labeled pseudocode block or algorithm section.
Open Source Code | Yes | The code is released at https://github.com/Haoxiang-Wang/Compositional-Feature-Alignment.
Open Datasets | Yes | We create CG-Bench, a compositional generalization benchmark built on four datasets previously designed for DG research: Office-Home (Venkateswara et al., 2017), DomainNet (Peng et al., 2019), and iWildCam (Beery et al., 2020) & FMoW (Christie et al., 2018) from the WILDS benchmark (Koh et al., 2021).
Dataset Splits | Yes | We randomly divide DomainNet into training and evaluation sets, with an 80:20 split. A CLIP model is fully fine-tuned on this training data, and evaluated on validation data from all domain-class combinations. ... The ID data is then further segregated into a training set and an ID validation set at a 9:1 ratio. Meanwhile, the OOD data is divided between OOD validation and test sets.
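The split protocol quoted above can be illustrated with a minimal sketch. This is not the paper's released code; the function name, seeding, and use of index lists are assumptions chosen for clarity:

```python
import random

def random_split(samples, ratio, seed=0):
    # Generic random split used to illustrate the quoted protocol:
    # ratio=0.8 gives the 80:20 DomainNet train/evaluation split,
    # ratio=0.9 gives the 9:1 ID-train/ID-validation split.
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical usage on 100 sample indices:
train, evaluation = random_split(range(100), 0.8)   # 80 / 20
id_train, id_val = random_split(train, 0.9)         # 72 / 8
```

In CG-Bench, the first split is applied to the ID portion of each dataset; the OOD data is separately divided into OOD validation and test sets.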
Hardware Specification | Yes | The experiments described in this paper are executed on NVIDIA RTX A6000 GPUs with 48GB memory, utilizing a total of 12 GPUs.
Software Dependencies | No | The paper mentions using specific models like CLIP (Radford et al., 2021) and DINOv2 (Oquab et al., 2023), and an optimizer like AdamW (Loshchilov & Hutter, 2017). However, it does not specify version numbers for these or for broader software dependencies like Python, PyTorch, or CUDA.
Experiment Setup | Yes | We present the hyperparameter settings for our CFA models in Table 4. The parameters for each stage are chosen based on model performances on the OOD validation set. Note that λ is the domain loss coefficient in (3), and λ_ortho is the coefficient for the orthogonality regularization loss ||W_1^T W_2||_F^2 that we use to ensure orthogonality of heads in Stage 1. The hyper-parameters in Stage 2 are also used for the two baseline algorithms, Full finetuning (FT) and LP-FT.

Table 4: Hyperparameters for our algorithm. Stage 1 is Linear Probing; Stage 2 is Fine-Tuning.

CLIP
Dataset | λ | λ_ortho | Stage 1 Epochs | Stage 2 Epochs | Stage 2 Learning Rate
Office-Home | 1 | 100 | 200 | 3 | 10^-5
DomainNet | 1 | 10000 | 200 | 3 | 10^-5
iWildCam | 10 | 10 | 200 | 5 | 10^-5
FMoW | 10 | 100 | 200 | 3 | 10^-5

DINOv2
Dataset | λ | λ_ortho | Stage 1 Epochs | Stage 2 Epochs | Stage 2 Learning Rate
Office-Home | 1 | 100 | 200 | 3 | 5 × 10^-5
DomainNet | 1000 | 10 | 200 | 10 | 5 × 10^-5
iWildCam | 1 | 100 | 200 | 5 | 10^-5
FMoW | 100 | 1000 | 200 | 4 | 5 × 10^-5
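The orthogonality regularizer ||W_1^T W_2||_F^2 named above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's PyTorch implementation; the function name and the 2D weight layout (rows = features, columns = head outputs) are assumptions:

```python
import numpy as np

def orthogonality_loss(W1, W2):
    # ||W1^T W2||_F^2: the squared Frobenius norm of the Gram matrix
    # between the two heads' weight columns. It is zero exactly when
    # every column of W1 is orthogonal to every column of W2, which
    # is the condition the Stage 1 regularizer (weighted by λ_ortho)
    # pushes the two linear heads toward.
    return float(np.sum((W1.T @ W2) ** 2))

# Heads with mutually orthogonal columns incur zero penalty:
W1 = np.array([[1.0], [0.0]])
W2 = np.array([[0.0], [1.0]])
print(orthogonality_loss(W1, W2))  # 0.0
```

In training, this term would be added to the task objective with coefficient λ_ortho from Table 4 and computed on autograd-tracked tensors so it contributes gradients to both heads.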