Auxiliary Losses for Learning Generalizable Concept-based Models

Authors: Ivaxi Sheth, Samira Ebrahimi Kahou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper presents extensive experiments on real-world datasets for image classification tasks, namely CUB, AwA2, CelebA and TIL. We also study the performance of coop-CBM models under various distributional shift settings. We show that our proposed method achieves higher accuracy in all distributional shift settings even compared to the black-box models with the highest concept accuracy.
Researcher Affiliation | Academia | Ivaxi Sheth, CISPA Helmholtz Center for Information Security, ivaxi.sheth@cispa.de; Samira Ebrahimi Kahou, École de technologie supérieure, Mila, CIFAR AI Chair, samira.ebrahimi-kahou@etsmtl.ca
Pseudocode | Yes | Algorithm 1: Intervention Selector pseudocode
Open Source Code | Yes | Our codebase is available at https://github.com/ivaxi0s/coop-cbm and is built upon open-source repos [27, 41].
Open Datasets | Yes | We use the Caltech-UCSD Birds-200-2011 (CUB) [55] dataset for the task of bird identification. We additionally use the Animals with Attributes 2 (AwA2) [57] dataset for the task of animal classification. We use all of the subsets of the Tumor-Infiltrating Lymphocytes (TIL) [42] dataset for cancer cell classification. For the m-CelebA [31] dataset, we train using a batch size of 64 with the Adam optimizer with 0.9 momentum and a learning rate of 5 × 10^-3 for 500 epochs. The feature extractor was Inception V3 [50] as the concept encoder model.
Dataset Splits | Yes | We use a traditional 70%-10%-20% random split for training, validation, and testing datasets. (A minimal split sketch follows the table.)
Hardware Specification | Yes | We trained on Linux-based clusters, mainly on V100 GPUs and partially on A100 GPUs.
Software Dependencies | No | The paper mentions specific models (e.g., Inception V3, ViT) and optimizers (SGD, Adam) but does not provide version numbers for the software frameworks or libraries used for implementation (e.g., PyTorch or TensorFlow versions).
Experiment Setup | Yes | For the CUB [55] dataset, we trained using a batch size of 128 with the SGD optimizer with 0.9 momentum and a learning rate of 10^-2. The feature extractor was Inception V3 [50] as the concept encoder model. ... Across all of the models and tasks, we use a weight decay of 5 × 10^-5 and scale the learning rate by a factor of 0.1 if no improvement has been seen in the validation loss for the last 15 epochs during training. We also train with an early stopping mechanism, i.e., if the validation loss does not improve for 200 epochs, we stop training. (A hedged training-setup sketch follows the table.)
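
The quoted 70%-10%-20% random split can be reproduced in a few lines; the sketch below is an assumption rather than the authors' code (their repo may split differently), and the `split_dataset` helper and fixed seed are illustrative names introduced here.

```python
import torch
from torch.utils.data import random_split

def split_dataset(dataset, seed=0):
    """70%/10%/20% random split into train/val/test, as quoted above.
    Using random_split and a fixed seed is an assumption, not the authors' code."""
    n = len(dataset)
    n_train, n_val = int(0.7 * n), int(0.1 * n)
    n_test = n - n_train - n_val  # remainder goes to the test split
    generator = torch.Generator().manual_seed(seed)  # reproducible shuffling
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```

Read together, the Open Datasets and Experiment Setup rows pin down most of the optimizer and scheduling choices. Below is a minimal sketch of the CUB configuration, assuming a PyTorch implementation (the paper does not state the framework, per the Software Dependencies row); `build_training_setup`, `fit`, and the caller-supplied `train_one_epoch`/`evaluate` callables are hypothetical names, and the torchvision Inception V3 backbone is an assumption. For m-CelebA the quoted settings swap in Adam with a learning rate of 5 × 10^-3 and a batch size of 64.

```python
from torch import nn, optim
from torchvision import models

def build_training_setup(num_outputs):
    """Optimizer and scheduler matching the quoted CUB hyperparameters:
    SGD, momentum 0.9, lr 1e-2, weight decay 5e-5, LR x0.1 after 15 epochs
    without validation improvement. The torchvision backbone is an assumption."""
    model = models.inception_v3(weights="IMAGENET1K_V1")
    model.fc = nn.Linear(model.fc.in_features, num_outputs)  # replace the head (assumed)
    optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9, weight_decay=5e-5)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.1, patience=15
    )
    return model, optimizer, scheduler

def fit(model, optimizer, scheduler, train_one_epoch, evaluate,
        max_epochs=1000, patience=200):
    """Early stopping on validation loss, as described in the paper.
    `train_one_epoch(model, optimizer)` and `evaluate(model) -> val_loss`
    are caller-supplied callables (hypothetical, not from the paper)."""
    best_val, stale = float("inf"), 0
    for epoch in range(max_epochs):          # max_epochs is an assumed upper bound
        train_one_epoch(model, optimizer)
        val_loss = evaluate(model)
        scheduler.step(val_loss)             # decays LR when validation loss plateaus
        if val_loss < best_val:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:            # stop after 200 epochs without improvement
                break
    return best_val
```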
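Both sketches only restate the hyperparameters quoted in the table; for the exact training loop and loss terms, the authors' repository at https://github.com/ivaxi0s/coop-cbm remains the authoritative reference.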