COMMA: Co-articulated Multi-Modal Learning

Authors: Lianyu Hu, Liqing Gao, Zekang Liu, Chi-Man Pun, Wei Feng

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method across three representative tasks of generalization to novel classes, new target datasets and unseen domain shifts. Experimental results demonstrate the superiority of our method by exhibiting a favorable performance boost upon all tasks with high efficiency.
Researcher Affiliation | Academia | 1) College of Intelligence and Computing, Tianjin University, China; 2) Department of Computer and Information Science, University of Macau, China
Pseudocode | No | The paper describes the method using mathematical equations and text but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/hulianyuyy/COMMA
Open Datasets | Yes | For base-to-novel generalization and cross-dataset evaluation, we follow previous methods (Khattak et al. 2023; Yao, Zhang, and Xu 2023) to evaluate the performance of our method on 11 image classification datasets, including two generic-objects datasets, ImageNet (Deng et al. 2009) and Caltech101 (Fei-Fei, Fergus, and Perona 2004); five fine-grained datasets, OxfordPets (Parkhi et al. 2012), StanfordCars (Krause et al. 2013), Flowers102 (Nilsback and Zisserman 2008), Food101 (Bossard, Guillaumin, and Van Gool 2014), and FGVCAircraft (Maji et al. 2013); a scene recognition dataset SUN397 (Xiao et al. 2010); an action recognition dataset UCF101 (Soomro, Zamir, and Shah 2012); a texture dataset DTD (Cimpoi et al. 2014); and a satellite-image dataset EuroSAT (Helber et al. 2019).
Dataset Splits | Yes | Base-to-Novel Generalization: the classes of each dataset are split into base and novel classes. The model is trained on the base classes in a few-shot setting and evaluated on both base and novel classes, with the novel classes tested in a zero-shot manner (see the protocol sketch after the table).
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU, CPU models, or memory) used for running the experiments.
Software Dependencies | No | The paper mentions using a 'pretrained ViT-B/16 CLIP model' and the 'SGD optimizer' but does not specify version numbers for any software libraries or dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | For all experiments, we use the pretrained ViT-B/16 CLIP model by default with d_l = 512 and d_v = 768. We use a 16-shot training strategy in all experiments by default, which randomly samples 16 shots for each class. Following previous methods (Khattak et al. 2023), we set the prompt depth J to 9 and the language and vision prompt lengths to 2. We train our models for 5 epochs with a batch size of 4 and a learning rate of 0.0035 with the SGD optimizer (see the configuration sketch after the table).
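
The base-to-novel protocol noted in the Dataset Splits row can be summarized in code. The sketch below is a hypothetical illustration of splitting a class list in half and sampling 16 shots per base class; the function names, the half-and-half split order, and the sampling details are assumptions for illustration, not code from the COMMA repository.

# Illustrative sketch of the base-to-novel, 16-shot protocol described above.
# All names here are hypothetical; only the 16-shot value comes from the paper.
import random
from collections import defaultdict

def base_novel_split(class_names):
    # Assumed convention: first half of the class list = base, second half = novel.
    mid = len(class_names) // 2
    return class_names[:mid], class_names[mid:]

def sample_few_shot(samples, base_classes, shots=16, seed=0):
    # Randomly keep `shots` training images per base class.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_path, label in samples:
        if label in base_classes:
            by_class[label].append((image_path, label))
    few_shot = []
    for label, items in by_class.items():
        few_shot.extend(rng.sample(items, min(shots, len(items))))
    return few_shot

The model would then be trained on the few-shot subset of base classes and evaluated separately on the held-out base-class and novel-class test sets.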
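
The hyperparameters quoted in the Experiment Setup row map onto a simple training configuration. The following is a minimal sketch assuming a PyTorch setup; only the numeric values (backbone, d_l, d_v, prompt depth, prompt length, shots, epochs, batch size, learning rate, SGD) come from the paper, while the dictionary keys, the build_optimizer helper, and the prompt_params argument are illustrative placeholders.

# Hypothetical configuration mirroring the reported setup; not the authors' code.
import torch

config = {
    "backbone": "ViT-B/16",   # pretrained CLIP model
    "text_dim": 512,          # d_l, language branch width
    "vision_dim": 768,        # d_v, vision branch width
    "prompt_depth": 9,        # J, number of layers equipped with prompts
    "prompt_length": 2,       # language and vision prompt lengths
    "shots": 16,              # 16-shot training per class
    "epochs": 5,
    "batch_size": 4,
    "lr": 0.0035,
}

def build_optimizer(prompt_params):
    # SGD over the learnable prompt parameters, as stated in the setup.
    return torch.optim.SGD(prompt_params, lr=config["lr"])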