Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Exploring Transferable Homogenous Groups for Compositional Zero-Shot Learning

Authors: Zhijie Rao, Jingcai Guo, Miaoge Li, Yang Chen, Mengzhu Wang

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on three benchmark datasets validate the effectiveness of our method. Code is available at: https://github.com/zjrao/HGRL. [...] We conduct experiments on three major benchmark datasets, and the results show that the proposed method achieves state-of-the-art performance.
Researcher Affiliation	Academia	Zhijie Rao, Jincai Guo, Miaoge Li, Yang Chen and Mengzhu Wang. Department of COMP/LSGI, The Hong Kong Polytechnic University, Hong Kong SAR. EMAIL, EMAIL
Pseudocode	No	The paper describes the methodology in prose and mathematical equations, and includes a figure illustrating the overview of the proposed method (Figure 2), but does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at: https://github.com/zjrao/HGRL.
Open Datasets	Yes	We perform experiments on three commonly used datasets including MIT-States [Isola et al., 2015], UT-Zappos [Yu and Grauman, 2014] and C-GQA [Naeem et al., 2021].
Dataset Splits	Yes	MIT-States has 115 states and 245 objects. The number of seen compositions is 1262 and unseen compositions is 400. UT-Zappos is a small footwear dataset with 16 states and 12 objects. There are 83 seen compositions for training and 18 unseen compositions for testing. C-GQA is a challenging dataset containing 413 states and 674 objects. There are 5592 seen compositions and 923 unseen combinations.
Hardware Specification	No	The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. It only mentions the use of a pre-trained CLIP model as a backbone.
Software Dependencies	No	The paper mentions using 'CLIP [Radford et al., 2021] ViT-L/14 model as the backbone' but does not specify any other software dependencies, libraries, or their version numbers, which are essential for reproducibility.
Experiment Setup	Yes	The learning rate is 5e-4 for UT-Zappos and 5e-5 for MIT-States and C-GQA. The batch size is 180 for UT-Zappos and 32 for MIT-States and C-GQA. We use Adam optimizer to train the model. The group number of state k_s and object k_o are set to 3 for UT-Zappos and 5 for MIT-States and C-GQA. The hyper-parameter λ is set to 1.0 for UT-Zappos and 0.1 for MIT-States and C-GQA.
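
The per-dataset settings quoted in the Experiment Setup row can be collected into a single lookup table for anyone attempting a rerun. The following is a minimal sketch only: the names `TrainConfig`, `CONFIGS`, and `get_config` are hypothetical and are not taken from the authors' repository, and the learning rates assume the extracted values "5e 4" and "5e 5" stand for 5e-4 and 5e-5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyper-parameters quoted in the Experiment Setup row (hypothetical container)."""
    learning_rate: float
    batch_size: int
    num_state_groups: int    # k_s in the quoted text
    num_object_groups: int   # k_o in the quoted text
    loss_weight: float       # lambda in the quoted text
    optimizer: str = "adam"  # the paper states that Adam is used


# Values copied from the quoted setup; the keys are lower-cased dataset names.
CONFIGS = {
    "ut-zappos": TrainConfig(5e-4, 180, 3, 3, 1.0),
    "mit-states": TrainConfig(5e-5, 32, 5, 5, 0.1),
    "c-gqa": TrainConfig(5e-5, 32, 5, 5, 0.1),
}


def get_config(dataset: str) -> TrainConfig:
    """Return the quoted settings for a dataset name (case-insensitive)."""
    key = dataset.strip().lower()
    if key not in CONFIGS:
        raise KeyError(f"no quoted settings for dataset {dataset!r}")
    return CONFIGS[key]


if __name__ == "__main__":
    for name in CONFIGS:
        print(name, get_config(name))
```

Settings the paper does not report, such as hardware, library versions, or the number of training epochs, are deliberately left out of the sketch rather than guessed.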