Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Closed-form Sample Probing for Learning Generative Models in Zero-shot Learning

Authors: Samet Cetin, Orhun Buğra Baran, Ramazan Gokberk Cinbis

ICLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 EXPERIMENTS Datasets. We use the four mainstream GZSL benchmark datasets: Caltech-UCSD-Birds (CUB) (Wah et al., 2011), SUN Attribute (SUN) (Patterson & Hays, 2012), Animals with Attributes 2 (AWA2, more simply AWA) (Xian et al., 2018a) and Oxford Flowers (FLO) (Nilsback & Zisserman, 2008).
Researcher Affiliation | Academia | Samet Cetin, Orhun Bugra Baran & Ramazan Gokberk Cinbis Middle East Technical University Department of Computer Engineering Ankara, Turkey EMAIL
Pseudocode | No | The paper includes figures illustrating the framework and compute graph but does not contain a formal pseudocode block or algorithm steps.
Open Source Code | No | We will provide our source code on a public repository.
Open Datasets | Yes | We use the four mainstream GZSL benchmark datasets: Caltech-UCSD-Birds (CUB) (Wah et al., 2011), SUN Attribute (SUN) (Patterson & Hays, 2012), Animals with Attributes 2 (AWA2, more simply AWA) (Xian et al., 2018a) and Oxford Flowers (FLO) (Nilsback & Zisserman, 2008).
Dataset Splits | Yes | Therefore, to obtain comparable results within our experiments, we use the following policy to tune the hyper-parameters of our approach and our baselines: we first leave-out 20% of train class samples as val-seen samples. We periodically train a supervised classifier by taking synthetic samples from the generative model, and evaluate it on the validation set, consisting of the aforementioned val-seen samples plus the val-unseen samples with respect to the benchmark splits.
Hardware Specification | No | The numerical calculations were partially performed at TUBITAK ULAKBIM, High Performance and Grid Computing Center (TRUBA resources). This mentions a computing center but lacks specific hardware details like GPU/CPU models or memory.
Software Dependencies | No | The paper mentions using frameworks like cWGAN, LisGAN, TF-VAEGAN, FREE, and ESZSL, and a ResNet-101 backbone, but it does not specify software versions for any libraries, dependencies, or programming languages.
Experiment Setup | No | The paper describes general experimental settings such as using a ResNet-101 backbone and fine-tuning procedures, and outlines a hyper-parameter tuning policy. However, it does not explicitly provide concrete hyperparameter values (e.g., learning rates, batch sizes, number of epochs) for the experiments conducted in the main text.
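
The hold-out policy quoted under Dataset Splits (leave out 20% of train class samples as val-seen, then validate on val-seen plus val-unseen) can be illustrated with a minimal sketch. This is not the authors' code: the function names `split_val_seen` and `build_validation_set`, the fixed seed, and the choice to hold out 20% within each class (rather than 20% of the pooled samples) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_val_seen(labels, holdout_frac=0.2, seed=0):
    """Hold out a fraction of each seen class's samples as val-seen.

    Returns (train_idx, val_seen_idx) as sorted lists of sample indices.
    Assumption: the hold-out is done per class; the quoted text only says
    that 20% of train class samples are left out.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_seen_idx = [], []
    for label in sorted(by_class):
        idxs = list(by_class[label])
        rng.shuffle(idxs)
        # Keep at least one val-seen sample per class (hypothetical choice).
        n_val = max(1, round(holdout_frac * len(idxs)))
        val_seen_idx.extend(idxs[:n_val])
        train_idx.extend(idxs[n_val:])
    return sorted(train_idx), sorted(val_seen_idx)


def build_validation_set(val_seen_idx, val_unseen_idx):
    """Validation set = held-out val-seen samples plus val-unseen samples."""
    return sorted(set(val_seen_idx) | set(val_unseen_idx))
```

With two seen classes of ten samples each, this holds out two samples per class; the val-unseen indices would come from the benchmark splits, which the sketch takes as given.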