Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalized Debiased Semi-Supervised Hashing for Large-Scale Image Retrieval

Authors: Xingbo Liu, Xuening Zhang, Xiushan Nie, Yang Shi, Yilong Yin

AAAI 2025

Reproducibility variables, the classified result for each, and the supporting LLM response:

Research Type: Experimental
  "Experimental results on three single-label and three multi-label image benchmarks demonstrate that GDSH remarkably outperforms the state-of-the-arts in different semi-supervised settings."

Researcher Affiliation: Collaboration
  "Xingbo Liu (1), Xuening Zhang (2*), Xiushan Nie (1,3), Yang Shi (4), Yilong Yin (4). (1) School of Computer Science and Technology, Shandong Jianzhu University, Jinan 250101, China; (2) School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China; (3) Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Co., Ltd, Jinan, China; (4) School of Software, Shandong University, Jinan 250101, China."

Pseudocode: No
  "The paper refers to 'Algorithm 1' in sections such as 'Out-of-sample Extension' and 'Theoretical Analysis', but the pseudocode or algorithm block for 'Algorithm 1' is not provided within the main text of the paper."

Open Source Code: No
  "The paper does not contain an explicit statement about releasing source code or a link to a code repository."

Open Datasets: Yes
  "To verify the superiority of the proposed method, we carried out experiments using six widely-used image benchmarks, including three single-label datasets, CALTECH-101 (Fei-Fei, Fergus, and Perona 2007), CIFAR-10 (Krizhevsky and Hinton 2009), and ImageNet, and three multi-label datasets, MS-COCO (Lin et al. 2014), NUS-WIDE (Chua et al. 2009), and MIRFlickr (Huiskes and Lew 2008)."

Dataset Splits: No
  "The paper mentions using '30% supervision' and that 'the labeled subsets for all six benchmarks were kept the same throughout the experiments', but it does not specify exact training/validation/test splits (e.g., percentages or counts) for the overall datasets."

Hardware Specification: Yes
  "All the experiments were conducted on a computer with an Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz, 64GB RAM and a 64-bit Windows operating system."

Software Dependencies: No
  "The paper mentions 'existing tools in MATLAB' but does not provide specific version numbers for any software dependencies used in the implementation."

Experiment Setup: Yes
  "For comparison with baselines, we empirically set β = 0.3, μ = 10^5, θ = 10^5, ρ = 10^4, α = 10^-6. The best choice of γ is equally set to 100 on CALTECH-101, CIFAR-10, MS-COCO, NUS-WIDE, and MIRFlickr, and 10^4 on ImageNet. k = 1 for single-label datasets, while k = 2 for multi-label datasets. The iteration numbers t and T are respectively set to 10 and 4. ... we finely tuned δ1 and δ2 via grid search, and use δ1 = 1 for all datasets. δ2 is set to 10^-6 for single-label datasets, and 10^-3 for multi-label datasets."
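For reuse, the reported settings can be collected into a small configuration helper. This is an illustrative sketch only: the function and key names are our own, and the negative exponents (e.g. α = 10^-6, δ2 = 10^-6 or 10^-3) reflect a reconstruction of values whose minus signs appear lost in extraction, not a confirmed reading of the paper.

```python
# Shared hyperparameters reported for GDSH (key names are illustrative;
# exponent signs are a reconstruction of the extracted values).
GDSH_PARAMS = {
    "beta": 0.3,
    "mu": 1e5,
    "theta": 1e5,
    "rho": 1e4,
    "alpha": 1e-6,
    "iterations_t": 10,   # inner iteration count t
    "iterations_T": 4,    # outer iteration count T
    "delta1": 1.0,
}

# Dataset- and label-type-dependent values from the quoted setup.
GAMMA = {
    "CALTECH-101": 100, "CIFAR-10": 100, "MS-COCO": 100,
    "NUS-WIDE": 100, "MIRFlickr": 100, "ImageNet": 1e4,
}
K_NEIGHBORS = {"single-label": 1, "multi-label": 2}
DELTA2 = {"single-label": 1e-6, "multi-label": 1e-3}


def params_for(dataset: str, label_type: str) -> dict:
    """Assemble the full hyperparameter setting for one benchmark."""
    params = dict(GDSH_PARAMS)  # copy so callers can mutate safely
    params["gamma"] = GAMMA[dataset]
    params["k"] = K_NEIGHBORS[label_type]
    params["delta2"] = DELTA2[label_type]
    return params
```

Keeping the dataset-dependent values in separate lookup tables makes it explicit which settings vary per benchmark (γ, k, δ2) and which are shared, mirroring how the paper states them.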