Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Deep Evidential Hashing for Trustworthy Cross-Modal Retrieval

Authors: Yuan Li, Liangli Zhen, Yuan Sun, Dezhong Peng, Xi Peng, Peng Hu

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate the efficacy of our DECH through extensive experimentation on four benchmark datasets. The experimental results demonstrate our superior performance compared to 12 state-of-the-art methods.
Researcher Affiliation	Collaboration	1College of Computer Science, Sichuan University 2Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 3Sichuan National Innovation New Vision UHD Video Technology Co., Ltd., Chengdu 610095, China
Pseudocode	Yes	Algorithm 1: The Optimization Procedure of DECH
Open Source Code	Yes	Code https://github.com/blackant-dev/DECH
Open Datasets	Yes	We conduct our experiments on four benchmark datasets: MIRFLICKR25K (Huiskes and Lew 2008), IAPR TC12 (Escalante et al. 2010), NUS-WIDE (Rasiwasia et al. 2010), and MS-COCO (Lin et al. 2014).
Dataset Splits	Yes	MIRFLICKR25K contains 20,500 image-text pairs from 24 classes, with 2,000 pairs reserved for querying, 10,000 for training, and the rest for retrieval. IAPR TC-12 comprises 20,000 pairs across 255 categories, with 2,000 pairs used for querying, 10,000 for training, and the remainder for retrieval. NUS-WIDE includes 195,834 pairs in 21 categories, with 2,000 pairs for querying, 10,500 for training, and the rest for retrieval. MS-COCO consists of 122,218 pairs in 80 classes, with 5,000 pairs for querying, 10,000 for training, and the remaining pairs forming the retrieval database.
Hardware Specification	Yes	Our method is implemented with PyTorch (Paszke et al. 2019) on a single NVIDIA GEFORCE RTX 3090 Ti GPU.
Software Dependencies	No	The paper mentions 'PyTorch (Paszke et al. 2019)' but does not specify a version number for PyTorch or any other software library.
Experiment Setup	Yes	For our DECH, we set τ to 0.2 and γ to 1. The parameter λ is empirically determined as per (Sensoy, Kaplan, and Kandemir 2018). Additionally, we also employ Lnzce RM as the near-zero correct-evidence loss during training. ... The optimal retrieval performance is achieved when τ is around 0.1.
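
The split sizes quoted in the Dataset Splits row give totals, query counts, and training counts, but leave the size of "the rest" implicit. The following is a minimal sketch of that arithmetic, not code from the paper or its repository. The `Split` class and `SPLITS` table are hypothetical names introduced here, and the calculation assumes a literal reading in which the query, training, and retrieval sets are disjoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    total: int  # image-text pairs in the dataset, as quoted in the table
    query: int  # pairs reserved for querying
    train: int  # pairs used for training

    def retrieval(self) -> int:
        # ASSUMPTION: query, training, and retrieval sets are disjoint,
        # so "the rest" is whatever remains after removing both.
        return self.total - self.query - self.train


# Figures copied from the Dataset Splits row; nothing here is measured.
SPLITS = {
    "MIRFLICKR25K": Split(total=20_500, query=2_000, train=10_000),
    "IAPR TC-12": Split(total=20_000, query=2_000, train=10_000),
    "NUS-WIDE": Split(total=195_834, query=2_000, train=10_500),
    "MS-COCO": Split(total=122_218, query=5_000, train=10_000),
}

if __name__ == "__main__":
    for name, split in SPLITS.items():
        print(f"{name}: retrieval = {split.retrieval()}")
```

Some cross-modal hashing setups instead sample the training pairs from the retrieval database. Under that reading the retrieval size would be `total - query`, so the numbers this sketch produces should be checked against the released code before being relied on.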