Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Hashing with Uncertainty Quantification via Sampling-based Hypothesis Testing

Authors: Yucheng Wang, Mingyuan Zhou, Xiaoning Qian

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our HashUQ can achieve state-of-the-art retrieval performance on three image datasets. Ablation experiments on model hyperparameters, different model components, and effects of UQ are also provided with performance comparisons.
Researcher Affiliation | Academia | Yucheng Wang EMAIL Department of Electrical and Computer Engineering, Texas A&M University; Mingyuan Zhou EMAIL McCombs School of Business, The University of Texas at Austin; Department of Statistics and Data Sciences, The University of Texas at Austin; Xiaoning Qian EMAIL Department of Electrical and Computer Engineering, Texas A&M University; Department of Computer Science and Engineering, Texas A&M University; Computational Science Initiative, Brookhaven National Laboratory
Pseudocode | No | The paper describes the retrieval algorithm in Section 4.1 by detailing the steps and logic in paragraph format, but it does not present a clearly labeled pseudocode block or algorithm steps in a structured, code-like format.
Open Source Code | Yes | Our code is available at https://github.com/QianLab/HashUQ.
Open Datasets | Yes | We empirically evaluate image retrieval performances based on different supervised hashing methods to demonstrate our uncertainty-aware HashUQ's superiority on three benchmark image datasets: ImageNet (Deng et al., 2009), MS COCO (Lin et al., 2014), and NUS-WIDE (Chua et al., 2009).
Dataset Splits | Yes | One major difference between our evaluation pipeline and those adopted in previous works is that we explicitly include a validation set for model selection, while most previous works do not differentiate the validation set from the test set (Cao et al., 2017; Su et al., 2018; Yuan et al., 2020; Fan et al., 2020; Hoe et al., 2021). The dataset statistics with the adopted data splits are summarized in Table 5.
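The evaluation-pipeline difference quoted above (an explicit validation set used only for model selection, kept separate from the test/query set) can be sketched as follows. This is an illustrative helper, not the paper's code; the function name and split fractions are assumptions, not the statistics from the paper's Table 5.

```python
import random

def three_way_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Illustrative train/validation/test split with an explicit
    validation set. Fractions are placeholders, not the paper's
    actual split sizes."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    train = items[n_val + n_test:]          # model fitting
    val = items[:n_val]                     # model selection only
    test = items[n_val:n_val + n_test]      # final reported evaluation
    return train, val, test
```

Selecting hyperparameters on `val` and reporting only on `test` avoids the optimistic bias that arises when the same held-out set serves both roles.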
Hardware Specification | No | Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing. This statement is too general and lacks specific hardware details such as CPU/GPU models or memory amounts.
Software Dependencies | Yes | We reproduce the results of GreedyHash, CSQ, and HSWD, and implement HashUQ performance evaluation and comparison based on the implementation from DeepHash-pytorch with PyTorch 1.8.1 (Paszke et al., 2019).
Experiment Setup | Yes | The dropout rates are set to 0.5. We optimize the neural networks using the RMSprop (Hinton et al., 2012) optimizer with learning rate 1e-5 and weight decay 1e-5 for all the models. We use our derived closed-form ELBO as the minimization objective for Center-Target construction... The coefficient λ balancing data-fitting and prior belief is set to 1.0 unless specified. ... We perform model evaluation using 100 samples from the learned dropout variational distribution. The best-performing model on the validation sets during the first 100 training epochs is chosen to be evaluated on the query and pool datasets.
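The evaluation step quoted above (drawing 100 samples from the learned dropout variational distribution) follows the Monte Carlo dropout pattern: run repeated stochastic forward passes with dropout active and aggregate per-bit agreement. The sketch below is a minimal stdlib illustration of that idea, not the authors' implementation; the function, the single linear layer, and the agreement score are all hypothetical.

```python
import random

def mc_dropout_hash(h, W, rate=0.5, samples=100, seed=0):
    """Minimal MC-dropout sketch for hash bits (illustrative only).

    h: a hidden feature vector; W: one weight row per hash bit.
    Each stochastic pass drops entries of h with probability `rate`
    (inverted-dropout scaling by 1/keep), computes each bit's logit,
    and records its sign. Returns the per-bit mean sign in [-1, 1]:
    values near +/-1 indicate a stable bit, values near 0 an
    uncertain one.
    """
    rng = random.Random(seed)
    keep = 1.0 - rate
    agree = [0.0] * len(W)
    for _ in range(samples):
        # One stochastic forward pass: dropout mask on the features.
        hd = [x / keep if rng.random() < keep else 0.0 for x in h]
        for b, w in enumerate(W):
            logit = sum(wi * xi for wi, xi in zip(w, hd))
            agree[b] += 1.0 if logit >= 0 else -1.0
    return [a / samples for a in agree]
```

In a real PyTorch pipeline the same effect is obtained by keeping the dropout layers in training mode at evaluation time and averaging over repeated forward passes; here the agreement score is just one simple way to surface per-bit uncertainty.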