Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Hashing with Uncertainty Quantification via Sampling-based Hypothesis Testing

Authors: Yucheng Wang, Mingyuan Zhou, Xiaoning Qian

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our HashUQ can achieve state-of-the-art retrieval performance on three image datasets. Ablation experiments on model hyperparameters, different model components, and effects of UQ are also provided with performance comparisons.
Researcher Affiliation | Academia | Yucheng Wang EMAIL Department of Electrical and Computer Engineering, Texas A&M University; Mingyuan Zhou EMAIL McCombs School of Business, The University of Texas at Austin; Department of Statistics and Data Sciences, The University of Texas at Austin; Xiaoning Qian EMAIL Department of Electrical and Computer Engineering, Texas A&M University; Department of Computer Science and Engineering, Texas A&M University; Computational Science Initiative, Brookhaven National Laboratory
Pseudocode | No | The paper describes the retrieval algorithm in Section 4.1 by detailing the steps and logic in paragraph format, but it does not present a clearly labeled pseudocode block or algorithm steps in a structured, code-like format.
Open Source Code | Yes | Our code is available at https://github.com/QianLab/HashUQ.
Open Datasets | Yes | We empirically evaluate image retrieval performances based on different supervised hashing methods to demonstrate our uncertainty-aware HashUQ's superiority on three benchmark image datasets: ImageNet (Deng et al., 2009), MS COCO (Lin et al., 2014), and NUS-WIDE (Chua et al., 2009).
Dataset Splits | Yes | One major difference between our evaluation pipeline and those adopted in previous works is that we explicitly include a validation set for model selection, while most previous works do not differentiate the validation set from the test set (Cao et al., 2017; Su et al., 2018; Yuan et al., 2020; Fan et al., 2020; Hoe et al., 2021). The dataset statistics with the adopted data splits are summarized in Table 5.
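The evaluation-pipeline difference quoted above (an explicit validation set used only for model selection, kept separate from the test/query set) can be sketched as follows. This is an illustrative helper, not the paper's code; the function name and split fractions are assumptions, not the statistics from the paper's Table 5.

```python
import random

def three_way_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Illustrative train/validation/test split with an explicit
    validation set. Fractions are placeholders, not the paper's
    actual split sizes."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_val = int(n * val_frac)
    n_test = int(n * test_frac)
    train = items[n_val + n_test:]          # model fitting
    val = items[:n_val]                     # model selection only
    test = items[n_val:n_val + n_test]      # final reported evaluation
    return train, val, test
```

Selecting hyperparameters on `val` and reporting only on `test` avoids the optimistic bias that arises when the same held-out set serves both roles.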
Hardware Specification | No | Portions of this research were conducted with the advanced computing resources provided by Texas A&M High Performance Research Computing. This statement is too general and lacks specific hardware details such as CPU/GPU models or memory amounts.
Software Dependencies | Yes | We reproduce the results of GreedyHash, CSQ, and HSWD, and implement HashUQ performance evaluation and comparison based on the implementation from DeepHash-pytorch with PyTorch 1.8.1 (Paszke et al., 2019).
Experiment Setup | Yes | The dropout rates are set to 0.5. We optimize the neural networks using the RMSprop (Hinton et al., 2012) optimizer with learning rate 1e-5 and weight decay 1e-5 for all the models. We use our derived closed-form ELBO as the minimization objective for Center-Target construction... The coefficient λ balancing data-fitting and prior belief is set to 1.0 unless specified. ... We perform model evaluation using 100 samples from the learned dropout variational distribution. The best-performing model on the validation sets during the first 100 training epochs is chosen to be evaluated on the query and pool datasets.
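The evaluation step quoted above (drawing 100 samples from the learned dropout variational distribution) follows the Monte Carlo dropout pattern: run repeated stochastic forward passes with dropout active and aggregate per-bit agreement. The sketch below is a minimal stdlib illustration of that idea, not the authors' implementation; the function, the single linear layer, and the agreement score are all hypothetical.

```python
import random

def mc_dropout_hash(h, W, rate=0.5, samples=100, seed=0):
    """Minimal MC-dropout sketch for hash bits (illustrative only).

    h: a hidden feature vector; W: one weight row per hash bit.
    Each stochastic pass drops entries of h with probability `rate`
    (inverted-dropout scaling by 1/keep), computes each bit's logit,
    and records its sign. Returns the per-bit mean sign in [-1, 1]:
    values near +/-1 indicate a stable bit, values near 0 an
    uncertain one.
    """
    rng = random.Random(seed)
    keep = 1.0 - rate
    agree = [0.0] * len(W)
    for _ in range(samples):
        # One stochastic forward pass: dropout mask on the features.
        hd = [x / keep if rng.random() < keep else 0.0 for x in h]
        for b, w in enumerate(W):
            logit = sum(wi * xi for wi, xi in zip(w, hd))
            agree[b] += 1.0 if logit >= 0 else -1.0
    return [a / samples for a in agree]
```

In a real PyTorch pipeline the same effect is obtained by keeping the dropout layers in training mode at evaluation time and averaging over repeated forward passes; here the agreement score is just one simple way to surface per-bit uncertainty.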