Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere
Authors: Li Ju, Max Andersson, Stina Fredriksson, Edward Glöckner, Andreas Hellander, Ekta Vats, Prashant Singh
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the effectiveness of the probabilistic embeddings on established benchmarks, and present comprehensive ablation studies demonstrating the inherent nature of asymmetry in the uncertainty structure of textual and visual data. Empirically, AsymVLM yields more accurate uncertainty estimates and higher cross-modal retrieval accuracy on multiple benchmarks. We then present empirical evaluations, including uncertainty quantification, ablation studies, and downstream applications in Section 5. |
| Researcher Affiliation | Academia | Corresponding author (EMAIL) Department of Information Technology, Uppsala University, Uppsala, Sweden Science for Life Laboratory, Uppsala University, Uppsala, Sweden |
| Pseudocode | No | The paper includes mathematical formulations and derivations, but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block, nor a structured, step-by-step procedure formatted like code. |
| Open Source Code | Yes | The code for this paper is available at https://github.com/li-ju666/asymvlm. The code for the paper is attached in the supplementary materials for the submission, and will be open-sourced through GitHub. |
| Open Datasets | Yes | The MS-COCO [15] and Flickr-30k [17] datasets are used to train the adapters. Additionally, a subset of the Conceptual Caption dataset [23] of 200k samples (CC-200k) is also used. To understand the nature of learned uncertainty, the HierarCap dataset [2] is used... For example applications, CIFAR-10, CIFAR-100 [13] and STL-10 [9] are used for zero-shot classification. |
| Dataset Splits | Yes | For ablation studies, all methods are trained on the training set of CC-200k and evaluated on the validation set of CC-200k. For each task, we first group the recall results according to the uncertainty levels of the text embeddings. We evaluate these methods on a combined test set consisting of CIFAR-10 and CIFAR-100 samples, performing zero-shot classification over CIFAR-10 classes. |
| Hardware Specification | Yes | All computations are conducted on NVIDIA A100/A40 GPUs. On an NVIDIA A100, completing 200 epochs of adaptation on either MS-COCO, Flickr-30K or CC-200k requires under one GPU-hour. |
| Software Dependencies | No | All methods are implemented in PyTorch, and all pre-trained VLMs are loaded by transformers. No specific version numbers are provided for these software components. |
| Experiment Setup | Yes | The text adapter is implemented as a four-layer perceptron that takes the output of a pre-trained text encoder as its input. It consists of two hidden layers, each with 1024 dimensions and activated by the ReLU function... The temperature parameter in the InfoNCE objective function is trainable. We use stochastic gradient descent with momentum (SGD-momen.) for the optimization of AsymVLM, PCME and PFE, and use AdamW for ProbVLM... The learning rates for different methods are optimized using a grid search within {10⁻⁴, 5×10⁻⁴, 10⁻³, 5×10⁻³, 10⁻², 5×10⁻²}, and reported as Table A.1. We apply cosine annealing for learning rate scheduling with a minimal learning rate 10⁻⁶. (Table A.1 lists the optimizer, learning rate, and batch size for each method, e.g., AsymVLM: SGD, 10⁻², batch size 2048.) |
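The Experiment Setup row states only that the temperature in the InfoNCE objective is trainable. The sketch below shows the standard CLIP-style symmetric InfoNCE loss over deterministic, L2-normalized (unit-hypersphere) embeddings, with the temperature stored in log space. It is illustrative and not taken from the authors' code. Three things are assumptions: the log-space parameterization, the symmetric form of the loss, and every function name. The sketch does not model the probabilistic embeddings or how AsymVLM folds uncertainty into its objective.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def log_softmax_at(logits: Sequence[float], index: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at `index`."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z


def symmetric_info_nce(
    images: Sequence[Sequence[float]],
    texts: Sequence[Sequence[float]],
    log_inv_temperature: float,
) -> float:
    """CLIP-style symmetric InfoNCE over a batch of paired point embeddings.

    Pair i is (images[i], texts[i]); every other item in the batch is a negative.
    The temperature tau enters as exp(log_inv_temperature) = 1 / tau, an assumed
    log-space parameterization that keeps tau positive while an optimizer
    updates it as an unconstrained scalar.
    """
    if len(images) != len(texts) or not images:
        raise ValueError("need a non-empty batch of matching image/text pairs")
    scale = math.exp(log_inv_temperature)  # equals 1 / tau
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by tau.
    logits = [
        [scale * sum(a * b for a, b in zip(imgs[i], txts[j])) for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text: each row is a classification over texts, target on the diagonal.
    image_to_text = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image: each column is a classification over images.
    text_to_image = -sum(
        log_softmax_at([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return (image_to_text + text_to_image) / 2


if __name__ == "__main__":
    pairs = [[1.0, 0.0], [0.0, 1.0]]
    # Sharper similarities (smaller tau) lower the loss on perfectly aligned pairs.
    print(symmetric_info_nce(pairs, pairs, 0.0), symmetric_info_nce(pairs, pairs, math.log(10.0)))
```

In an autodiff framework, `log_inv_temperature` would be a learnable scalar updated with the adapter weights. CLIP's reference implementation keeps a comparable learnable `logit_scale` for the same purpose.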
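The Experiment Setup row reports a learning-rate grid and cosine annealing down to a minimum learning rate of 10⁻⁶. The Hardware Specification row mentions 200 epochs of adaptation. The sketch below evaluates the standard closed-form cosine annealing schedule, which is the same formula PyTorch documents for `CosineAnnealingLR` with `eta_min`. It assumes one schedule step per epoch and no warm restarts, since the paper excerpt gives neither detail. The helper name `cosine_lr` is hypothetical.

```python
import math

# Learning-rate grid searched in the Experiment Setup row.
LR_GRID = [1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2]


def cosine_lr(epoch: int, total_epochs: int, base_lr: float, min_lr: float = 1e-6) -> float:
    """Cosine-annealed learning rate at `epoch`, without warm restarts.

    eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T)) / 2
    """
    if total_epochs <= 0 or not 0 <= epoch <= total_epochs:
        raise ValueError("require total_epochs > 0 and 0 <= epoch <= total_epochs")
    cosine = (1 + math.cos(math.pi * epoch / total_epochs)) / 2
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    # Hypothetical sweep: each grid value annealed over 200 per-epoch steps.
    for base_lr in LR_GRID:
        schedule = [cosine_lr(e, 200, base_lr) for e in (0, 100, 200)]
        print(base_lr, [f"{lr:.2e}" for lr in schedule])
```

With AsymVLM's reported base rate of 10⁻², the schedule starts at 10⁻², passes roughly 5×10⁻³ at the midpoint, and ends at the 10⁻⁶ floor.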
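The Dataset Splits row says recall results are grouped by the uncertainty level of the text embeddings. The excerpt does not say how the levels are formed. The sketch below shows one plausible protocol and is not the authors' procedure. It sorts queries by a scalar uncertainty score, splits them into equal-size bins, and reports recall per bin. Three things are assumptions: the equal-frequency binning, the per-query boolean "hit" (for example, whether the ground-truth match is in the top k), and the names `recall_by_uncertainty` and `num_bins`.

```python
from typing import List, Sequence, Tuple


def recall_by_uncertainty(
    uncertainties: Sequence[float],
    hits: Sequence[bool],
    num_bins: int = 5,
) -> List[Tuple[float, float]]:
    """Group per-query retrieval hits into equal-size uncertainty bins.

    Returns one (mean uncertainty, recall) pair per bin, ordered from the
    least to the most uncertain queries.
    """
    if len(uncertainties) != len(hits):
        raise ValueError("uncertainties and hits must have the same length")
    if not 1 <= num_bins <= len(hits):
        raise ValueError("num_bins must be between 1 and the number of queries")
    order = sorted(range(len(hits)), key=lambda i: uncertainties[i])
    results = []
    for b in range(num_bins):
        # Integer boundaries spread any remainder across bins; none is empty.
        start = b * len(order) // num_bins
        end = (b + 1) * len(order) // num_bins
        idx = order[start:end]
        mean_u = sum(uncertainties[i] for i in idx) / len(idx)
        recall = sum(1 for i in idx if hits[i]) / len(idx)
        results.append((mean_u, recall))
    return results


if __name__ == "__main__":
    # Toy data: the two most confident queries are retrieved, the others are not.
    print(recall_by_uncertainty([0.4, 0.1, 0.3, 0.2], [False, True, False, True], num_bins=2))
```

If the uncertainty estimates are informative, recall should fall as mean uncertainty rises across the bins.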