Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis

Authors: Zhu Wang, Sourav Medya, Sathya Ravi

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our experiments show that it is possible to design models that perform similarly to state-of-the-art results but with significantly fewer samples and less training time. Our models and code are available here: https://github.com/ellenzhuwang/implicit_vkood |
| Researcher Affiliation | Academia | Zhu Wang, Sourav Medya, Sathya N. Ravi; Department of Computer Science, University of Illinois at Chicago; EMAIL |
| Pseudocode | Yes | Algorithm 1: Fixed Point Network Operator based OOD Detection Layer for Language Features l_j |
| Open Source Code | Yes | Our models and code are available here: https://github.com/ellenzhuwang/implicit_vkood |
| Open Datasets | Yes | We pre-trained on three datasets, including COCO [35], Visual Genome [28], and SBU Captions [47] with total of 1M images and 6.8M image-caption pairs, as approximate 30% less than the baseline (ViLT). |
| Dataset Splits | No | The paper mentions datasets used for training, fine-tuning, and testing (e.g., VQAv2 test set, COCO val dataset), implying standard splits for these benchmarks. However, it does not explicitly provide specific percentages or counts for training/validation/test splits for full reproducibility, stating 'Following standard practice in Vision' for training strategies. |
| Hardware Specification | Yes | We pre-trained and fine-tuned both on 8 NVIDIA RTX 2080Ti GPUs, and for inference we used 1 NVIDIA RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions various software components and models (e.g., RoBERTa, ViT-B/32, CLIP, BLIP, BERT-base, AdamW optimizer) but does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Network training. We pre-trained the model for 10 epochs using AdamW optimizer [39] with learning rate of 1e-4 and weight decay of 1e-2. We chose the warm-up phase of learning rate to be 10% of the total training steps, and the learning rate was decayed linearly to 0 afterwards. Then, we fine-tuned our model for 5 epochs with learning rate of 2e-4 for all downstream tasks. In addition, we applied RandAugment [12] as augmentation strategy in fine-tuning steps. |
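
The learning-rate schedule quoted under Experiment Setup (a warm-up over the first 10% of training steps, then linear decay to 0) can be illustrated with a small sketch. This is not the authors' code. The function name `lr_at_step`, the example step count, and the choice of a linear ramp from 0 during warm-up are assumptions made for illustration; the quoted text gives only the warm-up fraction, the linear decay, and the peak learning rates (1e-4 for pre-training, 2e-4 for fine-tuning). The optimizer itself (AdamW with weight decay 1e-2) is left out.

```python
def lr_at_step(step, total_steps, base_lr=1e-4, warmup_frac=0.10):
    """Learning rate at an optimizer step: warm-up, then linear decay to 0.

    Hypothetical helper for illustration; not taken from the paper's repository.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    warmup_steps = int(round(warmup_frac * total_steps))
    if step < warmup_steps:
        # Assumption: the warm-up ramps linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Linear decay from base_lr (end of warm-up) down to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)


if __name__ == "__main__":
    total = 1000  # hypothetical number of training steps
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_at_step(s, total))
```

Passing `base_lr=2e-4` gives the corresponding curve for the fine-tuning phase, assuming the same warm-up fraction applies there, which the quoted text does not state.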