Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis

Authors: Zhu Wang, Sourav Medya, Sathya Ravi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that it is possible to design models that perform similarly to state-of-the-art results but with significantly fewer samples and less training time. Our models and code are available here: https://github.com/ellenzhuwang/implicit_vkood
Researcher Affiliation | Academia | Zhu Wang, Sourav Medya, Sathya N. Ravi, Department of Computer Science, University of Illinois at Chicago, {zwang260,medya,sathya}@uic.edu
Pseudocode | Yes | Algorithm 1: Fixed Point Network Operator based OOD Detection Layer for Language Features l_j (an illustrative sketch of such a layer follows the table)
Open Source Code | Yes | Our models and code are available here: https://github.com/ellenzhuwang/implicit_vkood
Open Datasets | Yes | We pre-trained on three datasets, including COCO [35], Visual Genome [28], and SBU Captions [47], with a total of 1M images and 6.8M image-caption pairs, approximately 30% less than the baseline (ViLT).
Dataset Splits | No | The paper mentions datasets used for training, fine-tuning, and testing (e.g., the VQAv2 test set and the COCO val dataset), implying standard splits for these benchmarks. However, it does not explicitly provide percentages or counts for training/validation/test splits, stating only "Following standard practice in Vision" for its training strategies.
Hardware Specification | Yes | We pre-trained and fine-tuned both on 8 NVIDIA RTX 2080Ti GPUs, and for inference we used 1 NVIDIA RTX 2080Ti GPU.
Software Dependencies | No | The paper names various software components and models (e.g., RoBERTa, ViT-B/32, CLIP, BLIP, BERT-base, the AdamW optimizer) but does not provide version numbers for any of these software dependencies.
Experiment Setup | Yes | Network training. We pre-trained the model for 10 epochs using the AdamW optimizer [39] with a learning rate of 1e-4 and weight decay of 1e-2. We chose the warm-up phase of the learning rate to be 10% of the total training steps, and the learning rate was decayed linearly to 0 afterwards. Then, we fine-tuned our model for 5 epochs with a learning rate of 2e-4 for all downstream tasks. In addition, we applied RandAugment [12] as the augmentation strategy in the fine-tuning steps. (see the training-configuration sketch after the table)
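
The Pseudocode row refers to the paper's Algorithm 1, a fixed-point-network-operator-based OOD detection layer for language features l_j. The sketch below is not that algorithm: it is a minimal, deep-equilibrium-style illustration assuming a learned residual operator, a norm-based outlier score, a fixed threshold, and a one-step differentiable unrolling in place of true implicit differentiation. All of these choices are assumptions for illustration only; the authors' actual layer is in the linked repository.

```python
import torch
import torch.nn as nn

class FixedPointOODLayer(nn.Module):
    """Illustrative fixed-point OOD layer for language features (all design choices assumed)."""

    def __init__(self, dim: int, max_iters: int = 30, tol: float = 1e-4, tau: float = 2.0):
        super().__init__()
        # Assumed operator: a small MLP defining z -> f([z, l]); not the paper's exact operator.
        self.f = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh(), nn.Linear(dim, dim))
        self.max_iters = max_iters
        self.tol = tol
        self.tau = tau  # assumed threshold on the outlier score

    def forward(self, l: torch.Tensor):
        # l: language features of shape (batch, seq_len, dim)
        z = torch.zeros_like(l)
        # Fixed-point iteration z_{k+1} = f([z_k, l]) run without tracking gradients.
        with torch.no_grad():
            for _ in range(self.max_iters):
                z_next = self.f(torch.cat([z, l], dim=-1))
                converged = (z_next - z).norm() / (z.norm() + 1e-8) < self.tol
                z = z_next
                if converged:
                    break
        # One differentiable application so the surrounding network can backpropagate;
        # the paper would instead differentiate implicitly through the fixed point.
        z = self.f(torch.cat([z, l], dim=-1))
        # Assumed OOD score: distance of each token's equilibrium feature from the batch mean.
        score = (z - z.mean(dim=(0, 1), keepdim=True)).norm(dim=-1)
        mask = (score < self.tau).float().unsqueeze(-1)  # 1 = keep, 0 = treat as outlier
        return z * mask, score
```

A call like `FixedPointOODLayer(dim=768)(l_j)`, with `l_j` of shape (batch, tokens, 768), would return the masked equilibrium features and the per-token scores; for the method actually evaluated in the paper, consult Algorithm 1 and the released code.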
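
The Experiment Setup row fixes the optimizer (AdamW), learning rates (1e-4 pre-training, 2e-4 fine-tuning), weight decay (1e-2), a warm-up of 10% of total steps, and linear decay to 0. Below is a minimal PyTorch sketch of that schedule; the stand-in model and the total step count are placeholders, since those values are not given in the quoted text.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Placeholders (assumptions): the real model and total step count are not in the quoted setup.
model = torch.nn.Linear(768, 768)        # stand-in for the multimodal model
total_steps = 100_000                    # assumed; in practice derived from epochs x steps per epoch
warmup_steps = int(0.10 * total_steps)   # "warm-up phase ... 10% of the total training steps"

# Pre-training: AdamW with lr 1e-4 and weight decay 1e-2 (fine-tuning would switch to lr 2e-4).
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)

def lr_lambda(step: int) -> float:
    """Linear warm-up to the base lr, then linear decay to 0 over the remaining steps."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = LambdaLR(optimizer, lr_lambda)

# Per training step: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```

Fine-tuning would reuse the same warm-up/decay shape with lr=2e-4 over 5 epochs; the RandAugment policy mentioned in the quote could be applied to the image inputs (e.g., via torchvision.transforms.RandAugment), though the paper's exact augmentation parameters are not stated here.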